This PR:
- renames `torch.set_deterministic` to `torch._set_deterministic`
- renames `torch.is_deterministic` to `torch._is_deterministic`
- modifies the docstrings for both to indicate that the feature is not
yet complete.
We would like to do this because this feature is experimental and the
docstrings before this PR are misleading.
This PR does not have an accompanying change in master. That is because
there still is discussion over what the eventual state of the feature
should be: https://github.com/pytorch/pytorch/issues/15359. I expect
that there will be a better plan for this once 1.7 rolls around.
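A minimal sketch of the renamed entry points on the 1.6 release branch this PR targets (the public names are removed; only the underscore-prefixed experimental variants remain):
```
import torch

# Experimental flag; the updated docstrings note the feature is not yet complete.
torch._set_deterministic(True)    # formerly torch.set_deterministic(True)
print(torch._is_deterministic())  # formerly torch.is_deterministic()
```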
Test Plan:
- wait for CI
* Add optimizer_for_mobile doc into python api root doc
* Apply suggestions from code review
Remove all references to `optimization_blacklist` as it's missing in 1.6
Co-authored-by: Nikita Shulga <nshulga@fb.com>
This reverts commit fe66bdb498efe912d8b9c437a14efa4295c04fdd.
This also makes sense for THTensorEvenMoreMath because sumall was removed; see THTensor_wrap.
Summary:
In short, we messed up. The SHM and CMA backends of TensorPipe are Linux-specific and thus they are guarded by a #ifdef in the agent's code. Due to a CMake mishap (stemming from the fact that TensorPipe has two CMake files, one for PyTorch and a "standalone" one), we were not correctly propagating some flags, and these #ifdefs were always false. This means that these two backends have always been disabled and have thus never been covered by our OSS CI. It would be irresponsible to enable them now in v1.6, so instead we remove any mention of them from the docs.
Note that this is perhaps not as bad as it sounds. These two backends were providing higher performance (latency) when the two endpoints were on the same machine. However, I suspect that most RPC users will only do transfers across machines, for which SHM and CMA wouldn't have played any role.
Original PR against master: #41200 (merged as dde3d5f4a8f713ecc4649d776565b68ca75ae5c8)
Test Plan: Docs only
Summary:
Add `torch._C._cuda_getArchFlags()`, which returns the list of architectures `torch_cuda` was compiled for
Add `torch.cuda.get_arch_list()` and `torch.cuda.get_gencode_flags()` methods, which return the architecture list and gencode flags PyTorch was compiled with
Print a warning if any of the GPUs is not compatible with any of the CUBINs
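A hedged usage sketch of the new introspection helpers (output values are illustrative):
```
import torch

if torch.cuda.is_available():
    # Architectures torch_cuda was built for, e.g. ['sm_60', 'sm_70', 'sm_75']
    print(torch.cuda.get_arch_list())
    # The matching -gencode flags passed at build time
    print(torch.cuda.get_gencode_flags())
```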
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41173
Differential Revision: D22459998
Pulled By: malfet
fbshipit-source-id: 65d40ae29e54a0ba0f3f2da11b821fdb4d452d95
Summary:
Fixes https://github.com/pytorch/pytorch/issues/41047.
Some CPU kernel implementations don't call `cast_outputs()`, so when CPU temporaries were created to hold their outputs they weren't copied back to the out parameters correctly. Instead of fixing that issue, for simplicity this PR disables the behavior. The corresponding test in test_type_promotion.py is expanded with more operations to verify that unary ops can no longer have out arguments with different dtypes than their inputs (except in special cases like torch.abs which maps complex inputs to float outputs and torch.deg2rad which is secretly torch.mul).
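A minimal sketch of the behavior described above, assuming the post-change 1.6 semantics (a unary op with an `out` tensor of a different dtype is rejected):
```
import torch

x = torch.randn(3)                          # float32 input
out = torch.empty(3, dtype=torch.float64)   # mismatched out dtype
try:
    torch.neg(x, out=out)                   # expected to raise after this PR
except RuntimeError as e:
    print("mismatched out dtype rejected:", e)
```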
Pull Request resolved: https://github.com/pytorch/pytorch/pull/41097
Differential Revision: D22422352
Pulled By: mruberry
fbshipit-source-id: 8e61d34ef1c9608790b35cf035302fd226fd9421
Co-authored-by: Mike Ruberry <mruberry@devfair044.maas>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40115
Closes https://github.com/pytorch/pytorch/issues/37790
Closes https://github.com/pytorch/pytorch/issues/37944
A user may wish to run DDP's forward + backwards step under a non-default CUDA stream such as those created by `with torch.cuda.Stream(stream)`. In this case, the user should be responsible for synchronizing events on this stream with other streams used in the program (per the documentation at https://pytorch.org/docs/stable/notes/cuda.html#cuda-semantics), but currently DDP has a bug which causes DDP under non-default streams to fail.
If a user does the following:
```
model = DDP(...)
loss = model(input).sum()
loss.backward()
grad = model.module.weight.grad
average = dist.all_reduce(grad)
```
There is a chance that `average` and `grad` will not be equal. This is because the CUDA kernels corresponding to the `all_reduce` call may run before `loss.backward()`'s kernels are finished. Specifically, in DDP we copy the allreduced gradients back to the model parameter gradients in an autograd engine callback, but this callback runs on the default stream. Note that this can also be fixed by the application synchronizing on the current stream, although this should not be expected, since the application is not using the current stream at all.
This PR fixes the issue by passing the current stream into DDP's callback.
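A hedged sketch (assuming a CUDA device and a plain module rather than DDP) of the stream synchronization the prose above refers to: without waiting on the non-default stream, kernels later enqueued on the current stream could race with `backward()`'s kernels:
```
import torch
import torch.nn as nn

model = nn.Linear(8, 8).cuda()
inp = torch.randn(4, 8, device="cuda")

side = torch.cuda.Stream()
side.wait_stream(torch.cuda.current_stream())  # inp/model were created on the default stream
with torch.cuda.stream(side):
    model(inp).sum().backward()

# Make later work on the current stream (e.g. an all_reduce on the gradient)
# wait for backward()'s kernels on `side` to finish.
torch.cuda.current_stream().wait_stream(side)
grad = model.weight.grad
```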
Tested by adding a UT `test_DistributedDataParallel_non_default_stream` that fails without this PR
ghstack-source-id: 106481208
Differential Revision: D22073353
fbshipit-source-id: 70da9b44e5f546ff8b6d8c42022ecc846dff033e
* Move OperatorSchema default inference function implementations to .cc… (#40845)
Summary:
… file
This prevents the implementations of those functions (defined as lambdas) from being embedded as weak symbols into every shared library that includes this header.
Combination of this and https://github.com/pytorch/pytorch/pull/40844 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40845
Differential Revision: D22334779
Pulled By: malfet
fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
* Move `OperatorBase::AddRelatedBlobInfo` implementation to .cc file (#40844)
Summary:
If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies.
This was one of the reasons why the size of libcaffe2_module_test_dynamic.so was 500Kb (the AddRelatedBlobInfo implementation pulled in a quarter of libprotobuf.a with it).
Combination of this and https://github.com/pytorch/pytorch/issues/40845 reduces size of `libcaffe2_module_test_dynamic.so` from 500kb to 50Kb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40844
Differential Revision: D22334725
Pulled By: malfet
fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
Summary:
Right now it is used to check whether `math.remainder` exists, which is the case for both Python 3.7 and 3.8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40868
Differential Revision: D22343454
Pulled By: malfet
fbshipit-source-id: 6b6d4869705b64c4b952309120f92c04ac7e39fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40624
Previously we didn't clone the schema, so the default schema was used; this was
causing issues for some models
Test Plan: Imported from OSS
Differential Revision: D22259519
fbshipit-source-id: e2a393a54cb18f55da0c7152a74ddc22079ac350
* [quant] aten::repeat work for quantized tensor (#40644)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40644
Test Plan: Imported from OSS
Differential Revision: D22268558
fbshipit-source-id: 3bc9a129bece1b547c519772ecc6b980780fb904
* [quant][graphmode][fix] remove unsupported ops in the list (#40653)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40653
(Note: this ignores all push blocking failures!)
Test Plan: Imported from OSS
Differential Revision: D22271413
fbshipit-source-id: a01611b5d90849ac673fa5a310f910c858e907a3
* [quant][graphmode][fix] dequantize propagation for {add/mul}_scalar (#40596)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40596
Previously the fusion patterns for {add/mul}_scalar were inconsistent, since the op pattern
produces a non-quantized tensor while the op replacement graph produces a quantized tensor
Test Plan: Imported from OSS
Differential Revision: D22251072
fbshipit-source-id: e16eb92cf6611578cca1ed8ebde961f8d0610137
* [quant][graphmode] Support quantization for `aten::append` (#40743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40743
`aten::append` modifies its input in place and the output is ignored. Such ops are not
supported right now, so we'll need to first make `aten::append` non-inplace
by changing
```
ignored = aten::append(list, x)
```
to
```
x_list = aten::ListConstruct(x)
result = aten::add(list, x_list)
```
and then quantize the aten::add instead.
Test Plan:
TestQuantizeJitOps.test_general_shape_ops
Imported from OSS
Differential Revision: D22302151
fbshipit-source-id: 931000388e7501e9dd17bec2fad8a96b71a5efc5
We need an easy way to quickly, visually grep binary sizes from builds
and then have a way to test out those binaries quickly.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
(cherry picked from commit 66813515d4dec66f319442ba967c64b87c0286cd)
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40931
Fix docstrings for dynamic quantized Linear/LSTM and associated classes
ghstack-source-id: 107064446
Test Plan: Docs show up correctly
Differential Revision: D22360787
fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
* properly skip legacy tests regardless of the default executor (#40381)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40381
Differential Revision: D22173938
Pulled By: Krovatkin
fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
* [JIT] Switch executor from Simple to Legacy.
This is done for 1.6 only in order to recover performance regressions
caused by the Legacy->Simple switch that was done in 1.5. On master we
still plan to use Simple executor and fix the performance issues in 1.7
without falling back to the Legacy executor.
Co-authored-by: Nikolay Korovaiko <korovaikon@gmail.com>
* Re-apply PyTorch pthreadpool changes
Summary:
This re-applies D21232894 (b9d3869df3) and D22162524, plus updates jni_deps in a few places
to avoid breaking host JNI tests.
Test Plan: `buck test @//fbandroid/mode/server //fbandroid/instrumentation_tests/com/facebook/caffe2:host-test`
Reviewed By: xcheng16
Differential Revision: D22199952
fbshipit-source-id: df13eef39c01738637ae8cf7f581d6ccc88d37d5
* Enable XNNPACK ops on iOS and macOS.
Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform ios --framework pytorch --remote --devices D221AP-12.0.1
Reviewed By: xta0
Differential Revision: D21886736
fbshipit-source-id: ac482619dc1b41a110a3c4c79cc0339e5555edeb
* Respect user set thread count. (#40707)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/40707
Test Plan: Imported from OSS
Differential Revision: D22318197
Pulled By: AshkanAliabadi
fbshipit-source-id: f11b7302a6e91d11d750df100d2a3d8d96b5d1db
* Fix and reenable threaded QNNPACK linear (#40587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40587
Previously, this was causing divide-by-zero only in the multithreaded
empty-batch case, while calculating tiling parameters for the threads.
In my opinion, the bug here is using a value that is allowed to be zero
(batch size) for an argument that should not be zero (tile size), so I
fixed the bug by bailing out right before the call to
pthreadpool_compute_4d_tiled.
Test Plan: TestQuantizedOps.test_empty_batch
Differential Revision: D22264414
Pulled By: dreiss
fbshipit-source-id: 9446d5231ff65ef19003686f3989e62f04cf18c9
* Fix batch size zero for QNNPACK linear_dynamic (#40588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40588
Two bugs were preventing this from working. One was a divide by zero
when multithreading was enabled, fixed similarly to the fix for static
quantized linear in the previous commit. The other was computation of
min and max to determine qparams. FBGEMM uses [0,0] for [min,max] of
empty input, so we do the same.
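A hedged sketch of the empty-batch case these fixes cover (module and sizes here are illustrative):
```
import torch

lin = torch.nn.Linear(4, 4)
qlin = torch.quantization.quantize_dynamic(lin, {torch.nn.Linear}, dtype=torch.qint8)
out = qlin(torch.randn(0, 4))   # zero-size batch; previously hit a divide-by-zero
print(out.shape)                # expected: torch.Size([0, 4])
```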
Test Plan: Added a unit test.
Differential Revision: D22264415
Pulled By: dreiss
fbshipit-source-id: 6ca9cf48107dd998ef4834e5540279a8826bc754
Co-authored-by: David Reiss <dreiss@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40549
Previously we didn't check whether %weight_t is produced by `aten::t`; this would fuse some `matmul`/`addmm` ops that are
not 2d into `aten::linear`, which is incorrect
Test Plan: Imported from OSS
Differential Revision: D22225921
fbshipit-source-id: 9723e82fdbac6d8e1a7ade22f3a9791321ab12b6
* [WIP][JIT] Add ScriptModule._reconstruct (#39979)
Summary:
**Summary**
This commit adds an instance method `_reconstruct` that permits users
to reconstruct a `ScriptModule` from a given C++ `Module` instance.
**Testing**
This commit adds a unit test for `_reconstruct`.
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33912.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39979
Differential Revision: D22172323
Pulled By: SplitInfinity
fbshipit-source-id: 9aa6551c422a5a324b822a09cd8d7c660f99ca5c
* [quant][graphmode] Enable inplace option for top level API (#40414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40414
after `_reconstruct` is supported in RecursiveScriptModule: https://github.com/pytorch/pytorch/pull/39979
we can support inplace option in quantization API
Test Plan: Imported from OSS
Differential Revision: D22178326
fbshipit-source-id: c78bc2bcf2c42b06280c12262bb31aebcadc6c32
Co-authored-by: Meghan Lele <meghanl@fb.com>
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40513
This PR makes the following changes:
1. Complex printing now uses print formatting for its real and imaginary values, and they are joined at the end.
2. Adding 1. naturally fixes the printing of complex tensors with sci_mode=True
```
>>> torch.tensor(float('inf')+float('inf')*1j)
tensor(nan+infj)
>>> torch.randn(2000, dtype=torch.cfloat)
tensor([ 0.3015-0.2502j, -1.1102+1.2218j, -0.6324+0.0640j, ...,
-1.0200-0.2302j, 0.6511-0.1889j, -0.1069+0.1702j])
>>> torch.tensor([1e-3, 3+4j, 1e-5j, 1e-2+3j, 5+1e-6j])
tensor([1.0000e-03+0.0000e+00j, 3.0000e+00+4.0000e+00j, 0.0000e+00+1.0000e-05j,
1.0000e-02+3.0000e+00j, 5.0000e+00+1.0000e-06j])
>>> torch.randn(3, dtype=torch.cfloat)
tensor([ 1.0992-0.4459j, 1.1073+0.1202j, -0.2177-0.6342j])
>>> x = torch.tensor([1e2, 1e-2])
>>> torch.set_printoptions(sci_mode=False)
>>> x
tensor([ 100.0000, 0.0100])
>>> x = torch.tensor([1e2, 1e-2j])
>>> x
tensor([100.+0.0000j, 0.+0.0100j])
```
Test Plan: Imported from OSS
Differential Revision: D22309294
Pulled By: anjali411
fbshipit-source-id: 20edf9e28063725aeff39f3a246a2d7f348ff1e8
Co-authored-by: anjali411 <chourdiaanjali123@gmail.com>
As ninja has accurate dependency tracking, if there is nothing to do,
then we will very quickly noop. But this is important for correctness:
if a change was made to a header that is not listed explicitly in
the distutils Extension, then distutils will come to the wrong
conclusion about whether or not recompilation is needed (but Ninja
will work it out.)
This caused https://github.com/pytorch/vision/issues/2367
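A minimal sketch of a build path that benefits: `load_inline` uses ninja when available, so header/source changes are tracked accurately across incremental rebuilds (the extension here is a throwaway example, not code from this PR):
```
from torch.utils.cpp_extension import load_inline

ext = load_inline(
    name="inline_ext",  # throwaway example extension
    cpp_sources=(
        "#include <torch/extension.h>\n"
        "torch::Tensor twice(torch::Tensor x) { return x + x; }"
    ),
    functions=["twice"],
    verbose=True,
)
```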
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
ghstack-source-id: 6409595c8ac091f3863f305c123266b9d3a167ad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40837
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40495
As part of debugging flaky ddp_under_dist_autograd tests, I realized
we were running into the following deadlock.
1) Rank 0 would go into DDP construction, hold GIL and wait for broadcast in
DDP construction.
2) Rank 3 is a little slower and performs an RRef fetch call before the DDP
construction.
3) The RRef fetch call is done on Rank 0 and tries to acquire GIL.
4) We now have a deadlock since Rank 0 is waiting for Rank 3 to enter the
collective and Rank 3 is waiting for Rank 0 to release GIL.
ghstack-source-id: 106534442
Test Plan:
1) Ran ddp_under_dist_autograd 500 times.
2) waitforbuildbot
Differential Revision: D22205180
fbshipit-source-id: 6afd55342e801b9edb9591ff25158a244a8ea66a
Co-authored-by: Pritam Damania <pritam.damania@fb.com>
* [JIT] Update type of the unsqueeze's output in shape analysis.
* [JIT] Fix shape analysis for aten::masked_select.
The reference says that this op always returns a 1-D tensor, even if
the input and the mask are 0-D.
Upstream PR: #40614
Summary:
This update pulls in a oneliner fix, which sets the TCP_NODELAY option on the TCP sockets of the UV transport. This leads to exceptional performance gains in terms of latency, with about a 25x improvement in one simple benchmark. This thus resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x.
The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
```
@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```
And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205us
- Gloo: 440us
- TensorPipe with UV _before the fix_: 5ms
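For illustration, the socket option in question can be set from Python like this (a generic sketch, not the TensorPipe code itself):
```
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable Nagle's algorithm so small RPC messages are flushed immediately.
s.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
print(s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY))  # 1
s.close()
```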
Test Plan: Ran PyTorch RPC test suite
Summary:
Currently, torchvision annotates `batched_nms` with `torch.jit.script` so the function gets compiled when it is traced and ONNX will work. Unfortunately, this means we are eagerly compiling batched_nms, which fails if torchvision isn't built with `torchvision.ops.nms`. As a result, torchvision doesn't work on torch hub right now.
`_script_if_tracing` could solve our problem here, but right now it does not correctly interact with recursive compilation. This PR fixes that bug.
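A minimal sketch of the decorator in question (the helper and entry function names are illustrative; the helper is only compiled when the enclosing code is traced, not eagerly at import time):
```
import torch

@torch.jit._script_if_tracing
def helper(x: torch.Tensor) -> torch.Tensor:
    return x + 1

def entry(x):
    return helper(x)   # compiled lazily, only under tracing

traced = torch.jit.trace(entry, torch.zeros(2))
```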
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40468
Reviewed By: jamesr66a
Differential Revision: D22195771
Pulled By: eellison
fbshipit-source-id: 83022ca0bab6d389a48a478aec03052c9282d2b7
Co-authored-by: Elias Ellison <eellison@fb.com>
- fixes #38034
- works around missing slice functionality in Sequential
by casting to tuple and slicing that instead
- supports iterating on the resulting slice but not call()
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40461
It turned out `:inherited-members:` (see [doc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#directive-autoclass)) is not really usable,
because pybind11 generates a docstring that writes `self` as the parent-class type, `rpc.PyRRef`.
As a workaround, I am pulling the docstrings of the parent class, `PyRRef`, into the subclass, `RRef`, and doing surgery on the docstring generated by pybind11.
{F241283111}
ghstack-source-id: 106472496
P134031188
Differential Revision: D7933834
fbshipit-source-id: c03a8a4c9d98888b64492a8caba1591595bfe247
Co-authored-by: Shihao Xu <shihaoxu@fb.com>
awscli was not loaded on conda builds and the backup upload did not work
since it was a recursive copy instead of just specifically copying what
we want.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Summary:
Updates the concat kernel for contiguous inputs to support channels_last contiguous tensors.
This was tried with a SqueezeNet model on a Pixel 2 device; it improves model perf by about 25%.
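A hedged sketch of the path this optimizes (shapes are illustrative):
```
import torch

a = torch.randn(1, 8, 16, 16).contiguous(memory_format=torch.channels_last)
b = torch.randn(1, 8, 16, 16).contiguous(memory_format=torch.channels_last)
out = torch.cat([a, b], dim=1)   # contiguous channels-last inputs hit the fast path
print(out.is_contiguous(memory_format=torch.channels_last))
```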
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39448
Test Plan: test_cat_in_channels_last
Differential Revision: D22160526
Pulled By: kimishpatel
fbshipit-source-id: 6eee6e74b8a5c66167828283d16a52022a16997f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40422
Fix the remaining differences relative to the emulation of fp16 layernorm
Test Plan: unit test of layernorm
Reviewed By: venkatacrc
Differential Revision: D22182849
fbshipit-source-id: 8a45c21418517d65d7a41663d5ad2110d6b4677a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40440
Shapes sometimes need more than 35 symbols
(Note: this ignores all push blocking failures!)
Test Plan:
found during testing the recipe
https://github.com/pytorch/tutorials/pull/1019
Differential Revision: D22188679
Pulled By: ilia-cher
fbshipit-source-id: efcf5d10882af7d9225897ec87debcf4abdc523f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39962
Adding a simple ref-counted wrapper for CUDA events and
destroying the CUDA event after the last copy is destroyed
Test Plan: CI cuda profiler tests
Differential Revision: D22027092
Pulled By: ilia-cher
fbshipit-source-id: e0810388aa60b2291eb010896e13af1fad92e472
Summary:
Currently, a custom autograd function written with
```
@torch.cuda.amp.custom_fwd(cast_inputs=dtype)
def forward(ctx, *args):
    ...
```
casts incoming floating-point CUDA tensors to `dtype` unconditionally, regardless of whether the function executes in an autocast-enabled region. I think I had the wrong idea there. Autocast-disabled regions should give the user control of input types. Also, `custom_fwd(cast_inputs=dtype)`-decorated functions' behavior should align with native fp32list/fp16list functions. C++-side casting wrappers have no effect when autocast is disabled, and `custom_fwd`'s casting should behave the same way.
The present PR changes `custom_fwd` so it only casts in autocast-enabled regions (also updates custom_fwd to ignore fp64 inputs, like the C++ wrappers).
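A hedged sketch of a custom autograd function using these decorators (requires a CUDA device; with this PR, `cast_inputs` only takes effect inside autocast-enabled regions):
```
import torch
from torch.cuda.amp import custom_fwd, custom_bwd

class Double(torch.autograd.Function):
    @staticmethod
    @custom_fwd(cast_inputs=torch.float16)
    def forward(ctx, x):
        return x * 2

    @staticmethod
    @custom_bwd
    def backward(ctx, grad_out):
        return grad_out * 2

x = torch.randn(4, device="cuda", requires_grad=True)
with torch.cuda.amp.autocast():
    y = Double.apply(x)      # x is cast to float16 here
y_plain = Double.apply(x)    # outside autocast: x is left untouched after this PR
```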
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36171
Differential Revision: D22179511
Pulled By: ngimel
fbshipit-source-id: 5a93d070179a43206066bce19da0a5a19ecaabbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40377
Cleans up the docstring for quantized ELU and adds it to the quantization docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22162834
Pulled By: vkuzo
fbshipit-source-id: e548fd4dc8d67db27ed19cac4dbdf2a942586759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40346
Cleans up docstrings for quantized BatchNorm and adds to quantization docs
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152633
Pulled By: vkuzo
fbshipit-source-id: e0bf02194158231e0205b5b2df7f6f1ffc3c4d65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40345
Fixes docstrings and adds to quantization docs for quantized InstanceNorm.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152637
Pulled By: vkuzo
fbshipit-source-id: 7a485311ead20796b7a0944827d1d04e14ec8dcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40343
Cleans up the quantized GroupNorm docstring and adds it to quantization docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152635
Pulled By: vkuzo
fbshipit-source-id: 5553b841c7a5d77f1467f0c40657db9e5d730a12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40342
Cleans up the docstrings for quantized LayerNorm, and adds it to the docs.
Test Plan: * build on Mac OS and inspect
Differential Revision: D22152639
Pulled By: vkuzo
fbshipit-source-id: 38adf14b34675d1983ac4ed751938aa396e5400b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40341
Cleans up the hardtanh docstring and adds it to quantization docs.
Test Plan: * build and inspect on Mac OS
Differential Revision: D22152636
Pulled By: vkuzo
fbshipit-source-id: c98e635199c8be332aa6958664ff23faad834908
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40340
Adds and simplifies quantization docs for hardsigmoid
Test Plan:
* build docs on Mac OS
* inspect
Differential Revision: D22152634
Pulled By: vkuzo
fbshipit-source-id: 18da273023fb00e5f0bc1e881b00536492c606d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40323
Cleans up the naming and the function param docs for quantized hardswish.
Remove redundant docstrings and link to floating point modules instead.
Test Plan:
* build the docs on Mac OS
* verify that every link works as expected
Differential Revision: D22152638
Pulled By: vkuzo
fbshipit-source-id: fef04874ae460b449c677424a6a1c6dd47054795
Summary:
Previously:
the `dont_wipe_extensions_build_folder` decorator controlled whether or not the build path was cleaned.
Now:
if cpp files or args changed, the extension is rebuilt; the build path is cleaned only before and after the test suite.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40169
Differential Revision: D22161450
Pulled By: ezyang
fbshipit-source-id: 9167c8265e13922f68cd886be900f84ffc6afb84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40312
As part of https://github.com/pytorch/pytorch/issues/40255, we
realized that GPU support for distributed autograd was broken as part of our
multithreaded autograd change.
To fix this in the short term for 1.6, this PR includes the following changes:
1) Long lived CPU thread in DistEngine to execute GPU->CPU continuations in the
autograd graph.
2) The long lived CPU thread has its own ready_queue and this queue is used for
all GraphTasks created by DistEngine.
3) In thread_main(), the CPU thread cannot exit once the GraphTask is done
processing because of the new CPU thread added in 1).
4) To resolve this, thread_main() now has a parameter `device_thread` instead
of `reentrant_thread`. When device_thread is True, we expect this to be a long
lived device thread that does not exit.
5) When device_thread is False, thread_main is expected to run a GraphTask and
return once done.
ghstack-source-id: 106391329
Test Plan: waitforbuildbot
Differential Revision: D22146183
fbshipit-source-id: dd146b7a95f55db75f6767889b7255e9d62d5825
Summary:
Also mark warning modifiers as private options (i.e. libraries depending on `torch_cpu` do not have to be compiled with `-Wall`)
Closes https://github.com/pytorch/pytorch/issues/31283
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40399
Differential Revision: D22186206
Pulled By: malfet
fbshipit-source-id: 1ad4277b5acc5c39849a3e4efe4b93a189d26e59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40390
Change the Int8FC/Int8Quantize op interface to use Int8QuantParamsBlob as the qparam input blob format when needed.
Test Plan:
```
buck test caffe2/caffe2/quantization/server:
```
Reviewed By: hx89
Differential Revision: D22124313
fbshipit-source-id: 6b5c1974c0fc5928f72773495f0da8d0eb9b98c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40389
The `mpt_uv` channel MultiPlexes over a Transport, namely the UV one. What this means is that it takes a tensor, chunks it into equal parts and sends each of them on a separate UV connection, each running in a separate UV loop. Thus they each have their own socket and thread. This allows them to reach bandwidths that go beyond what a simple single-threaded approach can do, which is necessary to reach the high bandwidths of some modern NICs.
ghstack-source-id: 106375511
Test Plan: Ran a few manual tests myself, for the rest relied on the PyTorch RPC tests.
Differential Revision: D22144380
fbshipit-source-id: ef555fa04c6f13a4acf3bd5f7b03d04d02460d38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40243
rocm bench has a large backlog right now. Let's skip some tests.
Test Plan: CI
Reviewed By: ezyang
Differential Revision: D22125197
fbshipit-source-id: 330b52ce7f97af4e45c58f25bc7d57351d7c4efb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40383
The debug option is not supported for these cases, so we print a warning if it occurs
Test Plan: Imported from OSS
Differential Revision: D22164071
fbshipit-source-id: 90459530f4efdd6d255df4f015606cb0e9070cd3
Summary:
I.e. do not accept `bytes` as a possible type for the `device` argument in
`torch.cuda._get_device_index`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40322
Differential Revision: D22176885
Pulled By: malfet
fbshipit-source-id: 2f3a46174161f1cdcf6a6ad94a31e54b18ad6186
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40396
Removes activation and normalization modules from eager mode QAT.
These were incorrectly added, but we don't actually need them.
Test Plan:
```
python test/test_quantization.py TestQuantizationAwareTraining
```
Imported from OSS
Differential Revision: D22169768
fbshipit-source-id: b5bd753dafe92e90e226fb773eb18c6aae179703
Summary:
https://github.com/pytorch/pytorch/pull/40129 fixed the error responsible for the first revert, but exposed another error in the same test.
This PR is intended as the "master copy" for merge, and it runs on full CI.
Two other PRs (restricted to run on a small subset of CI) support debugging DDP failures/hangs with multiple devices per process (`test_c10d.py:DistributedDataParallelTest.test_grad_layout_1devicemodule_2replicaperprocess`):
- https://github.com/pytorch/pytorch/pull/40290 tries the test with purely rowmajor contiguous params on an untouched master. In other words https://github.com/pytorch/pytorch/pull/40290 contains none of this PR's diffs aside from the test itself.
- https://github.com/pytorch/pytorch/pull/40178, for comparison, tries the test with this PR's diffs.
Both fail the same way, indicating failure is unrelated to this PR's other diffs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40358
Differential Revision: D22165785
Pulled By: albanD
fbshipit-source-id: ac7cdd79af5c080ab74341671392dca8e717554e
Summary:
Removes line mentioning `ProcessGroupRoundRobin` since we don't intend it to be used as a public API just yet. We can add this back when we officially support the API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40380
Differential Revision: D22165556
Pulled By: rohan-varma
fbshipit-source-id: 24d0477d881dc74f2ff579de61dfd1ced2b09e75
Summary:
Not sure why there are so many issues for std & var, but this PR should close them all:
std: Fix https://github.com/pytorch/pytorch/issues/24771, Fix https://github.com/pytorch/pytorch/issues/24676, Fix https://github.com/pytorch/pytorch/issues/24639, Fix https://github.com/pytorch/pytorch/issues/24529
var: Fix https://github.com/pytorch/pytorch/issues/24782, Fix https://github.com/pytorch/pytorch/issues/24677, Fix https://github.com/pytorch/pytorch/issues/24652, Fix https://github.com/pytorch/pytorch/issues/24530
```py
import time
import torch
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
for device in (torch.device("cpu"), torch.device("cuda")):
    for size in (
        [100000000],
        [10000, 10000],
        [1000, 1000, 100],
        [100, 100, 100, 100],
    ):
        t = torch.randn(*size, device=device)
        total_time = 0
        for i in range(10):
            t1 = _time()
            t.std()
            t2 = _time()
            total_time += t2 - t1
        print(f"Tensor of size {size} on {device}: {total_time / 10}")
```
Before:
```
Tensor of size [100000000] on cpu: 0.36041643619537356
Tensor of size [10000, 10000] on cpu: 0.37235140800476074
Tensor of size [1000, 1000, 100] on cpu: 0.386572527885437
Tensor of size [100, 100, 100, 100] on cpu: 0.37404844760894773
Tensor of size [100000000] on cuda: 0.0021645784378051757
Tensor of size [10000, 10000] on cuda: 0.002090191841125488
Tensor of size [1000, 1000, 100] on cuda: 0.00208127498626709
Tensor of size [100, 100, 100, 100] on cuda: 0.0020844221115112306
```
After:
```
Tensor of size [100000000] on cpu: 0.1339871883392334
Tensor of size [10000, 10000] on cpu: 0.1343991994857788
Tensor of size [1000, 1000, 100] on cpu: 0.1346735954284668
Tensor of size [100, 100, 100, 100] on cpu: 0.11906447410583496
Tensor of size [100000000] on cuda: 0.0013531208038330077
Tensor of size [10000, 10000] on cuda: 0.0012922048568725585
Tensor of size [1000, 1000, 100] on cuda: 0.001285886764526367
Tensor of size [100, 100, 100, 100] on cuda: 0.0012899160385131836
```
cc: VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39967
Differential Revision: D22162469
Pulled By: VitalyFedyunin
fbshipit-source-id: 8d901c779767b00f81cd6231bc665e04f297b4c3
Summary:
Added a link to `CONTRIBUTION.md` in `README.md` for easy reference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40353
Differential Revision: D22167138
Pulled By: ezyang
fbshipit-source-id: fe7b7f190c8135fdd2e71696c1cf8d84bcd40fc6
Summary:
Utilise the existing methods of the `Vec256` class.
Not sure if there should be tests and if yes where.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36396
Differential Revision: D22155803
Pulled By: VitalyFedyunin
fbshipit-source-id: 500dcb5c79650bc5daa0c9683d65eeab6f9dd1d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40066
Builds on top of the previous PR to ensure that all remotely profiled events are prefixed with the key for the RPC that generated them.
The key is generated by the result of `_build_rpc_profiling_key` in `rpc/internal.py` and prefixed onto the event name. In order to do this, we set the current-key when creating the RPC in Python, retrieve the currently-set key in C++ and save a GloballyUniqueId -> key mapping to an in-memory map. When we receive an RPC with profiling information, we expect to receive this ID back, and look up the corresponding profiling key in the map.
The key is then added to all the remote events.
Tested by adding tests to ensure the key is added to all the remote events. Also added a UT which tests it under the multi-threading scenario, to ensure that the mapping's correctness is maintained when several RPCs are in the process of being created at once.
ghstack-source-id: 106316106
Test Plan: Unit test
Differential Revision: D22040035
fbshipit-source-id: 9215feb06084b294edbfa6e03385e13c1d730c43
Summary:
Many of them have already been migrated to ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39102
Differential Revision: D22162193
Pulled By: VitalyFedyunin
fbshipit-source-id: 80db9914fbd792cd610c4e8ab643ab97845fac9f
Summary:
Previously, large tensor data in attributes and subgraphs was not stored externally, so ONNX would not be able to serialize the model for cases where the total size sums up to >= 2GB. This PR enables that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38793
Reviewed By: hl475
Differential Revision: D22111092
Pulled By: houseroad
fbshipit-source-id: 355234e50825d576754de33c86a9690161caaeaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40150
- added a skeleton for a Swish implementation using fakelowp
- this implementation is as precise as it gets since it uses computation in fp32 as a reference
- simplified the test since this is a linear sweep, no need to randomize it
- modified the domain to ensure that 0 is always covered
Test Plan: ran this test against the lowered swish implementation and found that the interpolation domain should be [-21,12] to cover even the smallest value in the Y domain
Reviewed By: venkatacrc
Differential Revision: D22025105
fbshipit-source-id: dd8561243182c359003b4370ce2312f607d964c9
Summary:
The "cast" operator is currently added after the cumsum operator, but it should be added before, since torch.cumsum supports more types than ONNX (specifically, bool).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40044
Reviewed By: hl475
Differential Revision: D22158013
Pulled By: houseroad
fbshipit-source-id: e6c706572b9b8de880d4d71eaa132744ef01ad4d
Summary:
When an op involves creating a tensor of a certain type (such as torch.ones(...)), the tracer creates a `prim::Constant` node with an integer value representing the type. The mapping from the torch type to integers maps:
```
torch.complex32 -> 8
torch.complex64 -> 9
torch.complex128 -> 10
torch.bool -> 11
```
However, when the ONNX exporter maps back the integer to torch type, 10 is mapped to bool, 9 is mapped to complex128 and 8 is mapped to complex64.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40006
Reviewed By: hl475
Differential Revision: D22158019
Pulled By: houseroad
fbshipit-source-id: 42fbd6b56566017ff03382c4faf10d30ffde3802
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40374
To pick up two fixes to MPT:
4b1b855f21462200aad3
MPT isn't yet used by PyTorch so this should have no effect
Test Plan: Export to CircleCI and test
Reviewed By: patricklabatut
Differential Revision: D22160029
fbshipit-source-id: 202ea7487fcde015e5856f71ad6aebdfa6564ee1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38490
A meta tensor is a tensor that is a lot like a normal tensor,
except it doesn't actually have any data associated with it.
You can use them to carry out shape/dtype computations without
actually having to run the actual code; for example, this could
be used to do shape inference in a JIT analysis pass.
Check out the description in DispatchKey.h for more information.
Meta tensors are part of a larger project to rationalize how we
write kernels so that we don't have to duplicate shape logic
in CPU kernel, CUDA kernel and meta kernel (this PR makes the
duplication problem worse!) However, that infrastructure can
be built on top of this proof of concept, which just shows how
you can start writing meta kernels today even without this
infrastructure.
There are a lot of things that don't work:
- I special cased printing for dense tensors only; if you try to
allocate a meta sparse / quantized tensor things aren't going
to work.
- The printing formula implies that torch.tensor() can take an
ellipsis, but I didn't add this.
- I wrote an example formula for binary operators, but it isn't
even right! (It doesn't do type promotion or memory layout
correctly). The most future proof way to do it right is to
factor out the relevant computation out of TensorIterator,
as it is quite involved.
- Nothing besides torch.add works right now
- Meta functions are ALWAYS included in mobile builds (selective
build doesn't work on them). This isn't a big deal for now
but will become more pressing as more meta functions are added.
One reason I'm putting up this PR now is to check with Yinghai Lu
if we can unblock shape inference for accelerators, while we are
still working on a long term plan for how to unify all shape
computation across our kernels.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21935609
Pulled By: ezyang
fbshipit-source-id: f7d8636eeb8516b6bc296db99a16e56029972eee
Summary:
Before this PR, DLPack export was tricked by the CUDA masquerading of the HIP backend into thinking that it was exporting a CUDA tensor. We change that to use the ROCM device type instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40124
Differential Revision: D22145215
Pulled By: ezyang
fbshipit-source-id: 276f709861c55f499ae753d0bba48ddcc8b85926
Summary:
Enable ops used in BERT which were missed in one of my earlier PRs.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40236
Differential Revision: D22143965
Pulled By: ezyang
fbshipit-source-id: 5464ed021687fec1485e1c061e5a7aba71687fc4
Summary:
This PR aims at tackling https://github.com/pytorch/pytorch/issues/37823 by:
- ensuring that buffers will be used for normalization computation but won't be updated, when buffers are not None, and `track_running_stats=False`
- adding a corresponding unittest to ensure expected behaviour
Any feedback is welcome!
_Note: we might want to update the docstrings of `BatchNorm*d`, feel free to share any suggestion!_
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38084
Differential Revision: D22047871
Pulled By: ezyang
fbshipit-source-id: 5acbcad9773e7901f26d625db71d43d7dc236d3e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40100
ELU has a range of [-1, inf]. In the original PR which added
the quantized operator we decided to pass the quantization params
from the input. However, it makes more sense to require observation
for this op.
This PR changes the API to require observation. Next PRs in this stack
will add the eager and graph mode handling.
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qelu
```
Imported from OSS
Differential Revision: D22075083
fbshipit-source-id: 0ea0fd05a00cc7a5f122a2b1de09144bbd586f32
Summary:
https://github.com/pytorch/pytorch/issues/39963 erroneously removed template specialization to compute offsets, causing cases relying on this specialization (topk for 4d+ tensors with topk dimension >= 1024/2048 depending on the type) to produce bogus results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40349
Differential Revision: D22153756
Pulled By: ngimel
fbshipit-source-id: cac04969acb6d7733a7da2c1784df7d30fda1606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40139
This unit test runs the same set of operations locally and then with
DDP + RPC to verify correctness.
ghstack-source-id: 106287490
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/:ddp_under_dist_autograd
I ran these to make sure I am working on a clean git repo:
git submodule update --init --recursive
(to get the latest TensorPipe code; otherwise the build will have errors)
To record installed binaries and torch package wheels to system paths:
with-proxy env BUILD_CAFFE2_OPS=0 USE_CUDA=0 USE_MKLDNN=0 USE_DISTRIBUTED=1 python setup.py install --record files.txt
To remove binaries and torch package wheels from system paths:
xargs rm -rf < files.txt
To build in develop mode:
with-proxy env BUILD_CAFFE2_OPS=0 USE_CUDA=0 USE_MKLDNN=0 USE_DISTRIBUTED=1 python setup.py develop
pytest test/distributed/test_ddp_under_dist_autograd.py::TestDdpUnderDistAutograd -v
Differential Revision: D22084385
fbshipit-source-id: e1f57e86ceddd4c96920ed904898e1763b47e8f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37968
Modify memory format promotion rules to avoid promoting when one of the inputs is ambiguous. New rules are:
Ambiguous + Contiguous = Contiguous
Ambiguous + Channels Last = Channels Last
Contiguous + Ambiguous ( NC11 ) = Contiguous
Contiguous + Channels Last = Contiguous ( + Warning ) Before this PR: Channels Last
Channels Last + Contiguous = Channels Last ( + Warning )
Channels Last + Ambiguous = Channels Last
Bias + Channels Last = Channels Last
Channels Last + Bias = Channels Last
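A hedged illustration of the "Channels Last + Ambiguous" rule above (an NC11 bias-like tensor is layout-ambiguous; the expected output format follows the table):
```
import torch

cl = torch.randn(2, 3, 4, 4).contiguous(memory_format=torch.channels_last)
bias_like = torch.randn(2, 3, 1, 1)   # NC11: ambiguous memory format
out = cl + bias_like
print(out.is_contiguous(memory_format=torch.channels_last))  # expected: True
```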
Test Plan: Imported from OSS
Differential Revision: D21819573
Pulled By: VitalyFedyunin
fbshipit-source-id: 7381aad11720b2419fb37a6da6ff4f54009c6532
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/387
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39985
avx2-optimized 2/4-bit row-wise quantization/dequantization in perfkernels.
This diff slightly changes the numerics of quantization by multiplying with the inverse of the scale instead of dividing by the scale.
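A small numeric sketch of the multiply-by-inverse change (values are illustrative; the two forms agree up to occasional one-step rounding differences):
```
import torch

x = torch.rand(8)
scale = 0.05
q_div = torch.round(x / scale)            # old: divide by scale
q_mul = torch.round(x * (1.0 / scale))    # new: multiply by precomputed inverse
print((q_div - q_mul).abs().max())        # usually 0
```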
Test Plan:
On my devserver
for i in 2 4 8; do echo $i; buck run mode/opt :fused_rowwise_nbit_conversion_bench -- --bit-rate=$i; done
Before this diff
2-bit
3.35394 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
3.60351 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.434467 ms. 100%. FloatToFused8BitRowwiseQuantized
After this diff
2-bit
0.606386 ms. 100%. FloatToFused2BitRowwiseQuantized
4-bit
0.446683 ms. 100%. FloatToFused4BitRowwiseQuantized
8-bit
0.4349 ms. 100%. FloatToFused8BitRowwiseQuantized
Reviewed By: choudharydhruv, jianyuh
Differential Revision: D22033195
fbshipit-source-id: d3a219e47b8345268d90a160c9314ed0d5b71467
Summary: NVIDIA's Apex is updating to no longer rely on this behavior, but we're reverting this Python2->Python3 update to unblock internal apex users.
Test Plan: Sandcaslte + OSS CI.
Reviewed By: ngimel
Differential Revision: D22146782
fbshipit-source-id: f9483d2cbf9dc3a469ad48a6c863edea3ae51070
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40318
rename layernorm fakefp16 to the right naming convention
add it to the map of replacement ops
this can be done even if the operator is not complete because we are blacklisting anyways
Test Plan: net_runner and inspected the log that replacement happened
Reviewed By: venkatacrc
Differential Revision: D22145900
fbshipit-source-id: f19794ec05234b877f7697ed8b05dd8f46606c47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40249
Blocking wait didn't work for dist.barrier() since we performed a
cudaDeviceSynchronize() before we performed any of the timeout checks. As a
result, in case of failures/desync the barrier() call would get stuck on
cudaDeviceSynchronize() and would never return a timeout error to the user.
To fix this, I've moved the device synchronization after the timeout checks.
ghstack-source-id: 106250153
Test Plan: waitforbuildbot
Differential Revision: D22126152
fbshipit-source-id: d919a7a6507cca7111d8ad72e916777b986d0d67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40296
1. Added a link to parameter server tutorial
2. Explained current states for TorchScript support
Test Plan: Imported from OSS
Differential Revision: D22142647
Pulled By: mrshenli
fbshipit-source-id: ffd697dd64a3aa874cf3f3488122ed805903370d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40276
- add a couple new namespaces;
- handle the case where both the contextual namespace and the operator namespace
are set (BackendSelectRegister.cpp and #39401);
- improve error message;
Test Plan: Imported from OSS
Differential Revision: D22135686
Pulled By: ljk53
fbshipit-source-id: 14d359c93573349b8fe1e05d7e44d875295a5f6d
Summary:
Make `common_utils.TestCase.precision` a property, because it is overridden as such in `common_device_type`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40057
Differential Revision: D22138385
Pulled By: malfet
fbshipit-source-id: 0e7c14654bf60f18f585efc61f96fdd0af23346f
Summary:
Update pytorch/onnx docs for new export API args:
Use external data format and Training args.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39802
Reviewed By: hl475
Differential Revision: D22139664
Pulled By: houseroad
fbshipit-source-id: 7d6dcf75129cb88987f8c37b7d9d48ca594c0f38
Summary:
Remove black_listed_operators for opset 12 as we now support these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39414
Reviewed By: hl475
Differential Revision: D21915584
Pulled By: houseroad
fbshipit-source-id: 37ec7bdd2b5a845484535054026d6613d0921b7a
Summary: enhance the sls test to reflect the shapes and values
Test Plan: ran sls tests on device and emulator
Reviewed By: amylittleyang
Differential Revision: D22094433
fbshipit-source-id: 610a79433ae6c58f626b5984a3d89d9e1bbf4668
Summary:
This is to import a few features:
- a fix to a race condition happening in SHM's use of epoll
- a new XTH channel, that uses a memcpy to transfer between threads of the same process
- a new MPT channel, that chunks and multiplexes tensors over multiple transport event loops
Test Plan: Run in CircleCI
Reviewed By: patricklabatut
Differential Revision: D22140736
fbshipit-source-id: a3cee8a3839d98a42b8438844a9fd24fd85b2744
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39126
futureResponseMessage is shadowed in the pipeWrite lambda which
creates some confusion, since it is used in the initial error handling but then
a future of the same name is created when marking the future as completed. This
change removes this by getting rid of the futureResponseMessage capture,
instead capturing the message id. This change also makes it so that we don't
need to copy it into the lambda.
ghstack-source-id: 106211353
Test Plan: CI
Differential Revision: D22127398
fbshipit-source-id: c98a53b5630ce487461e4ca9cd72fbd34788298d
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/39677
Test Plan:
Moved a test class suite between files, wanted to have same functionality (simple code refactor) so tested to make sure the test output was the same before/after the refactor.
Image below shows the output of TestGraphModePostTrainingStatic before refactor
{F239676498}
This image shows the output of TestQuantizeScript (renamed version that is in test_quantize_script.py instead of test_quantize.py)
{F239676509}
Differential Revision: D21940638
Pulled By: edmundw314
fbshipit-source-id: 54160a5151aadf3a34bdac2bcaeb52904e6653ed
Summary:
There is a missing '=' in an rpc_sync call in the RPC example.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40280
Differential Revision: D22137619
Pulled By: mrshenli
fbshipit-source-id: f4e4b85f68fd68d29834e199416176454b6bbcc2
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4618
`onnxInputNames_` originated from positional name binding. This is inherited from C2, where inputs are bound by position. So it's useless to check the name here as long as `onnxInputNames_` is filled. It should save cycles on string comparison.
Test Plan: run it.
Reviewed By: jackm321
Differential Revision: D22104338
fbshipit-source-id: 250463744aa37ed291aebd337e26d573048583ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40187
There were two issues:
1) The hand-written definition included an ambiguous default, which caused the deprecated signature not to be selected. This didn't match the handwritten torch.nonzero; now they do.
2) A parsing bug for empty argument lists meant the signature wasn't being marked as deprecated.
Test Plan: Imported from OSS
Differential Revision: D22118236
Pulled By: gchanan
fbshipit-source-id: a433ce9069fef28aea97cbd76f2adf5a285abd73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38840
JIT graph executor runs some canonical optimizations such as cse, dead code
elimination etc before constructing code that interpreter executes.
Since we do not have full JIT in lite interpreter any such graph optimizations
must happen AOT.
This diff applies such canonical optimizations on graph.
Test Plan: CI's test_mobile_optimizer.
Reviewed By: dreiss
Differential Revision: D21675855
fbshipit-source-id: 5dd898088ef8250103ccbbb6aa2bbce156a8d61d
Summary:
Previously the module would log some data using `print()`. This can be
a problem when used in contexts where the process expects to write data to
stdout itself. This diff changes the log statements to use `logger` instead.
This makes it similar to other log statements in the same module.
Test Plan:
Confirmed no weird test showed up when running:
buck test caffe2/test/distributed/nn/api:remote_module_fork
Differential Revision: D22136172
fbshipit-source-id: a3d144eba6c75925ed684981793c84b36eb45a5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40222
Mention the TensorPipe agent in the RPC docs and give users the information they need to choose which agent to use.
ghstack-source-id: 106225711
Test Plan: Export to GitHub, build locally and try out the docs.
Differential Revision: D22116494
fbshipit-source-id: 30703ba8410c40f64e785f60d71dfd9faa8de4a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40270
Original commit changeset: 1227e243ab94
D22082806 (1e03d603c6) broke the model generation of pyper models. We trace the namedtuple as input. To unblock the development of PyPer project, let's revert the diff first.
Sorry about the inconvenience, SplitInfinity
ghstack-source-id: 106217609
Test Plan: buck run dper3/dper3_models/experimental/pytorch/feed:feed_generation_script -- --model_files_dir=/tmp/
Reviewed By: alyssawangqq
Differential Revision: D22132960
fbshipit-source-id: ce9278c8462602a341e231ea890e46f74e743ddf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40251
Rather than segfaulting, we should show a good error message when in op.call<Return, Args...>(...) the Return type or Args types mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
ghstack-source-id: 106194240
Test Plan: waitforsandcastle
Differential Revision: D22126701
fbshipit-source-id: 6c908a822e295757bcc0014f78f51e6a560f221f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40130
The sparse gradients for the model and the tensor that is used to
perform allreduce in DDP are essentially the same and have the same storage. As
a result, once allreduce is done, the sparse gradients are automatically
updated and unlike dense gradients we don't need to assign the bucket's
contents back to the grad.
In addition to this, I've also added a test for distributed autograd to ensure
it works correctly for sparse gradients. I discovered `finalize_bucket_sparse`
was redundant as part of this test since it passed without any changes needed
to `finalize_bucket_sparse` which only looked at the `.grad` field.
ghstack-source-id: 106090063
Test Plan: waitforbuildbot
Differential Revision: D22080004
fbshipit-source-id: 493ce48b673f26b55dffd6894a3915dc769839f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38748
This diff contains the message scaffolding and profiler changes in order to be able to remotely run the profiler across different nodes and aggregate the results on a single node.
As discussed, we have implemented this by creating new message types, that similar to autograd messages, wrap the profiling information with the original message, and send this new message over the wire. On the receiving end, this wrapped message is detected, we fetch the original message from it, and process the original message with the profiler enabled. When sending a response with profiling information, we serialize the profiled `Events` and send them back over RPC. When such a message is received, the events profiled on the remote node are stored (added back to the local profiler).
Changes in this PR:
- New message types (run_with_profiling_req, run_with_profiling_resp) to send profiling info over the wire. Message parsing logic is added to handle these wrapped types.
- Handling of sending profiler data over the wire, in particular, the attributes of the `ProfilerConfig` and the serialized profiled `Event`s
- The logic for wrapping RPC messages is deduped with that in `rpc_with_autograd`, and the common payload wrapping/unwrapping logic is moved to helper functions in `rpc/utils.cpp`
- Changes in `autograd/utils.cpp` to detect if we have enabled the profiler and are sending an RPC, if so, uses the above new message types
- Changes in request_callback to parse and turn on the profiler in a thread-local fashion
- Serialization and deserialization of profiling `Events`, and support to add the remote events to the thread-local profiler
- Introduction of the concept of `node_id`, which as discussed with ilia-cher , will be used along with the `Event`s handle attribute to distinguish between events. When there are events from different nodes, this node information is rendered in the profile output (e.g. when printing tables), otherwise, it is not, since it is irrelevant.
- Some changes to profiler.cpp to add useful helper methods/guards
- toHere() is now profiled for RRefs
- Unittests
ghstack-source-id: 106134626
Test Plan: Added unittests, existing profiler unittests.
Differential Revision: D19510010
fbshipit-source-id: 044347af992f19a9e3b357c9567f6fc73e988157
Summary:
**Summary**
This commit adds support for with statements to PyTorch JIT. Each
of the with items in a with statement is represented in the JIT IR
as a pair of `prim::Enter` and `prim::Exit` nodes that call the
`__enter__` and `__exit__` methods defined on the context manager objects
returned by the expressions in the with item.
**Testing**
This commit adds unit tests for with statements with named with items,
nameless with items, and with statements that encounter exceptions.
```
$ python test/test_jit.py TestWith.test_with_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.430s
OK
```
```
$ python test/test_jit.py TestWith.test_with_no_as
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.264s
OK
```
```
$ python test/test_jit.py TestWith.test_with_exceptions
Fail to import hypothesis in common_utils, tests are not derandomized
Couldn't download test skip set, leaving all tests enabled...
.
----------------------------------------------------------------------
Ran 1 test in 1.053s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34705
Differential Revision: D22095945
Pulled By: SplitInfinity
fbshipit-source-id: f661565a834786725259b8ea014b4d7532f9419d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40191
When the first couple of inputs passed to the histogram observer are all 0's, subsequent non-zero inputs cause a div-by-0 error
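A hedged sketch of the failure mode (observer settings are defaults; previously the third call below could divide by zero):
```
import torch
from torch.quantization import HistogramObserver

obs = HistogramObserver()
obs(torch.zeros(16))   # all-zero inputs first
obs(torch.zeros(16))
obs(torch.randn(16))   # previously this could hit a div-by-0; now handled
print(obs.calculate_qparams())
```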
Test Plan:
python test/test_quantization.py TestHistogramObserver.test_histogram_observer_zero_inputs
Imported from OSS
Differential Revision: D22119422
fbshipit-source-id: 8bbbba914ba7f343121830c768ca0444439f8e03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39587
Example of using direct linking to pytorch_jni library from aar and updating android/README.md with the tutorial how to do it.
Adding a `nativeBuild` dimension to `test_app` that uses direct aar dependencies; as headers packaging has not landed yet, `nativeBuild` is excluded from building by default for CI.
Additional change to `scripts/build_pytorch_android.sh`:
Skipping the clean task here, as the android gradle plugin 3.3.2 externalNativeBuild has problems with it when abiFilters are specified.
It will be restored in follow-up diffs that upgrade the gradle and android gradle plugin versions.
Test Plan: Imported from OSS
Differential Revision: D22118945
Pulled By: IvanKobzarev
fbshipit-source-id: 31c54b49b1f262cbe5f540461d3406f74851db6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40233
There was a question earlier about whether torch.futures.wait_all() would
raise if the underlying futures raise (it was supposed to, but there was no test
coverage). This change adds a couple of very basic torch.futures.collect_all/
wait_all tests.
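For context, a minimal sketch (not the tests added here) of the collect_all/wait_all API being exercised:
```
import torch

futs = []
for i in range(3):
    f = torch.futures.Future()
    f.set_result(torch.full((2,), float(i)))
    futs.append(f)

# collect_all returns a Future whose value is the list of input futures.
merged = torch.futures.collect_all(futs)
print([f.wait() for f in merged.wait()])

# wait_all blocks on all futures and returns their values; if any future
# holds an error it should be re-raised here (what this change adds tests for).
print(torch.futures.wait_all(futs))
```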
ghstack-source-id: 106168134
Test Plan: buck test mode/dev-nosan caffe2/test:futures
Differential Revision: D22120284
fbshipit-source-id: 3a8edae5dbf8c58c8361eff156c386a684ec5e86
Summary:
Slightly modified Adam to follow the Python implementation, and the `ProducesPyTorchValues` tests pass. I had a problem with another test though (see commit c1a6241676ab84fc531c1c3a10f964aa5704092e): it seems that optimizing for two steps with the same optimizer vs. optimizing for two steps using freshly initialized objects produces the same output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40009
Differential Revision: D22096053
Pulled By: glaringlee
fbshipit-source-id: a31a8f5488cb37c53752ddf15436efabdba67dc4
Summary:
This test is flaky for rocm platform. Add to blacklist until it can be further reviewed.
CC ezyang xw285cornell sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40204
Differential Revision: D22108295
Pulled By: xw285cornell
fbshipit-source-id: 802444a7b41260edcb6ce393237784f3e6c52a74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40232
if an operator failed to onnxifi due to lack of support (not because of missing shapes), print out the position of such op, which can be used to feed net runner
Test Plan: I0618 09:25:06.299002 1570804 onnxifi_transformer.cc:1232] Don't support c2 op SparseLengthsSumFused4BitRowwise at pos 246 (1030)
Reviewed By: hl475
Differential Revision: D22120055
fbshipit-source-id: a3c68b93b7e38dfda5d70168e7541021a8e16dcb
Summary:
Quick fix due to code merging. With this feature working, the total size reduction in Android is 664 KB (Pytorch -26 KB and papaya - 639 KB)
https://fburl.com/unigraph/c726gvb1
Test Plan: CI
Reviewed By: kwanmacher
Differential Revision: D22053779
fbshipit-source-id: 8da4a651432b453c25e543bc64dbed02946de63d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38361
Rather than segfaulting, we should show a good error message when the Return type or Args types in op.call<Return, Args...>(...) mismatch the kernel.
This adds an assertion comparing two std::type_index to the call path, but that should be fast. Hashing the function signature is also in the call path and not strictly constexpr, but I checked on godbolt that GCC >=5 and Clang >=3.8 optimize it away and make it constexpr, i.e. it's not part of the assembly.
supersedes D17485438
ghstack-source-id: 106178820
Test Plan: waitforsandcastle
Differential Revision: D21534052
fbshipit-source-id: 6be436a3f20586277a051d764af29e21d5567da0
Summary:
Use a switch statement instead of lookups in global std::unordered_map<>s to do enum-to-name conversions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40183
Reviewed By: malfet
Differential Revision: D22117731
Pulled By: ionsphere
fbshipit-source-id: d150114cfae5b1222bb9142d815f2379072506c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39588
Before this diff we used c++_static linking.
Users will dynamically link to libpytorch_jni.so and typically have at least one more shared library of their own that uses the STL.
We must have not more than one stl per app. ( https://developer.android.com/ndk/guides/cpp-support#one_stl_per_app )
To have only one STL per app, we change the ANDROID_STL setting to c++_shared, which adds libc++_shared.so to the packaging.
Test Plan: Imported from OSS
Differential Revision: D22118031
Pulled By: IvanKobzarev
fbshipit-source-id: ea1e5085ae207a2f42d1fa9f6ab8ed0a21768e96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39507
Adding gradle task that will be run after `assemble` to add `headers` folder to the aar.
Headers are chosen for the first specified abi; they should be the same for all abis.
Adding headers works by temporarily unpacking the aar into the gradle `$buildDir`, copying the headers into it, and re-zipping the aar with the headers.
Test Plan: Imported from OSS
Differential Revision: D22118009
Pulled By: IvanKobzarev
fbshipit-source-id: 52e5b1e779eb42d977c67dba79e278f1922b8483
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40194
Adds the scaffolding for doing docker builds based off git rev-parse
tags to detect changes.
Basically allows us to do our previous builds while also prepping for
the new builds by just retagging our current builds as the new ones and
telling the garbage collector not to reap them.
Should also skip out on redundant builds if the image already exists
thus saving us some compute time on docker builds.
Also adds the commands to load the calculated DOCKER_TAG from a shared
workspace file.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22120651
Pulled By: seemethere
fbshipit-source-id: c74f10816d63f440a9e0cdd00d6fa1a25eb7a2c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40144
As titled, this splits the remaining quantization tests out of test_jit to reduce
its size.
Test Plan: Imported from OSS
Differential Revision: D22085034
Pulled By: wanchaol
fbshipit-source-id: 0c8639da01ffc3e6a72e6f470837786c73a6b3f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39584
Removing `-DNO_EXPORT` for the non-custom build so that it can link to the C10/A10 API.
The custom build stays the same, as its main goal is minimum binary size, while exporting API functions would increase it.
Additional changes:
1. aten/src/ATen/DynamicLibrary.cpp uses libdl; if we need this functionality we will need to link the result with libdl, but for now it is disabled for mobile.
Test Plan: Imported from OSS
Differential Revision: D22111600
Pulled By: IvanKobzarev
fbshipit-source-id: d730201c55f543c959a596b34be532aecee6b9ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40189
This is to allow for easier modification later on down the road.
Makes no actual modification to the `.circleci/config.yml`
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22119414
Pulled By: seemethere
fbshipit-source-id: c6cb105d983e43ae1bf289b2d9f734b34a7febe2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40055
Noticed this while reading the `helper.cpp` file; it seems like this op
should be in the `single_input_general_value` bucket.
Test Plan:
CI
Imported from OSS
Differential Revision: D22054257
fbshipit-source-id: 2ca16ff863d644cbd03c3938eeca0fb87e3e4638
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39413
Implementing the request from
https://github.com/pytorch/pytorch/pull/39095.
This is a WIP so we can align on the API; once it looks good,
I will amend the PR to apply to all relevant functions.
Test Plan:
```
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_hardswish
```
Imported from OSS
Differential Revision: D21885263
fbshipit-source-id: 029339a99f8c50e45dd1dfb7fd89c20e3188720d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39203
Adds logic and test coverage for optional weights and biases for
the quantized normalization operators. This was broken before this
PR because the `TORCH_LIBRARY` registration had these as required parameters;
this change removes that requirement and cleans up the callsites.
Note: consolidating the registrations in `native_functions.yaml` as opposed to `library.cpp`
after a discussion with ezyang .
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_qlayer_norm
python test/test_quantization.py TestQuantizedOps.test_group_norm
python test/test_quantization.py TestQuantizedOps.test_instance_norm
python test/test_quantization.py TestStaticQuantizedModule.test_layer_norm
python test/test_quantization.py TestStaticQuantizedModule.test_group_norm
python test/test_quantization.py TestStaticQuantizedModule.test_instance_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_layer_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_group_norm
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_instance_norm
```
Imported from OSS
Differential Revision: D21885259
fbshipit-source-id: 978c7b8bd6c11a03e9e5fdb68f154cb80cc43599
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40200
Since original weights are removed by default in mobile build, the check must
be moved to a place where orig_weight is still valid.
Test Plan:
CI
Plus observed a model run crash which was resolved after this change.
Reviewed By: supriyar
Differential Revision: D22101562
fbshipit-source-id: 9543e69a415beaef2a9fb92dc9cd87d636174d51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40162
The only public option is `num_worker_threads`. The other ones are private (as indicated by the leading underscore; is that enough?) and allow specifying a different set and order of transports/channels. These can thus be used to disable a backend (by not specifying it) or to force one (by raising its priority). They can therefore be used to work around defective backends, in case we find any post-release.
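A sketch of passing the public knob (the underscored options are intentionally not shown, since they are private and subject to change):
```
import torch.distributed.rpc as rpc

# Only num_worker_threads is a public option per the description above.
opts = rpc.TensorPipeRpcBackendOptions(num_worker_threads=16)
# rpc.init_rpc("worker0", rank=0, world_size=1, rpc_backend_options=opts)
```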
ghstack-source-id: 106103238
Test Plan: Built //caffe2:ifbpy and, using TensorPipe's verbose logging, verified that the transports/channels I specified were indeed the ones that were being registered.
Differential Revision: D22090661
fbshipit-source-id: 789bbe3bde4444cfa20c40276246e4ab67c50cd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40207
Blocking wait didn't work for dist.barrier() since we performed a
cudaDeviceSynchronize() before we performed any of the timeout checks. As a
result, in case of failures/desync the barrier() call would get stuck on
cudaDeviceSynchronize() and would never return a timeout error to the user.
To fix this, I've moved the device synchronization after the timeout checks.
ghstack-source-id: 106123004
Test Plan: waitforbuildbot
Differential Revision: D22108899
fbshipit-source-id: 6b109ef9357e9464e7d66b540caabf5801e6a44a
Summary:
After this diff, on PR following compilation configuration would be running:
- VS2017 14.11, CUDA10.1
- VS2017 no CUDA, CUDA10.1
- VS2019, CUDA10.1
And tested:
- VS2017 14.11, CUDA10.1
- VS2017 14.11 no CUDA (only 1st half of tests)
- VS2017 14.11 force on CPU (only 1st half of test)
And on master, we would be building both VS2017 14.11 and 14.16, but testing only VS2017 14.11 and VS2019 builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38482
Differential Revision: D22111743
Pulled By: malfet
fbshipit-source-id: d660e4bc8f4f17a93f1cc18402cd5f2091b7789d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40196
- separate passes in insert observers to make it more robust
- added print for quantization type
- added more logging for insert observers
Test Plan: Imported from OSS
Differential Revision: D22106545
fbshipit-source-id: 6d8d722e33c1259b1a6a501853c801c275dbfcff
Summary:
Use it from both __init__ and streams to define dummy types when CUDA is missing
Fix accidental reference of global `storage_name` from `_dummy_type`
Add type annotations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40177
Differential Revision: D22106922
Pulled By: malfet
fbshipit-source-id: 52fbfd91d70a78eb14d7ffda109c02ad1231497e
Summary: Export box_cox operator in caffe2
Test Plan: Pass all unit tests
Reviewed By: mingzhe09088
Differential Revision: D21515797
fbshipit-source-id: 777ee5e273caeab671ee2c22d133d3f628fb4a6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37462
Instead of running all the optimization passes in the optimizeForMobile method,
this introduces a whitelist optimizer dictionary as a second parameter.
When it is not passed, the method runs all the optimization
passes; otherwise the method reads the dict and only runs the passes whose
value is True.
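From the Python side, the no-whitelist call (which runs all passes) looks roughly like this; this is only a sketch, and the whitelist dictionary described above is the optional C++-level second parameter, so it is not shown:
```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

# Scripted toy model; optimize_for_mobile expects a ScriptModule.
model = torch.jit.script(torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.ReLU()))

# With no whitelist, all optimization passes are run.
optimized = optimize_for_mobile(model)
print(type(optimized))
```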
ghstack-source-id: 106104503
Test Plan:
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D22096029
fbshipit-source-id: daa9370c0510930f4c032328b225df0bcf97880f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40142
test_jit is becoming huge again, which makes it hard for editors to load and
for us to write new tests; this splits out the tracer-related tests.
Test Plan: Imported from OSS
Reviewed By: ailzhang
Differential Revision: D22085035
Pulled By: wanchaol
fbshipit-source-id: 696bee84985ecfbfeac8e2ee5c27f1bdda8de394
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40151
For debug android build it throws the following error:
```
In file included from src/pytorch/android/pytorch_android/src/main/cpp/pytorch_jni_common.cpp:9:
In file included from src/pytorch/android/pytorch_android/src/main/cpp/pytorch_jni_common.h:2:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/torch/csrc/api/include/torch/types.h:3:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/ATen/ATen.h:5:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/ATen/Context.h:4:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/ATen/Tensor.h:3:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/ATen/core/TensorBody.h:7:
In file included from ../../../../src/main/cpp/libtorch_include/armeabi-v7a/c10/core/Scalar.h:13:
../../../../src/main/cpp/libtorch_include/armeabi-v7a/c10/util/TypeCast.h:157:22: error: use of undeclared identifier '__assert_fail'
AT_FORALL_QINT_TYPES(DEFINE_UNCASTABLE)
^
```
It seems __assert_fail() isn't available on Android by default - in NDEBUG mode the function is only forward declared and CI passes.
But CUDA_KERNEL_ASSERT() shouldn't be relevant for the mobile build at all, and we already bypass `__APPLE__`, so the easiest fix is to add `__ANDROID__`.
Test Plan: Imported from OSS
Differential Revision: D22095562
Pulled By: ljk53
fbshipit-source-id: 793108a7bc64db161a0747761c0fbd70262e7d5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40081
Adding the functionality to enable timeout of OnnxifiOp run. In the case of backend hanging, it can error out quickly.
Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi -- test_timeout
```
Reviewed By: jackm321
Differential Revision: D22064533
fbshipit-source-id: 25487287c10ab217eb95692f09d48e13e19436ab
Summary:
ROCm CI hosts will have their kernels upgraded first to ROCm 3.5.1. CI images will follow soon after. Due to the thunk/kernel mismatch during the interim, this PR will detect the mismatch and upgrade the thunk during the build. This PR will be reverted once migration to ROCm 3.5.1 images is complete.
CC ezyang xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40181
Differential Revision: D22104488
Pulled By: xw285cornell
fbshipit-source-id: 7192e1d0bb25bfb814e9a85efb4aa29d0e52b460
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40141
This rref timeout test could be flaky because we could end up processing `RRefUserDelete` messages on the owner node before processing the to_here message. This would result in a hang in `ProcessGroupAgent::sync()` that eventually results in a timeout.
The rough sequence of what happens is:
0) Node 0 creates RRef on node 1 with rpc.remote() call
1) rref.to_here() is called with a timeout. Because of delay injection, the processing of this message can be delayed (this is also technically possible in applications without delay injection)
2) At some point, callbacks corresponding to rpc.remote() runs and confirms the rref, adding it as a confirmed user
3) RPC shutdown starts, as part of which we send out RRef user deletes. In this case, 0 sends an RRef user delete to 1, and node 1 removes the owner from the `owners_` field.
4) The `to_here()` message is finally processed by node 1. But since we have deleted the `owner_`, while processing this message we create a future that will be complete when the owner exists (this is to account for the case of to_here() arriving before rpc.remote). But this future will never complete, since the owner is already deleted, so we hang indefinitely
As a workaround for now, we can force `to_here()` to run before RPC shutdown by adding a blocking `to_here()` call with no timeout.
A more robust, longer-term fix would be to detect if an owner has been previously deleted (such as by an RRefUserDelete). Then, we know that the future corresponding to owner creation on the remote end will never complete, and we can error out when processing a `to_here()`.
ghstack-source-id: 106036796
Differential Revision: D22084735
fbshipit-source-id: fe7265a4fe201c4d6d2f480f64fe085cd59dbfb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40175
Check that there is an increasing memory usage in the test
Test Plan: CI
Differential Revision: D22098192
Pulled By: ilia-cher
fbshipit-source-id: bbdbc71f66baf18514332a98d8927441c61ebc16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40173
- Avoid path sharing across runs and workers, so that even when test methods/workers run in parallel on the same host, they don't interfere with each other.
- On some environments (e.g. the fb internal CI platform), the torch package file tree is not writable, but the temporary folder chosen by Python's `tempfile` module is always writable; on linux it's "/tmp".
close https://github.com/pytorch/pytorch/issues/40120
ghstack-source-id: 106086340
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator
buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_scripted_remote_module_template
buck build mode/dev-nosan //caffe2/test/distributed/nn/jit:test_instantiator && \
buck-out/gen/caffe2/test/distributed/nn/jit/test_instantiator\#binary.par -r test_instantiate_non_scripted_remote_module_template
```
```
buck test mode/dev-nosan //caffe2/test/distributed/nn/api:remote_module_fork
```
Differential Revision: D5708493
fbshipit-source-id: dd92695682433aaf79d1912c7956cef40a450eaf
Summary:
So it can still be a useful way to get all the build configs that the target specifier can't handle yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40186
Differential Revision: D22100671
Pulled By: ezyang
fbshipit-source-id: df291705e717c0c7e7cf4d675b9d49a1eba54a1d
Summary:
**Summary**
This commit modifies type inference for `nn.Module` instance attributes
such that the type of a `NamedTuple` attribute is inferred correctly and
such that the field names of this `NamedTuple` instance can be used in
scripted methods. At present, the type of this attribute is inferred to be
`Tuple[T, U, ..., V]`, so the field must be referred to by index and
cannot be referred to by name.
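A minimal sketch of the kind of module this enables (illustrative only; the names are made up):
```
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: torch.Tensor
    y: torch.Tensor

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.p = Point(torch.ones(3), torch.zeros(3))

    def forward(self):
        # Previously self.p was inferred as Tuple[Tensor, Tensor], so only
        # index access worked; with this change the field names also work.
        return self.p.x + self.p.y

m = torch.jit.script(M())
print(m())
```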
**Test Plan**
This commit adds a unit test to test that a field of a `NamedTuple`
attribute can be referred to by name in a scripted method.
**Fixes**
This commit fixes https://github.com/pytorch/pytorch/issues/37668.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39116
Differential Revision: D22082806
Pulled By: SplitInfinity
fbshipit-source-id: 1227e243ab941376cd5e382fb093751e88dc8846
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40127
Reland PR.
Similar to static quant, break it up into op level tests and tests for jit passes
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22081259
fbshipit-source-id: cef8f78f89ef8789683b52508379ae1b9ad00700
Summary:
Resolve https://github.com/pytorch/pytorch/issues/38207
Below is the description of split function according to [Python doc](https://docs.python.org/3.8/library/stdtypes.html?highlight=split#str.split).
```
If sep is not specified or is None, a different splitting algorithm is applied:
runs of consecutive whitespace are regarded as a single separator,
and the result will contain no empty strings at the start or end
if the string has leading or trailing whitespace.
```
The logic to handle both none and empty separators is added in register_string_ops.cpp as a fix.
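An illustrative TorchScript snippet exercising the no-separator path (not the tests from this PR):
```
import torch

@torch.jit.script
def split_words(s: str):
    # With no separator, runs of whitespace act as a single separator and
    # leading/trailing whitespace produces no empty strings, matching Python.
    return s.split()

print(split_words("  hello   world  "))  # ['hello', 'world']
```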
Signed-off-by: Xiong Wei <xiongw.fnst@cn.fujitsu.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38772
Differential Revision: D21789612
Pulled By: suo
fbshipit-source-id: 4dfd74eda71e0bfd757378daedc927a4a63ec0e4
Summary:
This allows registering hooks that will be executed for every module.
This idea arose in a discussion with tkerola, and niboshi kindly proposed this approach.
The use case for this is to avoid boilerplate code when registering the same hook for all the modules in a complex model, the internal use-case was to allow every model to accept a NumPy array in the forward pass in a simpler way. Other use cases involve general mechanisms for plotting or tracing & debugging.
Currently, the hooks are shared by all modules, but this could be extended so that hooks are shared only per type of module.
If this functionality is not needed feel free to close the PR.
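A rough usage sketch, assuming the registration function is exposed as torch.nn.modules.module.register_module_forward_hook (the exact name is an assumption and may differ in the final API):
```
import torch
import torch.nn as nn

def log_shapes(module, inputs, output):
    # Fires for every module's forward call once registered globally.
    print(type(module).__name__, [tuple(i.shape) for i in inputs])

handle = nn.modules.module.register_module_forward_hook(log_shapes)

model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))
model(torch.randn(1, 4))

handle.remove()  # unregister when no longer needed
```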
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38972
Differential Revision: D22091364
Pulled By: albanD
fbshipit-source-id: 204ff5f9e119eff5bdd9140c64cb5dc467bb23a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40167
In v1.6 TensorPipe will not support transferring GPU tensors so, just like other agents, it should raise the appropriate errors when the user attempts to do so. One such error is when sending the arguments, another is when sending the result.
ghstack-source-id: 106059723
Test Plan: Re-enabled the test for this
Differential Revision: D22091737
fbshipit-source-id: 23dda98bc006333c6179361e8cfaf00ecda06408
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40112
The changed code should be run for the arm build even if the qnnpack engine is not
enabled.
Furthermore, the way the AT_DISPATCH* stubs are defined, they just form a lambda out
of the __VA_ARGS__ and execute the lambda. Thus a return inside such a lambda just
returns to the original function and we end up executing the fallback path as
well.
Thus also changed #endif to #else...#endif.
This was causing a perf regression on mobile in one of the models.
ghstack-source-id: 105990691
Test Plan: CI
Reviewed By: supriyar
Differential Revision: D22072780
fbshipit-source-id: b12ca66aa19834b97b3eb0067af4e656cb8b3241
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40097
This is (probably) necessary for the vmap frontend API (coming up after
this PR should be the vmap frontend API).
There is some manual handling of sizes in the `expand_batching_rule`.
In particular, when performing expand(Tensor[B0, 3], [2, 3]), where B0
is a batch dimension and Tensor[B0, 3] is a batched tensor with batch
dimension B0, we can't call expand directly on the physical view and
instead first need to perform a view.
It's possible to add said view as a helper function on `VmapPhysicalView` but
after reading through the operator spreadsheet the conclusion was that
no other operator needs the same manual handling.
Test Plan: - `./build/bin/vmap_test`
Differential Revision: D22070657
Pulled By: zou3519
fbshipit-source-id: 911854b078a1a5c7d5934ef2e17b16673ed9d103
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40042
See title. Dynamic allocations are generally bad for performance. This
change was not benchmarked because we have not gotten to the stage where
we want to benchmark performance.
Test Plan: - `./build/bin/vmap_test`
Differential Revision: D22070656
Pulled By: zou3519
fbshipit-source-id: f6cf74a357bb52b75c0a02f1f82495c0a5329a28
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40028
We have this call native::size directly. Some alternatives I considered
were:
- Call VariableType::size directly. That seems isomorphic to what we're
doing now.
- when creating a BatchedTensor from a regular tensor, put all of the
keys on that tensor into the BatchedTensor's dispatch key set and use
the dispatcher fallthrough mechanism. That seems weird because
BatchedTensor is a tensor wrapper and also error prone because if
BatchedTensor gets the VariableType key, there's a chance that if
something goes wrong, an autogradmeta gets created on it...
Test Plan: - `./build/bin/vmap_test`
Differential Revision: D22070655
Pulled By: zou3519
fbshipit-source-id: 18530579ad41f3c4f96589da41eb24a46caf7af9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40094
In #39182 we already silenced a few warnings when they were caused by expected errors, but left one case out, namely errors on an incoming pipe. The idea was to introduce a "proper" way of detecting these, for example by having the remote end send an empty message to indicate an intentional shutdown. I don't know if we'll have time to do that in time for v1.6, so as a temporary solution I'm implementing some approximation which, although imperfect, should cover most errors. I also made the warning message less scary by adding a clarification.
ghstack-source-id: 105969540
Test Plan: Unit tests
Differential Revision: D22067818
fbshipit-source-id: b2e2a37d633f21bca4a2873a05ad92b853dde079
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40149
Many math ops were moved to the lite interpreter in D21992552, but some ops (like log) also have a tensor version, and we didn't check for duplicated names in this case. This breaks some existing models.
Move back most ops for now until we have a cleaner solution.
Test Plan: build
Reviewed By: pengtxiafb
Differential Revision: D22085208
fbshipit-source-id: 951805f43f84bd614cf914c17e00444a122158e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39868
### Summary
Why disable NNPACK on iOS:
- To stay consistent with our internal version
- It's currently blocking some external users due to its lack of support for the x86 architecture
- https://github.com/pytorch/pytorch/issues/32040
- https://discuss.pytorch.org/t/undefined-symbols-for-architecture-x86-64-for-libtorch-in-swift-unit-test/84552/6
- NNPACK uses fast convolution algorithms (FFT, winograd) to reduce the computational complexity of convolutions with large kernel size. The algorithmic speedup is limited to specific conv params which are unlikely to appear in mobile networks.
- Since XNNPACK has been enabled, it performs much better than NNPACK on the depthwise-separable convolutions used by most mobile computer vision networks.
### Test Plan
- CI Checks
Test Plan: Imported from OSS
Differential Revision: D22087365
Pulled By: xta0
fbshipit-source-id: 89a959b0736c1f8703eff10723a8fbd02357fd4a
Summary:
BC-breaking note:
If a user is using one of these dunders directly they will no longer be available. Users should update to Python 3 compatible dunders.
Original PR note:
`__div__` (and `__idiv__` and `__rdiv__`) are no longer special dunders in Python3. This PR replaces them with the `__truediv__` (`__itruediv__`, `__rtruediv__`) dunders, since we no longer support Python2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39151
Differential Revision: D22075713
Pulled By: mruberry
fbshipit-source-id: d318b47b51f7cc4c3728b1606a34d81e49ba0fa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40078
As titled. It's good to have net_pos for all the ops so that we can distinguish each op in the minimizer in net_runner.
Test Plan: unittest
Reviewed By: ipiszy, ChunliF
Differential Revision: D22062748
fbshipit-source-id: 5266abdb6dde63055fdffdba6e8d65bd0f221d7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39602
This was added as a part of
https://github.com/pytorch/pytorch/pull/38590 but we can use default arguments
here. We use fmt::format to bind the default value to the rpc timeout at
runtime.
ghstack-source-id: 105983645
Test Plan: Ci
Differential Revision: D21912719
fbshipit-source-id: 7525c1322a95126f529301be142248af48565b82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40101
Create three tests for LSTMs:
1. test_qlstm: Test to check numerics of quantized LSTM operator.
2. test_lstm_api: To check the LSTM module and compare
it with the quantized LSTM op
3. test_quantized_rnn: Check the dynamic quantization workflow, scriptability and serialization of quantized
LSTM
ghstack-source-id: 105997268
(Note: this ignores all push blocking failures!)
Test Plan:
buck test caffe2/test:quantization -- 'test_lstm_api \(quantization\.test_quantized_module\.TestDynamicQuantizedModule\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_quantized_rnn \(quantization\.test_quantize\.TestPostTrainingDynamic\)'
buck test caffe2/test:quantization -- 'test_qlstm \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --print-passing-details
Differential Revision: D22070826
fbshipit-source-id: 46c333e19b9eab8fa5cab6f132e89b80a635791a
Summary:
Reserves file format version 5 for marking when torch.full(int)->FloatTensor will be deprecated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40089
Differential Revision: D22066359
Pulled By: mruberry
fbshipit-source-id: 6158e03ca75e3795a2641123ff23d67975170f44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40096
Declaring `tensor_proto` to be of type `auto` means that it will copy the entire `TensorProto` instead of just keeping a reference. This changes it to just use a const reference instead.
Test Plan:
Using the model loader benchmark to measure model loading performance:
### `tensor_proto` is of type `const auto&`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 11.08ms 90.27
BlobProtoByteDeserializationFloat16 1509.73% 733.73us 1.36K
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 10.48ms 95.45
BlobProtoByteDeserializationUInt8 2974.57% 352.22us 2.84K
============================================================================
```
### `tensor_proto` is of type `auto`
```
============================================================================
caffe2/caffe2/fb/predictor/ModelLoaderBenchmark.cpprelative time/iter iters/s
============================================================================
BlobProtoInt32DeserializationFloat16 13.84ms 72.26
BlobProtoByteDeserializationFloat16 658.85% 2.10ms 476.08
----------------------------------------------------------------------------
BlobProtoInt32DeserializationUInt8 17.09ms 58.51
BlobProtoByteDeserializationUInt8 3365.98% 507.80us 1.97K
============================================================================
```
Reviewed By: marksantaniello
Differential Revision: D21959644
fbshipit-source-id: 6bc2dfbde306f88bf7cd4f9b14b95ac69c2e1b4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39803
The basic concept is to make it more clear what the construction
side API is, as opposed to the "I want to actually do kernel stuff
with TensorIterator" API (which has been kept on TensorIterator.)
In fact, most of the stuff in TensorIteratorConfig isn't used by
TensorIterator later, so it can be dropped entirely after construction.
Before:
```
TensorIterator iter;
iter.config1();
iter.config2();
iter.config3();
iter.build();
// use iter
```
Now:
```
TensorIterator iter = TensorIteratorConfig()
.config1()
.config2()
.config3()
.build();
// use iter
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22018845
Pulled By: ezyang
fbshipit-source-id: 5baca9a4dc87149d71a44489da56d299f9b12b34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40012
Use Fused Multiply and Add
Test Plan: Tested using the test_layernorm_nnpi_fp16.py test case.
Reviewed By: hyuen
Differential Revision: D22039340
fbshipit-source-id: d979daac152f885318ddcbbb9d7108219d4743e9
Summary:
Since Argmax is updated in ONNX Runtime, we can enable testing for all outputs, including keypoints_scores.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39589
Reviewed By: hl475
Differential Revision: D21992264
Pulled By: houseroad
fbshipit-source-id: a390b4628d2ac290902b9e651c69d47db9be540f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39919
e.g. torch.nn.ReLU6(inplace=True)
It looks like this is already supported, but somehow it is not working in the tutorial.
Test Plan: Imported from OSS
Differential Revision: D22055695
fbshipit-source-id: 78a55b963cd3fac06f952f83c7c61c717cc839cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40039
Similar to static quant, break it up into op level tests and tests for jit passes
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTDQOps
python test/test_quantization.py TestDynamicQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D22071278
fbshipit-source-id: 54292addcfbc00f7af960fb333921db2ff9fda04
Summary:
Possible fix for gh-38385. Unfortunately, I haven't been able to reproduce the issue reliably, so can't say for certain.
Since this appears to be a destruction ordering issue, I've focused on making the destructor calls well-ordered:
- Each pool is now a function-local `static` instead of a global variable. This ensures the destructor happens before any relevant pytorch global state is destroyed.
- Each pool window now only stores a `std::weak_ptr` to the global pool. This means it can't extend the lifetime of the pool outside of the normal destructor ordering. That does also mean that if the `weak_ptr` is invalid, the handles will get leaked. However, that shouldn't happen under normal use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39345
Differential Revision: D22044376
Pulled By: ezyang
fbshipit-source-id: da1713b42c143ed1452a6edf1ecb05cd45743c7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39909
As described in https://github.com/pytorch/pytorch/issues/33583,
ProcessGroupAgent initializes the default process group and this causes issues
if the user initializes the default process group themselves. Either the RPC
initialization would fail or the user's process group initialization would
fail.
To avoid this, I've changed ProcessGroupAgent init to create its own
ProcessGroupGloo and not use the default one at all.
Closes: https://github.com/pytorch/pytorch/issues/33583
ghstack-source-id: 105953303
Test Plan: waitforbuildbot
Differential Revision: D22011868
fbshipit-source-id: 7346a3fcb2821a0bc08e0bdc0625947abb5ae16f
Summary:
Closes gh-35418,
PR gh-16414 added [the `CMAKE_INSTALL_RPATH_USE_LINK_PATH`directive](https://github.com/pytorch/pytorch/pull/16414/files#diff-dcf5891602b4162c36c2125c806639c5R16) which is non-standard and will cause CMake to write an `RPATH` entry for libraries outside the current build. Removing it leaves an RPATH entry for `$ORIGIN` but removes the entries for things like `/usr/local/cuda-10.2/lib64/stubs:/usr/local/cuda-10.2/lib64` for `libcaffe2_nvrtc.so` on linux.
The added test fails before this PR, passes after. It is equivalent to checking `objdump -p torch/lib/libcaffe2_nvrtc.so | grep RPATH` for an external path to the directory where cuda "lives"
I am not sure if it solves the `rpath/libc++.1.dylib` problem for `_C.cpython-37m-darwin.so` on macOS in issue gh-36941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37737
Differential Revision: D22068657
Pulled By: ezyang
fbshipit-source-id: b04c529572a94363855f1e4dd3e93c9db3c85657
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40092
Move the <windows.h> include in THAllocator to after the header that might include glog
Test Plan: buck build xplat/mode/arstudio/windows //xplat/caffe2:aten_cpuWindows
Reviewed By: nlutsenko
Differential Revision: D22061135
fbshipit-source-id: 10f51955c0092761a96bc6169236c6e07b412313
Summary:
Fix for https://github.com/pytorch/vision/issues/2320 - still need to fix whatever reverting this change breaks
EDIT: reverting this change doesn't seem to break anything, and fixes the torchvision issue
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40041
Reviewed By: eellison
Differential Revision: D22067586
Pulled By: fmassa
fbshipit-source-id: 4b235fd3a69665dcc5689f12310097be31b40a28
Summary:
This pr aims at improving `nn.UpSample()` performance on CPU with mode `linear`, `bilinear`, `trilinear`.
For single socket inference, up to **31x** performance improvement.
For single core inference, up to **1.8x** performance improvement.
For dual socket training, up to **28x** performance improvement.
A `channels last` format kernel is also provided.
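For reference, the affected call pattern (a toy example, not the benchmark behind the numbers above):
```
import torch
import torch.nn as nn

up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=False)

x = torch.randn(2, 3, 32, 32)
print(up(x).shape)  # torch.Size([2, 3, 64, 64])

# channels-last input exercises the new channels-last kernel path
x_cl = x.to(memory_format=torch.channels_last)
print(up(x_cl).shape)
```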
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34864
Differential Revision: D20772990
Pulled By: ngimel
fbshipit-source-id: a48307f2072227f20e742ebbd4a093bb29537d19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40090
I messed up in #39957: TensorPipe used to have a bug where it inverted priorities and preferred lower ones over higher ones. I had fixed that bug at the same time as I was writing that PR but then forgot to update the priority values once that PR landed. So this meant that TensorPipe was trying to bootstrap using SHM and then upgrade to UV. That worked in our tests because they are all run on the same machine, but that broke using TensorPipe across different machines. I'll take suggestions on how to have tests in place to prevent this type of breakages from happening.
The silver lining is that for some time our tests were testing the UV transport, instead of the SHM one, and it seems to be working alright. ;)
ghstack-source-id: 105967203
Differential Revision: D22067264
fbshipit-source-id: c6e3ae7a86038714cfba754b0811ca8a9a6f1347
Summary:
Fixes gh-40046
PR gh-37419 refactored the content of `docs/source/rpc/index.rst` into `docs/source/rpc.rst` but did not link to the latter from `docs/source/index.rst`, so the top-level RPC documentation is missing from https://pytorch.org/docs/master/.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40077
Differential Revision: D22068128
Pulled By: mrshenli
fbshipit-source-id: 394433f98f86509e0c9cb6d910a86fb8a2932683
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39859
This PR implements the batching rule for `torch.mul` for the (Tensor,
Tensor) overload.
NB: ~250 lines of this PR are tests, so please don't be scared away by
the line count.
It introduces the BroadcastingVmapTransform, which is the VmapTransform
one should use for operations that broadcast their inputs. This
transform:
- permutes all batch dimensions to the front of the tensors
- aligns the batch dimensions of the tensors, adding extra 1's where
necessary
- aligns the non-batch dims of the tensors, adding extra 1's where
necessary.
Test Plan:
- Test BroadcastingVmapTransform in `./build/bin/vmap_test`.
- Test mul_batching_rule in `./build/bin/vmap_test`.
Differential Revision: D22067337
Pulled By: zou3519
fbshipit-source-id: 5862da8c2b28699b08c7884342a1621581cb2e7f
Summary:
**Summary**
This commit adds a registry for storing lowering functions for backends.
Instead of backends registering these lowering functions in separate C
extension modules, these will be registered in the Torch extension.
Backends are registered statically, so a registry is needed to hold
these lowering functions until Python bindings are created.
**Test Plan**
`python test/test_jit.py TestBackends`
```
Couldn't download test skip set, leaving all tests enabled...
..
----------------------------------------------------------------------
Ran 2 tests in 0.104s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39552
Reviewed By: mortzur
Differential Revision: D22033855
Pulled By: SplitInfinity
fbshipit-source-id: 05abf152274e5e51c37b6004886ea25bd4d33b80
Summary:
Closes gh-39060
The `TensorIterator` splitting is based on `can_use_32bit_indexing` which assumes 32-bit signed ints, so we can get away with just 2**31 as the axis length. Also tested on an old commit that I can reproduce the test failure on just a 1d tensor, overall quartering the memory requirement for the test.
4c7d81f847/aten/src/ATen/native/TensorIterator.cpp (L879)
For reference, the test was first added in gh-33310.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40036
Differential Revision: D22068690
Pulled By: ezyang
fbshipit-source-id: 83199fd31647d1ef106b08f471c0e9517d3516e3
Summary:
Currently, whether `AccumulateGrad` [steals](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L42)) or [clones](67cb018462/torch/csrc/autograd/functions/accumulate_grad.h (L80)) an incoming gradient, the gradient ends up rowmajor contiguous, regardless of its param's layout. If the param's layout is channels last, or otherwise not rowmajor contigous, later kernels that apply gradients to params are forced into an uncoalesced memory access pattern for either the param or the gradient. This may not sound like a big deal but for any binary op on large tensors it's a >3X increase in gmem traffic => 3X slowdown.
The present PR changes `AccumulateGrad` to prefer, where possible, stashing gradients that match their params' layouts (["Gradient Layout Contract"](https://github.com/pytorch/pytorch/pull/34904/files#diff-ef1a56d24f66b280dcdb401502d6a796R29-R38)).
Allowing `AccumulateGrad` to stash non-rowmajor-contiguous grads means DDP allreduces and DP reduces must allow non-rowmajor-contiguous grads. This PR extends DDP and DP to allow gradients with non-rowmajor-contiguous strides as long as their layout is nonoverlapping and dense.
For good measure, I include changes that allow all five nccl primitives (allreduce, reduce, broadcast, allgather, reducescatter) to act on non-rowmajor-contiguous tensors (again as long as each input's layout is nonoverlapping and dense, and as long as all tensors participating in a given collective have the same layout). The primitive comm changes aren't necessary to enable the DDP changes, but I wasn't sure this would end up true until I had written both sets of changes. I think primitive comm enablement is reasonable to keep in the PR, especially since the code for it is simple.
Channels last params will be a major beneficiary of this PR, but I don't see it as channels-last-specific fix. The spirit is layout matching in general:
- Grads should be stashed with memory layouts matching their params.
- Src and dst tensors on opposite ends of collectives should have matching dense layouts.
This PR also updates autograd docs to describe potential BC-breaking changes below.
## BC notes
ngimel albanD gchanan
#### BC-breaking
In the common case where the user lets AccumulateGrad decide grad layouts, strides for grads of dense but non-rowmajor-contiguous params will change. Any user code that was accustomed to `view(-1)`ing these grads will break.
Also, the circumstances under which a grad can be stolen directly from the backward function that created it, as opposed to deep-copied by AccumulateGrad, have changed. In most cases we expect silent performance improvement, because we expect channels-last-aware backward kernels will create channels last gradients for channels last params. Now those can be stolen, whereas before this PR they were cloned and made rowmajor contiguous. IMO this is a mild BC breakage. Param backward hooks still see grads come in with whatever format the backward kernel gave them. The only BC breakage potential I see is if user code relies somehow on a grad in a hook having or not having the same deep memory as the eventual `param.grad`. Any such users hopefully know they're off the edge of the map and understand how to update their expectations.
#### BC escape hatches
At alband's recommendation, this PR's changes to AccumulateGrad do not alter the pre-PR code's decisions about whether grad is accumulated in or out of place. Accumulations of new grads onto an existing `.grad` attribute were (usually) in-place before this PR and remain in-place after this PR, keeping the existing `.grad`'s layout. After this PR, if the user wants to force accumulation into a grad with a particular layout, they can preset `param.grad` to a zeroed tensor with the desired strides or call `grad.contiguous(desired format)`. This likely won't be as performant as letting AccumulateGrad establish grad layouts by cloning or stealing grads with contract-compliant strides, but at least users have a control point.
One limitation (present before this PR and unchanged by this PR): Presetting `param.grad` does not ensure in-place accumulation all the time. For example, if `create_graph=True`, or if incoming `new_grad` is dense and existing `variable_grad` is sparse, accumulation occurs out of place, and the out-of-place result may not match the existing grad's strides.
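A small sketch of the escape hatch described above (assumes a 4-d param so channels_last applies; whether accumulation stays in place depends on the conditions noted above):
```
import torch

param = torch.nn.Parameter(
    torch.randn(8, 3, 5, 5).to(memory_format=torch.channels_last)
)
# Preset .grad with the strides you want; in-place accumulation keeps them.
param.grad = torch.zeros_like(param, memory_format=torch.channels_last)

(param * 2).sum().backward()
print(param.grad.is_contiguous(memory_format=torch.channels_last))  # True
```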
----------------------------
I also noticed some potential DDP improvements that I considered out of scope but want to mention for visibility:
1. make sure Reducer's ops sync with AccumulateGrad streams
2. ~to reduce CPU overhead and incur fewer kernel launches, lazily create flat `contents` tensors by a single `cat` kernel only when a bucket is full, instead of `copy_`ing grads into `contents` individually as soon as they are received.~ PR includes a [minor change](https://github.com/pytorch/pytorch/pull/34904/files#diff-c269190a925a4b0df49eda8a8f6c5bd3R312-R315) to divide grads while copying them into flat buffers, instead of copying them in, then dividing separately. Without cat+div fusion, div-while-copying is the best we can do.
3. https://github.com/pytorch/pytorch/issues/38942
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34904
Differential Revision: D20496044
Pulled By: albanD
fbshipit-source-id: 248d680f4b1bf77b0a986451844ec6e254469217
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39800
I'm working on a refactor where I want to represent the inputs
and outputs to TensorIterator as just plain Tensors, which means
I need to kill this add_output with explicit dtype. This exists
solely to set what the output dtype should be. We have a pretty
similar API for doing this for shapes (declare_static_shape) so
I just copied this API for dtypes instead.
Although the new version is more code, I think the intent is more
explicit.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21981740
Pulled By: ezyang
fbshipit-source-id: cf45a6dbab6fb979ca3b412c31eca3dd4f4067de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39915
Some of the usage, e.g. add_scalar, will not support the debug option;
that is, we will not have a numerically exact representation of the final quantized model
before finalize if people use add_scalar.
A warning will be added in a later PR.
Test Plan: Imported from OSS
Differential Revision: D22013026
fbshipit-source-id: 714b938f25c10fad3dfc79f095356b9803ef4b47
Summary:
Currently compare_with_numpy requires a device and dtype, but these arguments are ignored if a tensor is provided. This PR updates the function to only take device and dtype if a tensor-like object is given. This should prevent confusion that you could, for example, pass a CPU float tensor but provide a CUDA device and integer dtype.
Several tests are updated to reflect this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40064
Differential Revision: D22058072
Pulled By: mruberry
fbshipit-source-id: b494bb759855977ce45b79ed3ffb0319a21c324c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39823
Add a compile time function pointer that can be used to pass function pointers in template args.
This is very useful for metaprogramming function wrappers.
ghstack-source-id: 105944072
Test Plan: waitforsandcastle
Differential Revision: D21986243
fbshipit-source-id: a123571c18aa0e65908cbb131f28922ceb59061c
Summary:
Create three tests for LSTMs:
1. test_qlstm: Test to check numerics of quantized LSTM operator.
2. test_lstm_api: To check the LSTM module and compare
it with the quantized LSTM op
3. test_quantized_rnn: Check the dynamic quantization workflow, scriptability and serialization of quantized
LSTM
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38851
ghstack-source-id: 105945574
(Note: this ignores all push blocking failures!)
Test Plan:
buck test caffe2/test:quantization -- 'test_lstm_api \(quantization\.test_quantized_module\.TestDynamicQuantizedModule\)' --print-passing-details
buck test caffe2/test:quantization -- 'test_quantized_rnn \(quantization\.test_quantize\.TestPostTrainingDynamic\)'
buck test caffe2/test:quantization -- 'test_qlstm \(quantization\.test_quantized_op\.TestDynamicQuantizedRNNOp\)' --print-passing-details
Differential Revision: D21628596
fbshipit-source-id: 4aeda899f2e5f14bfbe3d82096cb4ce89c725fa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39999
Cleaned up the android build scripts. Consolidated common functions into
common.sh. Also made a few minor fixes:
- We should trust build_android.sh to do the right thing about reusing an existing
`build_android_$abi` directory;
- We should clean up `pytorch_android/src/main/jniLibs/` to remove
broken symbolic links in case the custom abi list has changed since the last build;
Test Plan: Imported from OSS
Differential Revision: D22036926
Pulled By: ljk53
fbshipit-source-id: e93915ee4f195111b6171cdabc667fa0135d5195
Summary: Original commit changeset: bfeeaebe93d9
Test Plan: CI runs
Differential Revision: D22062523
fbshipit-source-id: 6d827fd682a9e64c49876cd1c7269d145e93dc2c
Summary:
Android jobs don't seem to fit the `pytorch_build_data.py` data model
very well. Other mobile jobs all have their own data model files - even
for Android nightly jobs. As we are adding more variants like vulkan,
it's going to be hard to maintain.
So renamed `android_gradle.py` to `android_definitions.py` and moved
android jobs into it, following the conventions of `nightly_android.py`
and `ios_definitions.py`.
Differential Revision: D22036915
Test Plan: Imported from OSS
Reviewed By: IvanKobzarev
Pulled By: ljk53
fbshipit-source-id: 42ad5cbe451edecef17f6d3cbf85076cc3acf615
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39910
We need this for graph mode quantization: since we only have `aten::batch_norm`, the dimension
is only known at runtime, so we'll need to quantize it to `quantized::batch_norm`.
Test Plan: Imported from OSS
Differential Revision: D22012281
fbshipit-source-id: 2973d86a17a02b7bdc36bd1e703e91584d9139d0
Summary:
For F.max_pool2d and F.avg_pool2d, there is a **RuntimeError** when stride is **None**; this PR solves it.
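After the fix, this call pattern works (stride falls back to the kernel size):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
# stride=None now defaults to the kernel size instead of raising.
out = F.max_pool2d(x, kernel_size=2, stride=None)
print(out.shape)  # torch.Size([1, 3, 4, 4])
```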
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39221
Differential Revision: D22059565
Pulled By: ngimel
fbshipit-source-id: 2080e1e010815aedd904c58552e92be9f7443d38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40060
As part of debugging https://github.com/pytorch/pytorch/issues/39855,
I noticed that TensorPipeAgent's ThreadPool was still executing tasks when the
python interpreter was shutting down. This caused issues with
pybind::gil_scoped_acquire() since it can't be called when the interpreter is
shutting down resulting in a crash.
The reason for this was that TensorPipeAgent was calling waitWorkComplete and
then shutting down the listeners. This meant that after waitWorkComplete
returned, there could still be a race where an RPC call gets enqueued before we
shutdown listeners.
To avoid this situation, I've moved the call to waitWorkComplete to the end of
shutdown (similar to ProcessGroupAgent).
Closes: https://github.com/pytorch/pytorch/issues/39855
ghstack-source-id: 105926653
Test Plan:
1) Ran test_backward_node_failure
(__main__.TensorPipeAgentDistAutogradTestWithSpawn) 100 times to verify the
fix.
2) waitforbuildbot
Differential Revision: D22055708
fbshipit-source-id: 2cbe388e654b511d85ad416e696f3671bd369372
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39816
This change replaces [`#if !defined(__CUDACC__) && !defined(__HIPCC__)`](856215509d/aten/src/ATen/core/DistributionsHelper.h (L147)) with an SFINAE expression that checks if the RNG typename has next_double_normal_sample, set_next_double_normal_sample, next_float_normal_sample, and set_next_float_normal_sample methods.
It is required by (and manually tested with) https://github.com/pytorch/csprng/pull/28. Fixes #39618
Test Plan: Imported from OSS
Differential Revision: D22002599
Pulled By: pbelevich
fbshipit-source-id: e33d42a7e88c5729b077b9cdbf1437158dab48bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39691
After switching to the fbjni-java-only dependency, we no longer need the fbjni gradle subproject.
Test Plan: Imported from OSS
Differential Revision: D22054575
Pulled By: IvanKobzarev
fbshipit-source-id: 331478a57dd0d0aa06a5ce96278b6c897cb0ac78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39957
In order to provide a pluggable and extendable way to add new transports and channels to the TensorPipe agent, we use two registries. This allows us to separate the specific details of each backend (e.g., how it determines what address to use) from the generic logic of setting up TensorPipe.
Test Plan: Built `//caffe2:ifbpy` on two devservers, one in CLN and the other in PRN, and ran RPC across them.
Differential Revision: D22017614
fbshipit-source-id: 4ea7e6ed004a69187666f41bf59858e8174fde0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39964
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive for large N.
This plumbs through the new c++ c10::collectAll() to Python space
so that we only employ a single jit-side wait.
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Differential Revision: D22027412
fbshipit-source-id: 4e344a19a09638ee46e7fc478df80a41941b84ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39407
- support passing a single element tensor as k for topk module
- support passing a single element tensor to constant fill output
Test Plan:
buck test dper3/dper3/modules/tests:core_modules_test -- test_topk_gating_without_split_examples_tensor_k
buck test caffe2/caffe2/python:hypothesis_test -- test_constant_fill_from_tensor
Reviewed By: huayuli00
Differential Revision: D21843739
fbshipit-source-id: 0c5f5c03e9f57eeba40c0068784625164c2527ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39950
Per the comment in the code, constValue() should only be used in
the case where the future was complete and value was not an error.
Add an assert to enforce this.
Also, add hasValue() accessor for completeness.
ghstack-source-id: 105815597
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit:
Differential Revision: D22021776
fbshipit-source-id: b59b6c775eab344068a76f4cd8c3a9dc1f2a174e
Summary:
There still are occasional reports of DataLoader workers not exiting (e.g., https://github.com/pytorch/pytorch/issues/39570). Before we figure out why, we should just kill them if the join times out, to prevent hanging.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39869
Differential Revision: D22018501
Pulled By: ezyang
fbshipit-source-id: 66a00d0f5b3e303b6106b336949176b3ff8ac8ae
Summary:
Remove PY3 and PY34 checks from `torch/testing/_internal/common_utils.py`
Remove PY35 global var from `torch.jit.annotations`
Always call `try_get_real_signature` in `torch/jit/annotations.py`
Use `map` instead of `imap`; since Python 2 is no longer supported, `map` is always lazy.
Remove all pre Python-3.6 checks from `torch/_six.py` and `torch/_appdirs.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39879
Differential Revision: D22037811
Pulled By: malfet
fbshipit-source-id: af0c79f976569c2059d39ecb49c6b8285161734f
Summary:
This code was probably left behind after an ATen port.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39963
Differential Revision: D22039370
Pulled By: ezyang
fbshipit-source-id: 4ef75bac9b69f4b508a0b09c5c1f2ebc21bd9546
Summary:
- Fixed the bug discussed in https://github.com/pytorch/pytorch/issues/38558
- This PR aims to make bernoulli processing on AMD fall back to the default version, even when `AT_MKL_ENABLED` is set to `TRUE`.
- This logic existed in the old code but was broken by the latest update; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/40001
Differential Revision: D22037646
Pulled By: pbelevich
fbshipit-source-id: c0aa4ba37416d2568daf3463cfede6838ffaeac1
Summary:
While working on https://github.com/pytorch/pytorch/issues/38911, I realized that `nccl.reduce` only needs a single output tensor, while our current implementation requires a list of output tensors. This, along with a TODO I fixed in reduce_add, should have some speed up for data parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39688
Differential Revision: D22034547
Pulled By: mrshenli
fbshipit-source-id: e74d54d673ebbb062474b1bb5cc93a095a3a5f6c
Summary:
The batch permutation op does not support zero-sized input right now; it can simply output a tensor the same as the input when the first dimension is zero.
This addresses facebookresearch/detectron2#1580.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39851
Reviewed By: houseroad
Differential Revision: D22033207
Pulled By: ppwwyyxx
fbshipit-source-id: 73b540d2182fe85ed9a47220237a8f213d68ae16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39986
Mean and Variance computation to match with Intel NNPI implementation.
Test Plan: Manual Testing
Reviewed By: hyuen
Differential Revision: D22008566
fbshipit-source-id: 6ac4563859b84121a2482f8e2f738be5c6111f57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39974
# Problem
When this assertion happens, I don't know
- which worker_id it is on, even with the worker_name "trainer:0".
- which rref is throwing this exception.
```shell
File "/mnt/xarfuse/uid-213229/96b122e4-seed-df64b884-e2b4-4520-b7a8-777e79c829ac-ns-4026532900/caffe2/torch/fb/training_toolkit/backend/training_strategies/parameter_server_strategy.py", line 246, in _initialize_trainers
trainer_name: fut.wait() for trainer_name, fut in model_rref_futs.items()
File "/mnt/xarfuse/uid-213229/96b122e4-seed-df64b884-e2b4-4520-b7a8-777e79c829ac-ns-4026532900/caffe2/torch/fb/training_toolkit/backend/training_strategies/parameter_server_strategy.py", line 246, in <dictcomp>
trainer_name: fut.wait() for trainer_name, fut in model_rref_futs.items()
File "/mnt/xarfuse/uid-213229/96b122e4-seed-df64b884-e2b4-4520-b7a8-777e79c829ac-ns-4026532900/torch/distributed/rpc/internal.py", line 158, in _handle_exception
raise result.exception_type(result.msg)
RuntimeError: RuntimeError('Cannot call localValue() on a non-local reference. Call it on trainer:0')
Traceback (most recent call last):
File "/mnt/xarfuse/uid-213229/96b122e4-seed-21bc7792-3714-4e62-a1c1-32a7c38ed984-ns-4026533058/torch/distributed/rpc/internal.py", line 148, in _run_function
result = python_udf.func(*python_udf.args, **python_udf.kwargs)
File "/mnt/xarfuse/uid-213229/96b122e4-seed-21bc7792-3714-4e62-a1c1-32a7c38ed984-ns-4026533058/torch/distributed/rpc/rref_proxy.py", line 5, in _local_invoke
return getattr(rref.local_value(), func_name)(*args, **kwargs)
RuntimeError: Cannot call localValue() on a non-local reference. Call it on trainer:0
```
Changes,
- Add stringify WorkerInfo
- Make localValue() assertion message clearer about the case.
ghstack-source-id: 105840918
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork -- test_local_value_not_on_owner
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit/:rpc_fork
Reviewed By: mrshenli
Differential Revision: D5690653
fbshipit-source-id: ca6a8b1ff6e09f8644303a0f82f9b1a546a11170
Summary:
The Mar 11 version of TorchVision still has some Python 2 anachronisms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39970
Differential Revision: D22034738
Pulled By: malfet
fbshipit-source-id: aa281d50072e2448a6b202061f3ae8e8b65346ad
Summary: Add 'find_method' to the 'LiteScriptModule' Python bindings, so that we can use it to check for the existence of methods, e.g. "get_all_bundled_inputs".
Reviewed By: linbinyu, houseroad
Differential Revision: D22029002
fbshipit-source-id: 9acf76880fc989e825dc3a9186dab6928caee75e
Summary: Extend int8 FC op to take scale and zero point from input to support int8 PTQ productization of online training models.
Test Plan: buck test caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
Reviewed By: csummersea
Differential Revision: D21944884
fbshipit-source-id: 2094827da903f3993afe4f8cf6e70286b195321d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39933
Fix the RRef-related alias annotation to ensure it's not getting erased by
the JIT dead code elimination pass.
Test Plan: Imported from OSS
Differential Revision: D22015426
Pulled By: wanchaol
fbshipit-source-id: 3e74d49fa9f88abaf662bde7be5284f01f621b98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39932
This PR makes RRef fork use the JIT type annotation_str recently introduced in
https://github.com/pytorch/pytorch/pull/39544 to allow a consistent
serialization type string format, and fixes the case where the dict->str()
format does not match the type resolver.
Test Plan: Imported from OSS
Differential Revision: D22015427
Pulled By: wanchaol
fbshipit-source-id: f64d7e3acde5312813816c8f3c7d8fa9379704e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39874
When the fbgemm backend is set, we make sure reduce_range is set to true to avoid overflow in the operator.
Also adds a test for per-channel quant with graph mode and compares numerics with eager mode.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D22011205
fbshipit-source-id: 1c7c9b7ab0d84200e3d8d85c34978554c30c0169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39945
In order to pick up 8fb1fe66f8.
Test Plan: Export to CircleCI and make sure tests pass.
Reviewed By: patricklabatut
Differential Revision: D22019033
fbshipit-source-id: eb192ea3950e4f27ed222f84e2d9de8bf6eb927c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39790
The "[fut.wait() for fut in futs]" idiom can introduce up to
O(len(futs)) thread switches, which may be excessive for large N.
This plumbs through the new c++ c10::collectAll() to Python space
so that we only employ a single jit-side wait.
ghstack-source-id: 105779443
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
Reviewed By: kiukchung
Differential Revision: D21976891
fbshipit-source-id: 253c61f503f4ffb9be784e6c49a0656cede139fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39934
It shouldn't write a .cpp file when it's called to produce a header file.
Test Plan: Imported from OSS
Differential Revision: D22016596
Pulled By: ljk53
fbshipit-source-id: 30a1b4a527bc1ffd8ee748c70494fe712be60c4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39818
Add histogram collection and qparam update support for Int8 PTQ during online training
Add caffe2 wrappers for generating int8 quant params based on output activation samples from the LastNWindowCollector op.
Test Plan:
```
buck test mode/opt caffe2/caffe2/quantization/server:int8_gen_quant_params_test
```
Reviewed By: hx89
Differential Revision: D21984455
fbshipit-source-id: 9479c87a5b1867aec662ecd21fe7ad2bc7e8652c
Summary:
Changes in PR https://github.com/pytorch/pytorch/issues/39759 broke HIP caffe2.
hipify for caffe2 renames CUDA to HIP; torch does not.
If caffe2 calls into torch, it needs to use CUDA-named functions.
CC ezyang xw285cornell sunway513 houseroad dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39801
Differential Revision: D21982493
Pulled By: xw285cornell
fbshipit-source-id: 8e88e0fb80c71f0342e23ef0214a42d5542bdc70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39912
Reland of https://github.com/pytorch/pytorch/pull/39767
What was wrong:
The android_x86_32_vulkan job used the same docker image as android_x86_32.
As a result, the Vulkan job committed to that image, and the following android_gradle job used a libpytorch.so built with USE_VULKAN, while the Vulkan wrapper was not linked into libpytorch_jni.
Fix: commit to different docker images
```
elif [[ ${BUILD_ENVIRONMENT} == *"android-ndk-r19c-vulkan-x86_32"* ]]; then
export COMMIT_DOCKER_IMAGE=$output_image-android-vulkan-x86_32
```
Test Plan: Imported from OSS
Differential Revision: D22012951
Pulled By: IvanKobzarev
fbshipit-source-id: 27908f630e6ce3613679a50b4c10f8b246718894
Summary: Extend int8 quantize op to take scale and zero point from input to support int8 PTQ productization of online training models.
Test Plan: buck test caffe2/caffe2/quantization/server:quantize_dnnlowp_op_test
Reviewed By: csummersea
Differential Revision: D21939660
fbshipit-source-id: 7ce2fbf9cd8a990c270f2187a49b1578ce76bc37
Summary:
Adds `torch.experimental.deterministic` flag to enforce deterministic algorithms across all of pytorch.
Adds `torch.experimental.deterministic_error_level` to allow users to choose between error/warning/silent if determinism for an operation is not available.
Adds `torch.experimental.alert_not_deterministic()` which should be called within operations that are not deterministic.
Offers both Python and ATen interfaces
Issue https://github.com/pytorch/pytorch/issues/15359
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38683
Differential Revision: D21998093
Pulled By: ezyang
fbshipit-source-id: 23aabbddd20f6199d846f97764ff24d728163737
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39867
Support a list of filters in the subgraph rewriter; the rewrite will execute only
when the match passes all filter checks. This is useful for different matches
to share the same filter.
Test Plan: Imported from OSS
Differential Revision: D22009855
fbshipit-source-id: 67aab8d6326b2011a9061397699dc62ee9ad4e2d
Summary:
All std::complex has been migrated to c10::complex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39834
Differential Revision: D22001969
Pulled By: ezyang
fbshipit-source-id: 665a9198afde45a95309053b2f2381e123bf869a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39781
Use a new feature of TensorPipe where a pipe can tell you the name of the remote endpoint, in order to make the logging messages more informative: whenever there is a failure on a pipe, say which worker this was to/from, and the ID of the message involved.
Also, add plenty of verbose logging, to help with debugging. This is off by default, but can be enabled by setting the `GLOG_v` env var to a value of 1 or higher.
ghstack-source-id: 105777704
Test Plan: Builds.
Differential Revision: D21973150
fbshipit-source-id: 9e3ce1b9977e1e9ecd91ff4a6fe82786dc79a702
Summary:
Enhance FileCheck util to check for highlighted source ranges. This is useful when writing tests regarding generated error messages that require source code highlighting.
Here is how the error looks like in different cases:
- In case of needed source code token not found at all in input string:
```
RuntimeError: Expected to find "invalid_token" but did not find it
Searched string:
... <--- HERE
def to_list_missing_type_annotation(x):
# type: (torch.Tensor) -> List[float]
From CHECK-SOURCE-HIGHLIGHTED: invalid_token
```
- In case of source code token not highlighted:
```
Traceback (most recent call last):
File "test_range.py", line 11, in <module>
FileCheck().check_source_highlighted("x.tolist()").run(s)
RuntimeError: Expected to find "~~~~~~~~~~" but did not find it
Searched string:
# type: (torch.Tensor) -> List[float]
li = x.tolist()
~~~~~~~~~ <--- HERE
~~~~~~~~~~~~~~~~~~~... <--- HERE
return li
```
It is a bit confusing since both the input text (usually an error message) and the generated error messages have their own highlighted portions, but this is consistent with previous behavior. Another option is to generate plain error messages without additional range highlighting on the input text.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39692
Test Plan:
Added unit test.
Closes https://github.com/pytorch/pytorch/issues/38698
Differential Revision: D22001765
Pulled By: gmagogsfm
fbshipit-source-id: 6681441eee5853ab061d198ccfe55ebffddca202
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39078
Adding support for non-Vulkan inputs to the addmm operator:
if an input is not on Vulkan, it is converted to Vulkan inside the operator.
If we run a pretrained TorchScript model, the weights of the linear op will be on CPU; we need this to run mobilenetV2 on the Vulkan backend.
Test Plan: Imported from OSS
Differential Revision: D21962425
Pulled By: IvanKobzarev
fbshipit-source-id: 8222edd31dfb14b326d15e6fec5c8778783479df
Summary:
We've got quite a few things going on, preparing a push back to upstream so we don't get too desynced.
- Major refactor of transform replay. It is now far more robust and fixes bugs discovered in reductions. Preparing for extension to explicit broadcast ops which will be the last major memory pattern for op coverage. Broadcast ops will allow us to express up to and potentially beyond norms and gemms.
- Initial runtime expression evaluator. This allows us to evaluate expressions at runtime. Will be useful for determining our grid/block layout at runtime, so we don't have to manually compute them according to the code we're trying to generate.
- Moving to int64 and double for scalar representations to match PyTorch JIT.
- Improvements in codegen interface where we return Tensor like object instead of parent class Val.
- Add `addcmul` and `lerp` ops
- General updates, fixes, test additions, test improvements.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39579
Differential Revision: D21974001
Pulled By: soumith
fbshipit-source-id: 7f7ccc91593466e948f3ce90f8f9b7fbc5c28de2
Summary:
**Summary**
This commit adds support for annotations in method signatures of
TorchScript class types that refer to the class being defined itself.
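A minimal sketch of the kind of self-referential annotation this enables (the class and method names here are hypothetical, not taken from the test suite):
```python
import torch

@torch.jit.script
class Counter(object):
    def __init__(self, value: int):
        self.value = value

    # Both the parameter and return annotations refer to the class
    # currently being defined.
    def larger(self, other: 'Counter') -> 'Counter':
        if self.value >= other.value:
            return self
        return other
```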
**Test Plan**
This commit adds a unit test to check that a method that uses
self-referential type annotations can be defined and produces the same
results in Python and TorchScript.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39821
Differential Revision: D22003624
Pulled By: SplitInfinity
fbshipit-source-id: dce921c2e0ca0c8aecb52d5b0646b419eb207146
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39825
Removing the pass for now since it is causing errors for some models
Test Plan: Imported from OSS
Differential Revision: D21987878
fbshipit-source-id: 129aefb34754d5390a4c9d3108fa1b6c2eae5a74
Summary:
std::complex is gone; we now use c10::complex in all dispatch macros.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39882
Differential Revision: D22009933
Pulled By: pbelevich
fbshipit-source-id: 613ac06d0024f149184d0b2e08ed06d7d6066017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39861
move some basic math ops to lite interpreter
size change should be small
Test Plan: build
Reviewed By: iseeyuan
Differential Revision: D21992552
fbshipit-source-id: 7f5a7380ffc1519001a98169e6c5381e45e8e0ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39798
add_input's device/dtype are 100% redundant, as compute_types will
always (internally) assert that this dtype matches the expected dtype.
add_output's device/dtype is redundant UNLESS you have an undefined
tensor (in which case it seems to be an indication what the output type
should be). The one add_output case I killed can never be exercised, see:
```
import torch
x = torch.randn(3, 4)
mask = x.ge(0.5)
torch.masked_select(x.cuda(), mask.cuda(), out=torch.zeros((0), dtype=torch.int64, device='cuda'))
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21981742
Pulled By: ezyang
fbshipit-source-id: a042d1b9fce0ad58b833856ffe32001787551e59
Summary:
This is needed, because pip might want to build ninja from source
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39898
Differential Revision: D22010548
Pulled By: malfet
fbshipit-source-id: 55423324c381aaec8a3c81f95f9405dd618b4e49
Summary:
Fix another simplification edge case, a Cond statement when one branch is nullptr and the other is a zero stmt block. This happens mostly with an if with no else branch where all statements inside the if are removed (eg via inlining or simplification). Common case is SplitWithMask -> ComputeInline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39754
Differential Revision: D21962987
Pulled By: nickgg
fbshipit-source-id: 2461415466fbbab88d2329061f90fcfdfa85e243
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39077
We plan to support strides for Vulkan, but that is not implemented yet.
The main intention of faking strides and is_contiguous is to be able to run TorchScript mobilenetV2 on the Vulkan backend for development and profiling.
This change adds strides to Vulkan interface and overrides strides(), stride(), is_contiguous() of OpaqueTensorImpl for that purpose.
Test Plan: Imported from OSS
Differential Revision: D21962426
Pulled By: IvanKobzarev
fbshipit-source-id: cfef4903ad7062170926264f45cff1293ade78f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38211
Just because the annotations are inline doesn't mean the files type
check; most of the newly annotated files have type errors and I
added exclusions for them in mypy.ini. The payoff of moving
all of these modules inline is that I can delete the relevant code
generation logic for the pyi files (which had added ignore
annotations that weren't actually relevant anymore).
For the most part the translation was completely mechanical, but there
were two hairy issues. First, I needed to work around a Python 3.6 and
earlier bug where Generic has a nontrivial metaclass. This fix is in
torch/jit/__init__.py. Second, module.py, we need to apply the same
fix for avoiding contravariance checks that the pyi file used to have;
this is done by declaring forward as a variable (rather than a
function), which appears to be sufficient enough to get mypy to not
contravariantly check input arguments.
Because we aren't actually typechecking these modules in most
cases, it is inevitable that some of these type annotations are wrong.
I slavishly copied the old annotations from the pyi files unless there
was an obvious correction I could make. These annotations will probably
need fixing up later.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497397
Pulled By: ezyang
fbshipit-source-id: 2b08bacc152c48f074e7edc4ee5dce1b77d83702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39875
Numba released a new version (0.50) that is causing problems with
librosa (we use this as a test dependency). Try pinning the version of
numba to temporarily fix. I am not actually sure if this is going to
work because it is unclear when we actually install numba.
Test Plan: - wait for CI.
Reviewed By: mruberry
Differential Revision: D22005838
Pulled By: zou3519
fbshipit-source-id: 4bccfa622c82533d85631052e4ad273617ea31d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39786
This wasn't really used since we already have an internal SCUBA table to
handle this use case and it doesn't rely on a singular script to run
after all binaries have been uploaded.
Also, the web page took an enormously long time to actually load,
decreasing its usefulness; let's just get rid of the job altogether
instead of trying to fix something no one really looked at.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D22007197
Pulled By: seemethere
fbshipit-source-id: d824b576e07c9cf1603db5ac14940b06ecdd2a0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39493
Make sure we wait for all types, incl. async cpu ops
Test Plan: CI
Reviewed By: kennyhorror
Differential Revision: D21873540
fbshipit-source-id: 37875cade68e1b3323086833f8d4db79362a68e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39792
I also deleted the dead TensorIterator::remove_dimension,
and reordered some properties so they were more logically
grouped.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21981739
Pulled By: ezyang
fbshipit-source-id: e7c9ad0233284f7c47322e62035edb704640aafd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39789
Some properties on TensorIterator are only set prior to build() by the
user and then immutable during the build process. I've renamed all such
properties so that they have a config_ prefix, gave them an explicit
accessor and audited every site to ensure they are only written once.
I also renamed check_mem_overlaps to compute_mem_overlaps to avoid
confusion with the accessor check_mem_overlap.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21981741
Pulled By: ezyang
fbshipit-source-id: b64e33a5d0bc01834ead6d7082605c20a5ed1a08
Summary:
THCAllocator functionality is pretty obscure and it's hard to get it working with HIP because of how Caffe2/PyTorch rules are set up (see https://github.com/pytorch/pytorch/issues/39801). Let's just disable the test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39843
Reviewed By: zou3519
Differential Revision: D21998687
Pulled By: dzhulgakov
fbshipit-source-id: cd12ba30cdfee658b98393ed3a72e83f4ecf1c9c
Summary:
# Motivations
As explained in the [link](https://stats.stackexchange.com/questions/86991/reason-for-not-shrinking-the-bias-intercept-term-in-regression/161689#161689), regularizing biases will cause mis-calibration of predicted probabilities.
In SparseNN, the unary processor may use 1d embedding tables for the sparse features to serve as biases.
In this diff, the regularization term is automatically skipped for the 1d sparse parameters to avoid regularizing biases.
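As a rough illustration of the general idea (this is plain PyTorch, not the DPER/caffe2 implementation): parameters with a single dimension are treated as bias-like and excluded from weight decay.
```python
import torch

model = torch.nn.Linear(10, 4)

# Put 1-d parameters (biases, 1-d embedding tables) in a group with no
# weight decay; regularize everything else as usual.
decay, no_decay = [], []
for name, p in model.named_parameters():
    (no_decay if p.dim() == 1 else decay).append(p)

opt = torch.optim.SGD(
    [{'params': decay, 'weight_decay': 1e-4},
     {'params': no_decay, 'weight_decay': 0.0}],
    lr=0.1,
)
```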
# Experiments
Experiments were conducted to verify that it has no significant impact on the NE to skip the regularization on 1d sparse parameters.
Baseline.1 (no L2 regularization): f193105372
Baseline.2 (L2 regularization in prod): f193105522
Treatment (skipping L2 regularization on 1d sparse params): f193105708
{F239859690}
Test Plan:
Experiments were conducted to verify that it has no significant impact on the NE to skip the regularization on 1d sparse parameters using a canary package: `aml.dper2.canary:9efc576b35b24361bb600dcbf94d31ea`.
Baseline.1 (no L2 regularization): f193105372
Baseline.2 (L2 regularization in prod): f193105522
Treatment (skipping L2 regularization on 1d sparse params): f193105708
Reviewed By: zhongyx12
Differential Revision: D21757902
fbshipit-source-id: ced126e1eab270669b9981c9ecc287dfc9dee995
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39581
Context: Batching rules
------------------------------------
Batching rules take BatchedTensors and regular Tensors as arguments. A
batching rule generally does the following:
1. Converts (logical) BatchedTensors to views on physical tensors.
2. Converts logical arguments (e.g. dimension indexes, shapes) to
physical arguments that correspond to the physical tensors.
3. Calls at:: operations on the physical tensors and arguments.
4. Converts physical results back to BatchedTensors.
Steps 1 and 2 differ for operators with different batching behaviors.
(see next section)
VmapTransform abstraction
------------------------------------
(Previously known as a "Converter". Bikeshedding welcome on the naming).
An ArgTransform converts logical views of tensors to physical views. When writing a batching rule, users should select the ArgTransform that matches the batching behavior of their operator. If the batching behavior of the op is complicated, then they'll have to write some custom logic (either by writing a new ArgTransform, or writing the logical->physical transform themselves).
*56% (~474) of (vmap-supported) operators can and will use these
VmapTransform. 20% (~168) of operators need custom handling*.
See `VmapTransforms.h` for more context.
PhysicalView
------------------------------------
VmapTransforms return physical views on tensors, represented by the
PhysicalView struct. It is effectively a Tensor and contains
enough metadata to enable mapping logical non-tensor arguments to
physical non-tensor arguments, and the other way around.
There are two methods on PhysicalView right now:
- `PhysicalView::getPhysicalDim(logical_dim)` and
`PhysicalView::getPhysicalDims(logical_dims)`.
are used to map logical dims to physical dims.
- `PhysicalView::newLogicalFromPhysical(Tensor)` is used to map a result
physical tensor from a batching rule to a logical tensor
(BatchedTensor).
Test Plan:
------------------------------------
- `./build/bin/vmap_test`
Differential Revision: D21983789
Pulled By: zou3519
fbshipit-source-id: dc558e05b596fd29f9643e933e4ece4b7866b6db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39580
We support 64 total levels. This is done so that we can represent lists
of levels as a bitset that fits into a single `int64_t` and is a
reasonable upper bound because we only support (physical) tensors of up
to 64 dimensions with vmap (see kVmapMaxTensorDims).
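A small sketch of the bitset idea in Python terms (the helper below is illustrative, not the actual C++ implementation):
```python
# Pack a set of vmap levels (assumed to be in 0..63) into one 64-bit integer.
levels = 0
for lvl in (0, 2, 5):
    levels |= 1 << lvl

def has_level(bits: int, lvl: int) -> bool:
    return bool(bits & (1 << lvl))

assert has_level(levels, 2)
assert not has_level(levels, 3)
```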
Test Plan:
`./build/bin/vmap_test`. One day we'll test this with the vmap Python
API.
Differential Revision: D21929249
Pulled By: zou3519
fbshipit-source-id: 2e99c0c519d6ab0c063fda20f4a0b1f53da6d450
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39645
This PR added quantization support for handling BatchNorm2d and ReLU (or F.relu) in both
scripting and tracing (see the sketch below).
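A tiny module exhibiting the pattern this covers (the module name is illustrative):
```python
import torch
import torch.nn as nn

class BNRelu(nn.Module):
    def __init__(self):
        super().__init__()
        self.bn = nn.BatchNorm2d(4)
        self.relu = nn.ReLU()

    def forward(self, x):
        # The BatchNorm2d -> ReLU sequence is what the quantization
        # passes now handle, in both scripted and traced modules.
        return self.relu(self.bn(x))

scripted = torch.jit.script(BNRelu().eval())
traced = torch.jit.trace(BNRelu().eval(), torch.randn(1, 4, 8, 8))
```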
Test Plan:
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_qbatchnorm_relu
Imported from OSS
Differential Revision: D21942111
fbshipit-source-id: 680e16076a37b96d2485d5cbc39ce9a045c319c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39604
This change preserves BC for older models that are saved with reduce_range set to false.
Newer models will use the version information in RNN module to toggle reduce_range parameter
Internally this is implemented using a new CellParams type that calls the linear functions with reduce_range option set to true.
New models serialized will use the CellParams struct for the `__getstate__` and `__setstate__` calls. Older models using QuantizedCellParamsDynamic will continue to use their original serialization/de-serialization methods
Tested using the LSTM BC test and test_quantized_rnn.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21977600
fbshipit-source-id: 0cb0e098b87207b537574d3beeab1f341c41c0d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39076
Adds a `--vulkan` argument to run the torch benchmark on the Vulkan backend.
If it is true, inputs will be converted to the Vulkan backend before module.forward.
Usage for mobilenetv2 fp32:
```
/build/bin/speed_benchmark_torch --model=mn-fp32.pt --input_type=float --input_dims=1,3,224,224 --warmup=1 --iter=5 --vulkan=true
```
Test Plan: Imported from OSS
Differential Revision: D21962428
Pulled By: IvanKobzarev
fbshipit-source-id: 3136af5386b6bce9ea53ba4a9019af2d312544b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39795
Replaces the `is_dynamic` bool with enums in the Python and C++
graph quantization code. This makes the code more readable
and will make it easier to modify for adding QAT logic in the future.
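A minimal sketch of the enum-over-bool idea (the names below are illustrative, not the exact identifiers used in the codebase):
```python
import enum

class QuantType(enum.Enum):
    STATIC = 0
    DYNAMIC = 1
    QAT = 2

def quantize_graph(model, quant_type: QuantType = QuantType.STATIC):
    # A named member reads better at call sites than is_dynamic=True/False,
    # and leaves room for a QAT variant later.
    if quant_type is QuantType.DYNAMIC:
        pass  # dynamic quantization path
    return model
```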
Test Plan:
CI, as well as
```
python test/test_quantization.py TestQuantizeDynamicScript
python test/test_quantization.py TestQuantizeScriptJitPasses
```
Imported from OSS
Differential Revision: D21981643
fbshipit-source-id: d475760407bcc794aeae92a2c696bac4acda843d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39750
Add a test to make the default QAT qconfig scriptable, and fix
all the errors.
Test Plan:
```
python test/test_quantization.py TestQATScript.fake_quant_scriptable
```
Imported from OSS
Differential Revision: D21975879
fbshipit-source-id: 8c48ad9f24b2c941d2267cb53eb70ebecd103744
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39739
Adding more docs and examples to make code reading easier for newcomers.
Test Plan:
CI, no logic changes
Imported from OSS
Differential Revision: D21975878
fbshipit-source-id: 464858c0490cfbdec165a5ddf3817ca4878abb09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39683
Adding a couple of docstrings to `_jit_pass_dedup_module_uses` and
`_jit_pass_insert_observers`.
Test Plan:
CI, no logic change
Imported from OSS
Differential Revision: D21975880
fbshipit-source-id: 8876e0e981d6675bce08fa8e08ac7a3d38c3c622
Summary:
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import time
import torch
import numpy as np
for n, t in [(500_000, 10),
             (1_000_000, 10)]:
    for dtype in (torch.half, torch.float, torch.double):
        # Input Setup
        p = torch.from_numpy(np.random.rand(n)).to(dtype)
        want = 1000
        print(f'torch.multinomial(a) a.numel() == {n} for {t} times {dtype}')
        start = time.time()
        # Iterate
        for _ in range(t):
            torch.multinomial(p, want, replacement=False)
        print(f'Took:', time.time() - start)

print('****' * 10)

for n, t in [(50_000, 100),
             (100_000, 100)]:
    for dtype in (torch.half, torch.float, torch.double):
        # Input Setup
        p = torch.rand(n, device='cuda', dtype=dtype)
        want = 1000
        print(f'torch.multinomial(a) a.numel() == {n} for {t} times {dtype}')
        start = time.time()
        # torch.cuda.synchronize()
        # Iterate
        for _ in range(t):
            torch.multinomial(p, want, replacement=False)
        # torch.cuda.synchronize()
        print(f'CUDA Took:', time.time() - start)
```
Before:
```
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float16
Took: 80.64455389976501
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float32
Took: 3.7778031826019287
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float64
Took: 5.045570611953735
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float16
Took: 161.53191947937012
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float32
Took: 7.640851736068726
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float64
Took: 10.399673461914062
****************************************
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float16
CUDA Took: 4.873984098434448
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float32
CUDA Took: 4.713594436645508
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float64
CUDA Took: 11.167185068130493
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float16
CUDA Took: 7.195427417755127
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float32
CUDA Took: 7.669712066650391
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float64
CUDA Took: 20.20938801765442
```
After:
```
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float16
Took: 81.09321522712708
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float32
Took: 0.06062650680541992
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float64
Took: 0.0862889289855957
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float16
Took: 161.85304307937622
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float32
Took: 0.13271093368530273
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float64
Took: 0.17215657234191895
****************************************
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float16
CUDA Took: 0.035035133361816406
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float32
CUDA Took: 0.03631949424743652
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float64
CUDA Took: 0.05507040023803711
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float16
CUDA Took: 0.05105161666870117
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float32
CUDA Took: 0.05449223518371582
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float64
CUDA Took: 0.09161853790283203
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39742
Differential Revision: D21976915
Pulled By: ngimel
fbshipit-source-id: 34431f814f31b6dfd6179a89f8e4fa574da7a306
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39373
Line 114 is the only actual change. Other changes are just formatting.
Test Plan: CI
Reviewed By: zrphercule
Differential Revision: D21830893
fbshipit-source-id: 83e49b1b3c48f6bc6de3c48ccce60c84aa49339b
Summary:
**1.6 Deprecation Note**
In PyTorch 1.6 attempting to divide two integer tensors or an integer tensor and an integer scalar will throw a runtime error. This behavior was deprecated with a warning in PyTorch 1.5. In PyTorch 1.7 torch.div and the division operator will always perform true division like Python3 and NumPy.
To divide integer values use either torch.true_divide, for true division, or torch.floor_divide (the // operator) for floor division.
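Concretely (a small illustration of the behaviors described above):
```python
import torch

a = torch.tensor([5, 7])
b = torch.tensor([2, 2])

# a / b                 # RuntimeError in 1.6 for two integer tensors
torch.true_divide(a, b)   # tensor([2.5000, 3.5000])  true division
torch.floor_divide(a, b)  # tensor([2, 3])             floor division
a // b                    # tensor([2, 3])
```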
**PR Summary**
This PR updates the warning message when performing integer division to be a runtime error. Because some serialized Torchscript programs may rely on torch.div's historic behavior it also implements a "versioned symbol" for div that lets those models retain their current behavior. Extensive tests of this behavior are the majority of this PR.
Note this change bumps the produced file format version to delineate which programs should have their historic div behavior preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38620
Differential Revision: D21612598
Pulled By: mruberry
fbshipit-source-id: c9c33591abce2f7e97f67f0f859901f5b03ed47d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39767
Adding an Android build for every PR with `-DUSE_VULKAN=ON`. It will use Vulkan from the ANDROID_NDK, so no changes to docker images are needed.
Test Plan: Imported from OSS
Differential Revision: D21976091
Pulled By: IvanKobzarev
fbshipit-source-id: cb9fa5612cfebc02dfd4946e50faa121311780f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39771
The Vulkan build was not integrated with CI; it fails without this change.
There were 2 separate problems:
1. The recently added aten/src/ATen/templates/Functions.cpp was missing VulkanType in the header.
2. Applying the new registration API, similar to the XNNPACK change
https://github.com/pytorch/pytorch/pull/36800
Test Plan:
`ANDROID_ABI=x86 ./scripts/build_android.sh -DUSE_VULKAN=ON` builds ok
CI integration for it is in the next PR in this stack ( https://github.com/pytorch/pytorch/pull/39767 )
job `ci/circleci: pytorch_linux_xenial_py3_clang5_android_ndk_r19c_x86_32_vulkan_build`
Differential Revision: D21975992
Pulled By: IvanKobzarev
fbshipit-source-id: b0400a9cb0ae90d7763ebeb5b8f7ee932a2148e1
Summary:
- add call out to python resolver in parseArgsFromDecl, parserReturnFromDecl
- add support in python resolver for nested subexpressions
- wrap python resolver call in exception handling to fall back to c++ path
- add tests for newly resolvable types
- closes https://github.com/pytorch/pytorch/issues/38728
Fixes bug where SourceRange objects did not include the final closing ']' for a subscript expression. E.g. range for 'List[int]' previously included only 'List[int'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39269
Differential Revision: D21956402
Pulled By: wconstab
fbshipit-source-id: 5d783260322eb1e04e20bc931a8e9d9179765f13
Summary:
The weights of the `MultiheadAttention` were incorrectly listed as constants, which produced warnings when converting to a TorchScript module.
```py
import torch
import torch.nn as nn
multihead_attn = nn.MultiheadAttention(256, 4)
torch.jit.script(multihead_attn)
```
Warnings:
```
/home/michael/.local/lib/python3.8/site-packages/torch/jit/_recursive.py:151: UserWarning: 'q_proj_weight' was found in ScriptModule constants, but it is a non-constant parameter. Consider removing it.
warnings.warn("'{}' was found in ScriptModule constants, "
/home/michael/.local/lib/python3.8/site-packages/torch/jit/_recursive.py:151: UserWarning: 'k_proj_weight' was found in ScriptModule constants, but it is a non-constant parameter. Consider removing it.
warnings.warn("'{}' was found in ScriptModule constants, "
/home/michael/.local/lib/python3.8/site-packages/torch/jit/_recursive.py:151: UserWarning: 'v_proj_weight' was found in ScriptModule constants, but it is a non-constant parameter. Consider removing it.
warnings.warn("'{}' was found in ScriptModule constants, "
/home/michael/.local/lib/python3.8/site-packages/torch/jit/_recursive.py:151: UserWarning: 'in_proj_weight' was found in ScriptModule constants, but it is a non-constant parameter. Consider removing it.
warnings.warn("'{}' was found in ScriptModule constants, "
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39768
Reviewed By: zhangguanheng66
Differential Revision: D21977032
Pulled By: ngimel
fbshipit-source-id: c2c3d0605a51324a9541f5a2caca7ab7a518dc00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39770
Remove duplicated piece of code in test (just a nit).
Test Plan: buck test test:quantization
Reviewed By: supriyar
Differential Revision: D21967877
fbshipit-source-id: 48a2d60e108fb9ddfa98e30888cf45744905277d
Summary:
Clearly expressing that a type is inferred by PyTorch rather than explicitly annotated by the user makes many error messages more user-friendly.
Currently Type has two string conversion methods: str() for IR printing and python_str() for serialization and error message generation. If we want to include more information in type printing while maintaining serialization/deserialization correctness, we need to split python_str() into annotation_str() and repr_str().
annotation_str() is solely responsible for serialization; it strictly matches the format of a Python type annotation. repr_str() is responsible for generating a human-readable error message that includes information like "this type is inferred, not explicitly annotated".
Closes https://github.com/pytorch/pytorch/issues/39449
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39544
Differential Revision: D21978759
Pulled By: gmagogsfm
fbshipit-source-id: 733566f5a62e748b5ca4bb3c5943ebb6d5b664d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39747
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21961336
Pulled By: ezyang
fbshipit-source-id: 6c7b3ccebd8f95a04994d53e5b5e9471bfefb26b
Summary:
This commit also removes the clang7 install for ROCm images, and properly cleans up the apt cache after ROCm installation to reduce image sizes.
Embedding the ROCm version within the image name follows the precedent set by CUDA images and decouples image creation from ROCm implicitly installing the latest version when images are prepared.
CC sunway513 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39735
Differential Revision: D21976162
Pulled By: ezyang
fbshipit-source-id: 9801106e8cb118a812113ec077154e72a9c2eb2d
Summary:
**BC breaking note:**
In PyTorch 1.5 passing the out= kwarg to some functions, like torch.add, could affect the computation. That is,
```
out = torch.add(a, b)
```
could produce a different tensor than
```
torch.add(a, b, out=out)
```
This is because previously the out argument participated in the type promotion rules. For greater consistency with NumPy, Python, and C++, in PyTorch 1.6 the out argument no longer participates in type promotion, and has no effect on the computation performed.
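A small illustration of the 1.6 semantics described above (tensor values chosen arbitrarily):
```python
import torch

a = torch.tensor([2], dtype=torch.int64)
b = torch.tensor([3], dtype=torch.int64)
out = torch.empty(1, dtype=torch.float64)

# The computation dtype is derived from the inputs alone (int64 here);
# the float64 `out` no longer influences it. The int64 result is then
# cast into `out`.
torch.add(a, b, out=out)                    # tensor([5.], dtype=torch.float64)
assert torch.add(a, b).dtype == torch.int64
```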
**ORIGINAL PR NOTE**
This PR effectively rewrites Tensor Iterator's "compute_types" function to both clarify its behavior and change how our type promotion works to never consider the out argument when determining the iterator's "common dtype," AKA its "computation type." That is,
```
a = op(b, c)
```
should always produce the same result as
```
op(b, c, out=a)
```
This is consistent with NumPy and programming languages like Python and C++.
The conceptual model for this change is that a TensorIterator may have a "common computation type" that all inputs are cast to and its computation performed in. This common computation type, if it exists, is determined by applying our type promotion rules to the inputs.
A common computation type is natural for some classes of functions, like many binary elementwise functions (e.g. add, sub, mul, div...). (NumPy describes these as "universal functions.") Many functions, however, like indexing operations, don't have a natural common computation type. In the future we'll likely want to support setting the TensorIterator's common computation type explicitly to enable "floating ufuncs" like the sin function that promote integer types to the default scalar type. Logic like that is beyond the type promotion system, which can only review inputs.
Implementing this change in a readable and maintainable manner was challenging because compute_types() has had many small modifications from many authors over ~2 year period, and the existing logic was in some places outdated and in other places unnecessarily complicated. The existing "strategies" approach also painted with a broad brush, and two of them no longer made conceptual sense after this change. As a result, the new version of this function has a small set of flags to control its behavior. This has the positive effect of disentangling checks like all operands having the same device and their having the same dtype.
Additional changes in this PR:
- Unary operations now support out arguments with different dtypes. Like binary ops they check canCast(computation type, out dtype).
- The dtype checking for lerp was outdated and its error message included the wrong variable. It has been fixed.
- The check for whether all tensors are on the same device has been separated from other checks. TensorIterators used by copy disable this check.
- As a result of this change, the output dtype can be computed if only the input types are available.
- The "fast path" for checking if a common dtype computation is necessary has been updated and simplified to also handle zero-dim tensors.
- A couple helper functions for compute_types() have been inlined to improve readability.
- The confusingly named and no longer used promote_gpu_output_dtypes_ has been removed. This variable was intended to support casting fp16 reductions on GPU, but it has become a nullop. That logic is now implemented here: 856215509d/aten/src/ATen/native/ReduceOpsUtils.h (L207).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39655
Differential Revision: D21970878
Pulled By: mruberry
fbshipit-source-id: 5e6354c78240877ab5d6b1f7cfb351bd89049012
Summary:
## Why doesn’t DDP work under dist_autograd?
DDP follows the steps below
1. [DDP Python constructor](8d6a8d2b3f/torch/nn/parallel/distributed.py (L389-L393)) (on a module) creates a [C++ Reducer](https://github.com/pytorch/pytorch/blob/master/torch/csrc/distributed/c10d/reducer.cpp), which holds references to all parameters (or variables in C++ code).
2. The reducer installs a post hook on each model parameter.
3. The backward run starts and triggers the post hooks installed above.
4. The post hook of a parameter simply marks the parameter ready for all-reduce.
5. Once all parameters in a bucket are ready, an all-reduce process starts by reading variable `.grad` and writes to variable `.grad`.
But under dist_autograd, `.grad` of a variable is not populated at all. Instead, grads are in a global map in distributed context from variables to their grads.
## Solution of this PR
The distributed engine sets a thread_local variable in a backward run indicating that we're running in distributed mode. The DDP reducer can then appropriately use `.grad` or the distributed context based on the thread local. More precisely, the thread local is set before calling the post hooks installed by the DDP reducer so that the DDP post hooks can retrieve it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37998
Test Plan:
```
python test/distributed/test_ddp_under_dist_autograd.py
```
FB repo
```
buck test caffe2/test/distributed/...
```
DDP accuracy benchmark workflow run
```
flow-cli canary pytorch.benchmark.accuracy_comparison.workflow --parameters-json '{"node_world_size": 4, "dist_backend": "nccl"}' --run-as-secure-group fblearner_flow --entitlement gpu_prod
```
f196173157
Reviewed By: pritamdamania87
Differential Revision: D21513795
Pulled By: hczhu
fbshipit-source-id: fe21e68ecdc9274182db4d4bb5a1e2d68ef927a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38830
This patch enables preserving user-specified attributes or non-forward
methods. The API:
_freeze_module(Module, ["a", "version"])
Test Plan: Imported from OSS
Differential Revision: D21957316
Pulled By: bzinodev
fbshipit-source-id: 5c9146ae679791070a9de868c45785725b48a9e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39743
### Summary
Still need this RAII guard for full JIT
### Test Plan
- CI checks
Test Plan: Imported from OSS
Differential Revision: D21968256
Pulled By: xta0
fbshipit-source-id: 8ea63c699fed4e2a01390232a58f039110391844
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39497
Previously, we didn't consider side effects at all when moving nodes in alias analysis. It is never valid to reorder a node with a side effect. This has led to bugs when used with Bailouts.
Unfortunately this might cause regressions, but it wasn't correct before :/
Test Plan: Imported from OSS
Differential Revision: D21963774
Pulled By: eellison
fbshipit-source-id: 656995d1b82534eca65437ed4e397b2bf08a4dec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39700
Refactored files
1. Moved mm_cpu from BlasWrappersCPU.cpp to LinearAlgebra.cpp
2. Deleted BlasWrappersCPU.cpp
These functions are closely related to those in LinearAlgebra.cpp; we don't need a separate file.
ghstack-source-id: 105503249
Test Plan:
`buck build //caffe2/aten/...`
`buck build //xplat/caffe2:ptmobile_benchmarkAndroid#android-armv7`
CI
Reviewed By: dreiss
Differential Revision: D21692154
fbshipit-source-id: 4edb7cee53c9e29700372f16ca1e6f85539dac24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39687
Run the observer on the weight values and compare it with the calculated attributes in the graph
Test Plan:
python test/test_quantization.py test_dynamic_weight_observer
Imported from OSS
Differential Revision: D21961907
fbshipit-source-id: dde3e629b8514e6c82346915ac35e35cf9c05f6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39759
Caffe2 has a mode where it uses PT's caching allocator. Somehow we were not calling the initialization explicitly.
Now, I have no idea why it worked before. Probably worth running a bisect separately.
Reviewed By: houseroad
Differential Revision: D21962331
fbshipit-source-id: f16ad6b27a67dbe0bda93939cca8c94620d22a09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39736
In some rare cases we can end up generating a random number equal to 0.
Test Plan: test_div
Reviewed By: yinghai
Differential Revision: D21953973
fbshipit-source-id: a834f624d72f1084c300163344662df121aae21b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39597
To complement collectAll(), this change adds collectAny(), and writes
up relevant unittest coverage.
We also remove the vector-based helper version of collectAll(), which
was of debatable usefulness in a previous change.
ghstack-source-id: 105527180
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...
Differential Revision: D21910311
fbshipit-source-id: dbb3ca404672a3d751b1b3cf016e6084a9ff8040
Summary:
When debugging it is sometimes useful to call test code manually. This change makes that easier.
Before this change, one would get the following error:
```
$ python -c "from torch.testing._internal.jit_utils import JitTestCase; JitTestCase()"
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/jansel/pytorch/torch/testing/_internal/common_utils.py", line 740, in __init__
test_method = getattr(self, method_name)
AttributeError: 'JitTestCase' object has no attribute 'runTest'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39695
Test Plan: `python -c "from torch.testing._internal.jit_utils import JitTestCase; JitTestCase()"`
Differential Revision: D21959249
Pulled By: jansel
fbshipit-source-id: 8435249f102338c957c3a7a7aad48d21d372a8cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39550
This is to prepare for next PR that fixes propagate dequantize for ops with multiple outputs
Test Plan: Imported from OSS
Differential Revision: D21942063
fbshipit-source-id: 518b3e437140bec9620988d2eb59b6aae069245e
Summary:
'Program Files' does not have to be on disk C (nor does it necessarily
have to be called `Program Files`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39707
Differential Revision: D21954235
Pulled By: malfet
fbshipit-source-id: 91a9b765cd1bc7e6201dd4b800d45257207010d9
Summary:
Switch off `/Z7` so that we don't generate debug info in Release and MinSizeRel builds, which should give smaller static libraries and object files and faster build times.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39703
Differential Revision: D21960684
Pulled By: ezyang
fbshipit-source-id: 909a237a138183591d667885b13fc311470eed65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38582
Adding LpNorm regularization for sparse features in DPER3. This is done using a sparse regularization op with run_after_optimizer (see D21003029).
* Added code calling new caffe2 operator from D21003029 to caffe2/python/regularizer.py
* Added l1norm and l2norm to sparse regularizer thrift definition.
* Added the new regularization references to test utils.
* Added a new file for unit tests "sparse_nn_sparse_reg_test.py"
Test Plan:
buck test mode/dev //caffe2/caffe2/fb/dper/layer_models/tests:sparse_nn_sparse_reg_test
buck test mode/dev //caffe2/caffe2/fb/dper/layer_models/tests:sparse_nn_reg_test
DPER canary: https://fburl.com/fblearner/rcp5yzeh
New DPER canary: https://fburl.com/fblearner/0krgd74x
Differential Revision: D20704248
fbshipit-source-id: 7e3d5013b3ff3da95ea027f0f2dd855f3ae8e41d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39663
I was investigating a memory corruption issue and thought it may be due to a race condition in (un)setting the current RPC agent. It turns out it wasn't (still investigating...). I had already written this fix, and it is a real fix (there could really be a race condition), so I'm sending it out to see whether there's interest in merging it. I believe its practical usefulness is however very limited, since typically the current RPC agent is only changed twice (at start and at shutdown) and thus there's limited risk for races.
As there may be some confusion on atomicity of shared_ptrs, let me clarify a few things from the get go. Operations on the control blocks of shared_ptrs (i.e., increasing and decreasing the refcounts) are atomic, which means that it is safe to manipulate *two different* shared_ptrs that point to the *same* object from *different* threads. However, the shared_ptr object itself is not atomic, which means that it is *not* safe to manipulate the *same* shared_ptr from two *different* threads. For that reason, the STL provides atomic functions explicitly specialized for shared_ptrs: https://en.cppreference.com/w/cpp/memory/shared_ptr/atomic (in C++ 20, they are being replaced by a specialization of std::atomic<std::shared_ptr<T>>). Note that this has been called "the worst question of all of C++" by Louis Brandy at his CppCon talk: https://youtu.be/lkgszkPnV8g?t=1210
ghstack-source-id: 105475005
Test Plan: Unit tests
Differential Revision: D21932817
fbshipit-source-id: da33fedd98efb820f284583ce7ff1c1c531dea9c
Summary:
Fixes https://github.com/pytorch/pytorch/issues/39624 and #11931
Based on the example by RobertoLat
https://github.com/pytorch/pytorch/issues/11931#issuecomment-625882503
**Fast-path is not taken on CPU for `Half` as `log` doesn't support it.**
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import time
import torch
import numpy as np
for n, t in [(500_000, 10),
             (1_000_000, 10)]:
    for dtype in (torch.half, torch.float, torch.double):
        # Input Setup
        p = torch.from_numpy(np.random.rand(n)).to(dtype)
        want = 1000
        print(f'torch.multinomial(a) a.numel() == {n} for {t} times {dtype}')
        start = time.time()
        # Iterate
        for _ in range(t):
            torch.multinomial(p, want, replacement=False)
        print(f'Took:', time.time() - start)

print('****' * 10)

for n, t in [(50_000, 100),
             (100_000, 100)]:
    for dtype in (torch.half, torch.float, torch.double):
        # Input Setup
        p = torch.rand(n, device='cuda', dtype=dtype)
        want = 1000
        print(f'torch.multinomial(a) a.numel() == {n} for {t} times {dtype}')
        start = time.time()
        # torch.cuda.synchronize()
        # Iterate
        for _ in range(t):
            torch.multinomial(p, want, replacement=False)
        # torch.cuda.synchronize()
        print(f'CUDA Took:', time.time() - start)
```
Before:
```
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float16
Took: 80.64455389976501
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float32
Took: 3.7778031826019287
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float64
Took: 5.045570611953735
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float16
Took: 161.53191947937012
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float32
Took: 7.640851736068726
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float64
Took: 10.399673461914062
****************************************
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float16
CUDA Took: 4.873984098434448
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float32
CUDA Took: 4.713594436645508
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float64
CUDA Took: 11.167185068130493
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float16
CUDA Took: 7.195427417755127
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float32
CUDA Took: 7.669712066650391
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float64
CUDA Took: 20.20938801765442
```
After:
```
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float16
Took: 80.6487455368042
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float32
Took: 0.0663309097290039
torch.multinomial(a) a.numel() == 500000 for 10 times torch.float64
Took: 0.09588909149169922
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float16
Took: 161.60748076438904
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float32
Took: 0.13187885284423828
torch.multinomial(a) a.numel() == 1000000 for 10 times torch.float64
Took: 0.17609834671020508
****************************************
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float16
CUDA Took: 0.007131099700927734
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float32
CUDA Took: 0.022255420684814453
torch.multinomial(a) a.numel() == 50000 for 100 times torch.float64
CUDA Took: 0.0323028564453125
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float16
CUDA Took: 0.04995012283325195
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float32
CUDA Took: 0.04948878288269043
torch.multinomial(a) a.numel() == 100000 for 100 times torch.float64
CUDA Took: 0.05495333671569824
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39636
Differential Revision: D21925406
Pulled By: ngimel
fbshipit-source-id: f2ee5148fa7dd88e018c461ced0e2361c3a43796
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39605
1. `RRef.to_here()` could serialize a Python object into a message.
However, we did not catch the Python pickle error, which would
result in a crash. This was exposed when calling `rpc.remote` with
a user function that returns `torch.futures.Future`.
2. `rpc.function.async_execution` could throw error on the server.
This commit sets the error on the OwnerRRef properly.
Test Plan: Imported from OSS
Differential Revision: D21913820
Pulled By: mrshenli
fbshipit-source-id: 50b620641a3b89d310b3b907570561decd83ee34
Summary:
It's better to have skipping logic explicitly defined in test decorators rather than in some hard-to-find blacklists
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39693
Differential Revision: D21947893
Pulled By: malfet
fbshipit-source-id: 3d0855eda7e10746ead80fccf84a8db8bf5a3ef1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39531
Enables RRef timeout support in TP agent by having TP agent mark
timeout errors with `makeRPCError` API. Also does some refactoring so TP agent
can print out the timeout for each future that has timed out.
ghstack-source-id: 105461555
Test Plan: CI
Differential Revision: D21881475
fbshipit-source-id: f63300e1f0a80ac7eebc983752070c0ec6ac17a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39219
We don't model clamp ops correctly right now; this PR fixes that.
Reason is quantized clamp op quantizes the scalar arguments in the op implementation: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/kernels/QuantizedOpKernels.cpp#L614-L617
So we'll need to model this explicitly in the IR.
When we see an `aten::dequantize - aten::clamp(%x, %min, %max)` pattern,
we first make a scalar tensor with `aten::scalar_tensor(%scalar, ...)`, then quantize that tensor with the same quantization parameters as the input tensor of the `aten::clamp`, dequantize it, and finally convert the dequantized tensor to a scalar using `aten::item`.
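As a rough eager-mode illustration of that modeling (the scale, zero_point, and clamp bounds here are made-up values; the actual change operates on the JIT IR, not on eager tensors):
```python
import torch

x = torch.randn(4)
scale, zero_point = 0.1, 10
qx = torch.quantize_per_tensor(x, scale, zero_point, torch.quint8)

# The scalar clamp bounds are turned into tensors, quantized with the same
# qparams as the clamp input, dequantized, and converted back to scalars.
qmin = torch.quantize_per_tensor(torch.scalar_tensor(-0.5), scale, zero_point, torch.quint8)
qmax = torch.quantize_per_tensor(torch.scalar_tensor(0.5), scale, zero_point, torch.quint8)
out = torch.clamp(qx.dequantize(), qmin.dequantize().item(), qmax.dequantize().item())
```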
Test Plan: Imported from OSS
Differential Revision: D21831350
fbshipit-source-id: d60731459a0465d64946aabc62065d25d92faefc
Summary:
dlibenzi reported (thanks!) that these arguments are not used in the implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39664
Differential Revision: D21934989
Pulled By: ailzhang
fbshipit-source-id: 35e79ce7f49626c8ad79362f972e442c06022dcc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39669
Folds the postnightly workflow, including html updating jobs and binary
size jobs, into the regular nightly workflow that should only run after
all upload jobs have completed.
This also moves the smoke testing jobs into the binary_builds workflow.
Do note that the devtoolset7 html update job has been removed since we
do not upload binaries specifically to that location anymore.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21936811
Pulled By: seemethere
fbshipit-source-id: a062413b69bafe0a85173020e8b218b375124106
Summary:
Adds the ability for all backward functions to accept undefined output gradient arguments. An undefined gradient is a Tensor that was created by the argumentless constructor `at::Tensor()`, where `tensor.defined() == false`.
Also adds new autograd nodes, UndefinedGrad and UndefinedGradBackward, that can be used from within Python code to inject undefined gradients into a backward function. A new test case is added to the backward function unit tests to use the UndefinedGrad node to ensure that undefined gradients do not break any backward functions.
Closes https://github.com/pytorch/pytorch/issues/33138
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39400
Differential Revision: D21936588
Pulled By: albanD
fbshipit-source-id: eccc5f55c77babe6dadcea4249d0c68a3c64e85d
Summary:
It didn't really make sense for it to be where it was and seeing how the
build only actually takes about 5 minutes to do it'd be best to just
move it into the garbage collection workflow.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38523
Reviewed By: malfet
Differential Revision: D21937332
Pulled By: seemethere
fbshipit-source-id: 6b797a6af88549dbd5ccce88814a1428354ce7f2
Summary:
Add a compilation error if they are individually included. Devs should
instead include c10/util/complex_type.h (which includes these two files).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39276
Differential Revision: D21924922
Pulled By: ezyang
fbshipit-source-id: ad1034be5d9d694b18cc5f03a44f540f10de568c
Summary: This is to test predictor on platform009
Test Plan:
```
fbpkg build -E fblearner/predictor
fbpkg build -E fblearner/predictor_proxy
```
# Performance test
## ServiceLab experiments
https://fburl.com/servicelab/p2xo4c85
## Perf A/B test
perf_b is platform-009
https://fburl.com/ods/59kdhdf9
perf_a is platform-09
https://fburl.com/ods/gjctzpe3
Differential Revision: D20552379
fbshipit-source-id: d6d9094aedfb2c1db623d44108627e8e00dde47e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39540
This gets picked up by mypy as an error in 1.5.1, not sure if it's a different version or setting, but might as well fix.
Test Plan: Imported from OSS
Differential Revision: D21891772
Pulled By: gchanan
fbshipit-source-id: 6f95bcd0652007323cd0c79070425b64e0b71c55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39614
add overload name to differentiate
prim::min.int(int a, int b) -> (int)
prim::min.int(int[] l, int[] r) -> (int[])
Test Plan:
verified op names for aten::min and aten::max are different
before
```
prim::min.int(int a, int b) -> (int)
prim::min.float(float a, float b) -> (float)
prim::min.int_float(int a, float b) -> (float)
prim::min.float_int(float a, int b) -> (float)
prim::min(Scalar a, Scalar b) -> (Scalar)
prim::max.int(int a, int b) -> (int)
prim::max.float(float a, float b) -> (float)
prim::max.int_float(int a, float b) -> (float)
prim::max.float_int(float a, int b) -> (float)
prim::max(Scalar a, Scalar b) -> (Scalar)
prim::min.int(int[] l, int[] r) -> (int[])
prim::max.int(int[] l, int[] r) -> (int[])
prim::min.self_int(int[] self) -> (int)
prim::max.self_int(int[] self) -> (int)
prim::min.float(float[] l, float[] r) -> (float[])
prim::max.float(float[] l, float[] r) -> (float[])
prim::min.self_float(float[] self) -> (float)
prim::max.self_float(float[] self) -> (float)
prim::min.bool(bool[] l, bool[] r) -> (bool[])
prim::max.bool(bool[] l, bool[] r) -> (bool[])
prim::min.self_bool(bool[] self) -> (bool)
prim::max.self_bool(bool[] self) -> (bool)
```
after
```
prim::min.int(int a, int b) -> (int)
prim::min.float(float a, float b) -> (float)
prim::min.int_float(int a, float b) -> (float)
prim::min.float_int(float a, int b) -> (float)
prim::min(Scalar a, Scalar b) -> (Scalar)
prim::max.int(int a, int b) -> (int)
prim::max.float(float a, float b) -> (float)
prim::max.int_float(int a, float b) -> (float)
prim::max.float_int(float a, int b) -> (float)
prim::max(Scalar a, Scalar b) -> (Scalar)
prim::min.int_list(int[] l, int[] r) -> (int[])
prim::max.int_list(int[] l, int[] r) -> (int[])
prim::min.self_int(int[] self) -> (int)
prim::max.self_int(int[] self) -> (int)
prim::min.float_list(float[] l, float[] r) -> (float[])
prim::max.float_list(float[] l, float[] r) -> (float[])
prim::min.self_float(float[] self) -> (float)
prim::max.self_float(float[] self) -> (float)
prim::min.bool_list(bool[] l, bool[] r) -> (bool[])
prim::max.bool_list(bool[] l, bool[] r) -> (bool[])
prim::min.self_bool(bool[] self) -> (bool)
prim::max.self_bool(bool[] self) -> (bool)
```
Reviewed By: iseeyuan
Differential Revision: D21914844
fbshipit-source-id: f1792a8c3b3ed6d1a4ba9705c4504f15e3665126
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39119
Add some basic C++ unit test coverage for ivalue::Future, and in
the process, add a basic collectAll() primitive, per #38937.
Along the way, I realized that List<Future> is effectively
impossible to construct (since the Future's type is not templated,
but rather passed in, the getTypePtr_<T>::call() isn't defined),
so I added a workaround in List to make it possible.
ghstack-source-id: 105309650
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...
Differential Revision: D21756884
fbshipit-source-id: 5d40c8d1c55098de5497655c7b887f4f56508a37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38898
Pickling will pickle the tensor meta info, and it's up to the jit
exporter or other upstream code that uses the pickler to decide how to write
the actual tensor data.
This PR makes the call to getWritableTensorData happen at the upper level so that RPC
and TensorPipe can leverage it, pickling only the tensor metadata without
converting the tensor from GPU to CPU.
Test Plan: Imported from OSS
Differential Revision: D21879866
Pulled By: wanchaol
fbshipit-source-id: 75f7ff4073e4ad15b6588973dcbdc48f97a8329f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39606
Removed duplicated schema for aten::pow
Test Plan:
Previously there were many duplicated aten::pow schemas:
```
aten::pow.int(int a, int b) -> (float)
aten::pow.float(float a, float b) -> (float)
aten::pow.int_float(int a, float b) -> (float)
aten::pow.float_int(float a, int b) -> (float)
aten::pow(Scalar a, Scalar b) -> (float)
aten::pow.int(int a, int b) -> (int) // duplicated name!
aten::pow.float(float a, float b) -> (float) // duplicated schema!
aten::pow.int_float(int a, float b) -> (float) // duplicated schema!
aten::pow.float_int(float a, int b) -> (float) // duplicated schema!
aten::pow(Scalar a, Scalar b) -> (Scalar) // duplicated name!
```
After this diff, there are only 7 ops with different overload names:
```
aten::pow.int(int a, int b) -> (float)
aten::pow.float(float a, float b) -> (float)
aten::pow.int_float(int a, float b) -> (float)
aten::pow.float_int(float a, int b) -> (float)
aten::pow(Scalar a, Scalar b) -> (float)
aten::pow.Scalar(Scalar a, Scalar b) -> (Scalar)
aten::pow.int_to_int(int a, int b) -> (int)
```
Reviewed By: iseeyuan
Differential Revision: D21914441
fbshipit-source-id: 1e82c83c77d22206046276bbb52a65088c58ed33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39090
Makes quantized GroupNorm work in eager mode post training static quant.
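As a rough sketch of the eager-mode flow this enables (the module, shapes, and use of the default qconfig are assumptions for illustration, not taken from this PR):
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.quant = torch.quantization.QuantStub()
        self.gn = nn.GroupNorm(2, 4)
        self.dequant = torch.quantization.DeQuantStub()

    def forward(self, x):
        return self.dequant(self.gn(self.quant(x)))

m = M().eval()
m.qconfig = torch.quantization.default_qconfig
torch.quantization.prepare(m, inplace=True)
m(torch.randn(1, 4, 8, 8))                    # calibrate the observers
torch.quantization.convert(m, inplace=True)   # swaps in the quantized GroupNorm
```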
Test Plan:
```
python test/test_quantization.py TestPostTrainingStatic.test_normalization
python test/test_quantization.py TestStaticQuantizedModule.test_group_norm
```
Imported from OSS
Differential Revision: D21885262
fbshipit-source-id: 58b0ffb59c601fcb4c79f711c7c98a667ffc6170
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39458
Previously, if we had a CallMethod followed by a CallFunction, we didn't check for observers at the output of the CallMethod since it was handled separately.
This change makes it the default to check the outputs of all nodes to identify values that need observers.
Test Plan:
python test/test_quantization.py test_dynamic_shared_weights
Imported from OSS
Differential Revision: D21872939
fbshipit-source-id: 08dd8b7ddf73ef2cc26ebcf4ceb2f222c4559ab3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39412
This PR introduces changes to enable running the weight observer standalone in the graph.
It extracts the nodes from the graph that correspond to the observed weight value and adds all the related nodes to a new subgraph.
The subgraph is then executed using GraphFunction.
Test Plan:
python test/test_quantization.py TestGraphMostPostTrainingStatic
python test/test_quantization.py TestQuantizeDynamicScript
Imported from OSS
Differential Revision: D21872940
fbshipit-source-id: 55f1dcc2caef193531e2b807c8e56288b9794520
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39608
As title. When adding a new build mode, TypeDerived failed to compile due to macro redefinition. A conditional define fixes this issue.
Test Plan: Tests pass.
Reviewed By: iseeyuan
Differential Revision: D21914975
fbshipit-source-id: 12e04af29b7510106e8e47fa48e30b829aeff467
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39607
Add an overload name for the strcmp macro to prevent duplicated op names in the lite interpreter.
Also reformatted some other files.
Test Plan:
verified these op schema are changed
```
-aten::eq(str a, str b) -> (bool)
+aten::eq.str(str a, str b) -> (bool)
-aten::ne(str a, str b) -> (bool)
+aten::ne.str(str a, str b) -> (bool)
-aten::lt(str a, str b) -> (bool)
+aten::lt.str(str a, str b) -> (bool)
-aten::gt(str a, str b) -> (bool)
+aten::gt.str(str a, str b) -> (bool)
-aten::le(str a, str b) -> (bool)
+aten::le.str(str a, str b) -> (bool)
-aten::ge(str a, str b) -> (bool)
+aten::ge.str(str a, str b) -> (bool)
```
Reviewed By: iseeyuan
Differential Revision: D21913049
fbshipit-source-id: 518db068c8c5b0efd19223f0bd94fc3351335dc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39598
In order to include af5f68b241
Test Plan: CircleCI
Reviewed By: mrshenli
Differential Revision: D21910997
fbshipit-source-id: 98ac0a9431576e2984c0cac99cc83f7ba967ccde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39352
In this task, the quantized backend of the kernel is implemented for the threshold function, which clamps the entries in a tensor less than or equal to a given threshold to be a specified value.
The corresponding Python implementation and unit test are also added.
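For reference, the float-side semantics of threshold look like this (the quantized kernel mirrors this on quantized tensors; the values below are made up):
```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.2, 0.5, 2.0])
# entries less than or equal to the threshold (0.5) are replaced with the value (-10.0)
print(F.threshold(x, 0.5, -10.0))
```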
Test Plan:
1. On a devserver, build PyTorch from source by running the command `buck build mode/dev //caffe2:torch`
2. Run the unit test through the command
`buck test mode/dev //caffe2/test:quantization -- test_qthreshold`
Reviewed By: z-a-f
Differential Revision: D21822446
fbshipit-source-id: e8c869664e6d4c664f0e7fa3957762992118c082
Summary:
Minor speed up when printing.
Also allows you to print Tensors that you cannot perform autograd ops on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39420
Differential Revision: D21889390
Pulled By: albanD
fbshipit-source-id: 4e229994eb89484795282e6eac37359ce46b5ebc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39360
Makes the observer microbenchmarks also run on CUDA. This is useful
now that QAT is supported in DDP and is more likely to be run
on GPUs.
Test Plan:
```
python -m pt.qobserver_test
```
Imported from OSS
Differential Revision: D21828985
fbshipit-source-id: 6da4d61f744f7a2ee5e87963b3ec84579128d435
Summary:
All individual test_nccl unit tests have been disabled for ROCm in bf9395438f
test_nccl was also added to the ROCM_BLACKLIST in 87b198d309
However, the issue only arises when running the test_nccl suite as a whole (as opposed to any one test individually). More details in comments here: https://github.com/pytorch/pytorch/pull/38689
This PR enables test_nccl suite with only two tests so as to workaround the as-yet unresolved issue above, while allowing at least one test_nccl collective test to run on ROCm. This is also needed as a precursor for: https://github.com/pytorch/pytorch/pull/38515
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39354
Differential Revision: D21843194
Pulled By: mrshenli
fbshipit-source-id: b28d1e073d8d0fdc1b59928fc3b00187cfd02a35
Summary:
I added the following to the docs:
1. `torch.save`.
1. Added doc for `_use_new_zipfile_serialization` argument.
2. Added a note saying that the file extension does not matter when saving.
3. Added an example showing the use of the above argument along with `pickle_protocol=5` (a similar sketch follows this list).
2. `torch.split`
1. Added an example showing the use of the function.
3. `torch.squeeze`
1. Added a warning for batch_size=1 case.
4. `torch.set_printoptions`
1. Changed the docs of `sci_mode` argument from
```
sci_mode: Enable (True) or disable (False) scientific notation. If
None (default) is specified, the value is defined by `_Formatter`
```
to
```
sci_mode: Enable (True) or disable (False) scientific notation. If
None (default=False) is specified, the value is defined by
`torch._tensor_str._Formatter`.
```
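For reference, a minimal sketch of the `torch.save` usage described in item 1 (the filename is a placeholder; pickle protocol 5, as used in the docs example, requires Python 3.8+, so protocol 4 is shown here):
```python
import torch

state = {"weight": torch.randn(3, 3)}
# The file extension does not matter; the zipfile-based format is used either way.
torch.save(state, "checkpoint.bin",
           _use_new_zipfile_serialization=True, pickle_protocol=4)
loaded = torch.load("checkpoint.bin")
```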
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39303
Differential Revision: D21904504
Pulled By: zou3519
fbshipit-source-id: 92a324257d09d6bcfa0b410d4578859782b94488
Summary:
Currently, torch.Tensor subclasses (like torch.nn.Parameter) aren't supported type annotations for TorchScript inputs. This PR allows them to be treated like torch.Tensor for compilation.
Closes https://github.com/pytorch/pytorch/issues/38235
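A minimal sketch of the kind of annotation this enables (the function itself is made up for illustration):
```python
import torch

@torch.jit.script
def scale(weight: torch.nn.Parameter, x: torch.Tensor) -> torch.Tensor:
    # Parameter is accepted as an input annotation and treated like Tensor here
    return x * weight

w = torch.nn.Parameter(torch.ones(3))
print(scale(w, torch.randn(3)))
```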
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39487
Differential Revision: D21885827
Pulled By: gmagogsfm
fbshipit-source-id: 1ec51829b132b7b0293a6c526d73497b23dae113
Summary:
We were restricting it to 3, but in training we set up to 5, even though
in practice we only need 3 since we don't recompute mean/var.
Test Plan: contrib tests for fakelowp
Reviewed By: hl475
Differential Revision: D21905490
fbshipit-source-id: 48f61c7ba7d95f19d55d2f65514a517c1514ae88
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39446
In my unscientific testing, this reduces startup time by 50% on gcc 8.3.
That's a big fucking deal.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21862037
Pulled By: ezyang
fbshipit-source-id: 69fb401956304a97f8f80c48cecdb1cb199ff434
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39555
This function does not require the GIL, as all OwnerRRef-related
py::object deletion is now guarded by ConcretePyObjectHolder. If
we held the GIL here, we could potentially run into a deadlock if there
were other threads in the RPC thread pool trying to acquire the GIL to
destruct Python UDFs or OwnerRRefs.
Test Plan: Imported from OSS
Differential Revision: D21897125
Pulled By: mrshenli
fbshipit-source-id: 96157689df38bc409af57b83248ae73823d1f959
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39535
This is my understanding of what could happen: on workerN (N != 0), `_wait_all_workers_sequence_id_to_states`, which is a `defaultdict`, is accessed twice: once in the body of `_wait_all_workers` (by the "main thread" of workerN) and once in `_set_proceed_shutdown_signal`, called by worker0 through an RPC call. I think the two could race and access the `_wait_all_workers_sequence_id_to_states` at the same time, and thus create two separate copies of `WaitAllWorkersStates`. One of those threads would wait on the event of one copy, but the other thread would set the event of the other copy. This led to a deadlock, as the main thread would end up waiting forever.
ghstack-source-id: 105283327
Test Plan: I added additional logging in those functions, ran a stress test of the RPC test suite, based on the logs I suspected that this could be the issue, fixed it and re-run the stress test and didn't see the bug anymore. This is admittedly not very convincing evidence, as I may just have been lucky that second time...
Differential Revision: D21889752
fbshipit-source-id: 05ec710bd2930313e1480ae896b4b2f5f503aa17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39190
The tests covered previously by test_qrelu, test_qrelu6, test_qsigmoid, and test_qhardsigmoid are now merged into one test to ensure conciseness and reduce redundancy.
The refactoring aims to provide the basis for a more generalizable framework to test quantized activation functions and more in the future.
Test Plan:
1. On a devserver, build PyTorch from source by running the command "buck build mode/dev //caffe2:torch"
2. Run the merged unit test through the command
"buck test mode/dev //caffe2/test:quantization -- test_qrelu"
"buck test mode/dev //caffe2/test:quantization -- test_qrelu6"
"buck test mode/dev //caffe2/test:quantization -- test_qsigmoid"
"buck test mode/dev //caffe2/test:quantization -- test_qhardsigmoid"
Reviewed By: z-a-f
Differential Revision: D21755690
fbshipit-source-id: ef62b2a50ee1c3b8607746f47fb587561e75ff25
Summary:
See https://github.com/pytorch/pytorch/pull/38620 for additional context.
When PyTorch begins producing file format 4 with the updated div behavior it's safe for older PyTorch versions to consume it, since file format 4 only prohibits functionality. Bumping the supported file format version now gives PyTorch users on Master some leeway on updating their services that consume vs. produce PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39529
Differential Revision: D21886790
Pulled By: mruberry
fbshipit-source-id: d6098eff06c26f18c3fac5cc85e5db298ba86e27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39380
Test for inserting observers for if statement for ops that propagate quantization parameters
Test Plan: Imported from OSS
Differential Revision: D21832477
fbshipit-source-id: 6e0b4ce4a89f847af161bb22338525802adb8b41
Summary:
Instead of copying to a buffer, then setting a tensor's storage with that buffer, create a storage directly from the file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36362
Pulled By: driazati
Differential Revision: D21889537
fbshipit-source-id: edbd430073c2bbf52332fe7b3b2590e7d936dedf
Summary:
Misc updates to the fake FP16 tests.
1. seeding numpy with a random seed
2. test base class changed from unittest.TestCase=>serial.SerializedTestCase
3. Removed the hypothesis_test_util import
Reviewer: Hector Yuen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39405
Test Plan: Fake FP16 test
Differential Revision: D21890212
Pulled By: hyuen
fbshipit-source-id: 25e7e17f118655f32cdd06ea9db3cdac5277e649
Summary:
s/raise unittest.skip/raise unittest.SkipTest/
As `unittest.skip` is a decorator while `unittest.SkipTest` is an exception
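A small illustration of the difference (the skip condition is a stand-in):
```python
import unittest

SOME_DEPENDENCY_MISSING = True  # stand-in for a real runtime check

class Example(unittest.TestCase):
    @unittest.skip("decorator form: marks the test as skipped")
    def test_decorated(self):
        pass

    def test_conditional(self):
        if SOME_DEPENDENCY_MISSING:
            # exception form: raising SkipTest skips from inside the test body
            raise unittest.SkipTest("dependency not available")
```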
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39532
Differential Revision: D21889152
Pulled By: malfet
fbshipit-source-id: 27a03dbf065a1e2712a63c6c27e156bd13edbbdf
Summary:
Fix type casting for reduce ops in the ONNX exporter. PyTorch promotes the bool dtype and all integer dtypes to long for these ops.
This fix only covers traced modules where the dtype is present.
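A quick illustration of the PyTorch-side promotion the exporter has to match:
```python
import torch

mask = torch.tensor([True, False, True])
counts = torch.tensor([1, 2, 3], dtype=torch.int32)

print(mask.sum().dtype)    # torch.int64: bool is promoted to long
print(counts.sum().dtype)  # torch.int64: int32 is promoted to long as well
```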
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38829
Reviewed By: hl475
Differential Revision: D21833533
Pulled By: houseroad
fbshipit-source-id: 00d9ff692cc0b09d6ca169f6c63913f04b56f182
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39483
I fixed all of the new errors that occurred because of the upgrade.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21884575
Pulled By: ezyang
fbshipit-source-id: 45c8e1f1ecb410c8d7c46dd3922ad70e982a0685
Summary:
Previously, on conversion from python -> c++ it was cast to a double list through bad copy pasta. It's pretty unusual for someone to script a broadcasting list function directly since it's an internal api, so it was unlikely to affect anyone.
Fix for https://github.com/pytorch/pytorch/issues/39450
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39481
Reviewed By: jamesr66a
Differential Revision: D21870557
Pulled By: eellison
fbshipit-source-id: e704e5e87d2702a270b7d65c4df444246a134480
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39379
Moves binary builds into their own workflow and adds the ability to
target specification on them. This allows you to run the binary build
workflow on a pull request without the need to modify any configuration
at all.
Some notes about this implementation:
* Upload jobs are still restricted to only the nightly branches and RC tags
* Parameters for circleci are currently defined in
.circleci/verbatim-sources/header-section.yml
* Target specification configuration is currently located at
.github/pytorch-circleci-labels.yml
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21886341
Pulled By: seemethere
fbshipit-source-id: 146ef5df2fea208d33e97862d52c170bf001bc98
Summary:
max_pool2d with ceil_mode calculates output size a little differently
than what we get with xnnpack max_pool2d. Thus, when ceil_mode=True, we
disable this path. However, if we get the same output size with ceil_mode
and without ceil_mode, we should use xnnpack-based max_pool2d.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39447
Test Plan: CI
Differential Revision: D21873334
Pulled By: kimishpatel
fbshipit-source-id: b84abed1505e36e492cc87e7d40664ac63964909
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37531
All of these definitions are no longer "legacy" as their CPU
implementations have been ported to ATen. There are probably some
layers of indirection that could be reduced here, but for now just do a
minor but unlikely to break things cleanup.
The last thing in LegacyNNDefinitions truly is still in THCUNN and can't
be removed.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21310913
Pulled By: ezyang
fbshipit-source-id: 1ff4ff16abddf13f8d583df990242ac4b0461915
Summary:
This PR aims to add `arccosh`, `arcsinh` and `arctanh` support. Please see issue https://github.com/pytorch/pytorch/issues/38349 for more details.
**TODOs:**
* [x] Add test cases for `arccosh`, `arcsinh` and `arctanh`. (need help)
* [x] Overload ops if `std::op` does not work with `thrust::complex` types (like for `sinh`, `cosh`).
Note: `std::acosh, std::asinh, std::atanh` do not support `thrust::complex` types. Added support for complex types for these 3 ops (`arccosh, arcsinh, arctanh`)
cc: mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38388
Differential Revision: D21882055
Pulled By: mruberry
fbshipit-source-id: d334590b47c5a89e491a002c3e41e6ffa89000e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39331
Fixes gh-37590
Adds an extra `make coverage` to document building, which uses the built-in facility in sphinx to check docstring coverage. Also fixes a failure to import `torch/jit/supported_ops.py` which broke the [Torchscript Builtins](https://pytorch.org/docs/stable/jit_builtin_functions.html) page.
This also adds the required `SPHINXOPTS` to turn warnings into error, but this is commented out. Note that since documentation of `torchvision` is merged in here, failures there would cause failures here if this is made active. Some thought might be needed about pinning the torchvision version merged into documentation.
The first commit should fail, since the "ScriptModule" class is commented out. I did that in order to check that a CI failure is properly reported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38244
Differential Revision: D21640589
Pulled By: ezyang
fbshipit-source-id: 1e240d81669b5f21404d596de4a27d192dc9fd8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39527
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21884798
Pulled By: ezyang
fbshipit-source-id: a130bfd4cc122ea1d45e7db7303bf44e04f08703
Summary:
This corrects the build info for ppc64le in the main README.
I am opening this PR before renaming the build job. (So, the "live" master README has the correct "live" link and the PR does not.)
Immediately after submitting the PR, I will correct the name of the build job. This will make the new PR link correct, and the current "master" link will briefly appear broken until this PR gets merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39475
Differential Revision: D21883184
Pulled By: malfet
fbshipit-source-id: 148353b632448c98e5aff560d31642328afe7963
Summary:
Adding a SymbolicShape class to represent a generic tensor shape with ShapeSymbols.
Its core data structure is c10::optional<std::vector<ShapeSymbol>>. If has_value() == false, it represents an unranked tensor shape. At any dimension ShapeSymbol can contain dynamic size, checkable with ShapeSymbol::IsStatic method.
SymbolicShape now replaces all uses of VaryingShape<ShapeSymbol>, ie c10::optional<std::vector<c10::optional<ShapeSymbol>>>. The inner c10::optional wrapper around ShapeSymbol used to indicate dynamic shape, which overlaps with part of ShapeSymbol's representation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38544
Reviewed By: ZolotukhinM
Differential Revision: D21693984
Pulled By: gmagogsfm
fbshipit-source-id: 6e633e4f36cf570d6fb34ac15d00ec1fb2054a09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38590
This PR implements timeout semantics for RRef for parity with rpc_sync and rpc_async. How it works:
- Timeout parameter is added to rpc.remote. If the rpc.remote call times out, note that the error won't be raised to the user in that call, as it is not blocking (similar to rpc_async). Instead, the timeout error will be raised the next time the RRef is used (either by pickling or to_here call).
- Error handling semantics are added to RRef to deal with the timeout errors. Previously, if there was an error creating the OwnerRRef, the callback on the local user would throw, resulting in an `std::terminate`. Instead of this, the error is now caught and surfaced to the user the next time the RRef is used. As part of this, we have added an `RPCErrorType` enum and defined RRef error handlers to handle the `RPCErrorType` values (currently just timeout and unknown).
- A timeout parameter is added to `to_here()`, which gives the user control over the max amount of time it can block for (see the sketch after this list).
- `ctx.prepareChildForFork()` which is called when the RRef is pickled (i.e. used as an arg over RPC) checks if the `rpc.remote()` call had timed out, and if so, raises that error to the user.
- Tests are added, primarily via delay injection.
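Below is a minimal sketch of the new timeout arguments described above (the worker name and user function are placeholders, and `rpc.init_rpc` is assumed to have been called already):
```python
import torch
import torch.distributed.rpc as rpc

def slow_add(x, y):
    return x + y

rref = rpc.remote("worker1", slow_add,
                  args=(torch.ones(2), torch.ones(2)), timeout=1.0)
try:
    result = rref.to_here(timeout=0.5)  # raises if creation or fetch exceeds the timeout
except RuntimeError as exc:
    print("RRef timed out:", exc)
```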
ghstack-source-id: 105232837
Test Plan: CI
Differential Revision: D21588165
fbshipit-source-id: c9f9e8aa3521012ea1de3e0f152a41afdf8b23f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39267
When combined with `torch.jit.script`, the order of decorators matter.
`rpc.functions.async_execution` must be the outmost one. The
`async_execution` decorator will store the TorchScript function in
attribute `_wrapped_async_rpc_function` on the wrapper function, and
pass this wrapped TorchScript function (i.e., an instance of
`torch.jit.ScriptFunction`) to RPC. The caller will mark the ScriptCall
with `isAsyncExecution=true`, and the callee will extract the returned
`Future` in C++ and install subsequent processing as a callback to
that `Future`.
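A rough sketch of the required decorator order, loosely following the pattern in the RPC docs (the worker name and helper function are placeholders):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def local_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

@rpc.functions.async_execution   # must be the outermost decorator
@torch.jit.script
def async_add(to: str, x: torch.Tensor, y: torch.Tensor):
    # rpc_async returns a Future[Tensor]; the callee completes the RPC when it is set
    return rpc.rpc_async(to, local_add, (x, y))
```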
Test Plan: Imported from OSS
Differential Revision: D21792688
fbshipit-source-id: de095eb148d21e9114a478e9e6047c707d34fd07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39501
fix internal targets, and disable the test until it is fixed
Test Plan:
built and ran the test, but venkat has to get access to nnpi before
fine tuning the last few pieces. Currently getting around 1e-5 relative error
Reviewed By: yinghai
Differential Revision: D21875657
fbshipit-source-id: 3ae762093084fa65b9aeedaef1b2ca1b1e13b587
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39456
Move aten::to.prim_dtype from full jit to lite interpreter
Test Plan: verify TTS model can be used
Reviewed By: iseeyuan
Differential Revision: D21856104
fbshipit-source-id: 774981a5c04798e3a87cf7d6e6682f35e604944e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38173
- Introduce torch.types.Device representing all "device-like" types
- Stubbed torch.device.__reduce__
- Stubbed all torch._C functions comprehensively
- Deleted _safe_call which is unused throughout the codebase
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497399
Pulled By: ezyang
fbshipit-source-id: 1f534442b0ec9a70d556545d072f2c06a08b9d15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39393
Computing r_correction should be done only for radam. Otherwise it can generate floating-point exceptions.
Test Plan:
buck test caffe2/caffe2/python/operator_test:adam_test -- test_sparse_adam
with --caffe2_operator_throw_if_fp_exceptions=1 gflags option
Differential Revision: D21834296
fbshipit-source-id: a9e6a93451423e76a99f6591d21cb65d4374b008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39181
Create a Python binding class torch._C.LiteScriptModule for mobile::module; a Python class called LiteScriptModule is created which wraps torch._C.LiteScriptModule.
The Python class LiteScriptModule contains preliminary functions including forward, run_method and __call__.
Create a python api "load_for_lite_interpreter" under torch.jit.mobile which takes a pre-saved mobile module in a file-like object as input and returns the Python class LiteScriptModule.
Add a python binding method "_save_to_buffer_for_mobile" under ScriptModule, and a python method "_save_to_buffer_for_lite_interpreter" under RecursiveScriptModule which saves the mobile module into a buffer instead of a file.
ghstack-source-id: 105215736
Test Plan: buck test caffe2/test:mobile
Differential Revision: D21757474
fbshipit-source-id: 758b87497d65c4686459a567d41887c7a577aa4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39452
Selective build works on training.
* VariableType_?.cpp are now selectively generated based on the operator list.
* Add a flag in pt_operator_library, "train". If it's True, an extra flag of "pt_train_operator_library" will be added to the labels. A query for "pt_train_operator_library" will be done to aggregate the training operators. With this flag we limit the generated VariableType to used training operators only, to conserve the code size. The models for inference only have train = False by default.
* For testing purpose, caffe2/fb/pytorch_trainer is created. It's based on full jit but the operators are selectively built.
* smartkeyboard_debug_model is used for test. Since the static code analysis is not applied for VariableType yet, the operators are manually added based on debugging error messages.
* At build stage, make selective build optional for training code-gen library.
The reason is that to make fb4a built, the generated VariableType.cpp needs to depend on torch_mobile_train. Torch_mobile_train is not needed for apps with inference only. In those cases training can be turned off to remove the dependency on torch_mobile_train to save size. It can also be used as a switch to check size regression introduced by training.
ghstack-source-id: 105190037
(Note: this ignores all push blocking failures!)
Test Plan:
Training:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/pytorch_trainer:trainer ~/models/papaya/keyboard/smartkeyboard_debug_model.pt
```
Inference, with and without the new query-based feature:
```
buck run -c pt.build_from_deps_query=1 -c pt.selective_build=0 -c pt.static_dispatch=0 xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/models/pytext/BI/bi_pytext_0512.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Reviewed By: ljk53
Differential Revision: D21459302
fbshipit-source-id: df71a46d74f8c7448cbf51990804104f1384594f
Summary:
`HTTPError` is raised when the server is overloaded, while `URLError` is
raised when the network is not available.
And since `HTTPError` is an extension of `URLError`, catching `URLError` catches both exceptions.
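A small stdlib illustration of why catching `URLError` is sufficient:
```python
from urllib.error import HTTPError, URLError

try:
    # HTTPError subclasses URLError, so a URLError handler catches both
    raise HTTPError("https://example.com", 503, "Service Unavailable", None, None)
except URLError as err:
    print("caught:", err)
```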
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39477
Differential Revision: D21873560
Pulled By: malfet
fbshipit-source-id: 11806671b768705465f562087521ad4887fd20f7
Summary:
Re-enable some test cases in `test_memory_format_operators` since their corresponding issue has been fixed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38648
Differential Revision: D21689085
Pulled By: VitalyFedyunin
fbshipit-source-id: 0aa09e0bf31ba98c8ad0191ac3afd31dda0f1d42
Summary:
Cut from https://github.com/pytorch/pytorch/pull/38994.
This is a helper function for comparing torch and NumPy behavior. It updates the existing and increasingly popular _np_compare function and moves it to be a method on TestCase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39179
Differential Revision: D21855082
Pulled By: mruberry
fbshipit-source-id: edca3b78ae392d32243b02bf61960898b6ba590f
Summary:
Mainly, fix a bug in the HashProvider where it would not include LoopOptions in the hash, meaning two loops would be seen as identical even if they were bound to different thread/block axes. Also added symbolic names for the different axis options.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39408
Differential Revision: D21864494
Pulled By: nickgg
fbshipit-source-id: 9c28729984e7a3375e026c78294c9f75b9015123
Summary:
The two bugs were:
* Non-reduction axes were not added when inserting the new ReduceOp, meaning if a reduction with non-reduce axes was rfactored we'd produce bad outputs. There were no tests of Rfactor with non-reduce axis so I modified a test to do this.
* The new statements were always prepended to the block, meaning writes to a buffer could be reordered after the usage of that buffer. This mostly happened in the case where we rfactor a previously rfactored reduction. There was a test of this, but since it only tested rfactoring the outer reduction axis there was never any other statements at the insertion point (the tests of the insertion point argument also do this). I added a new test which covers various rfactor-axis cases.
Also cleaned up tests, removed some helper code we don't need etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39268
Differential Revision: D21864489
Pulled By: nickgg
fbshipit-source-id: d314d20997a8472ec96b72f7a9068d6da6d2399c
Summary:
This patch removes the call to run optimizations within the freezing API.
Only dead code elimination is invoked to clean up the frozen module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38499
Reviewed By: eellison
Differential Revision: D21579607
Pulled By: bzinodev
fbshipit-source-id: a6231754fea89296a3dcf07b5e37a1c43cb8d5dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39378
Will initially only contain a label to trigger builds for binary tests
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21864091
Pulled By: seemethere
fbshipit-source-id: f69467ccc797b6b320dc8b7f2d50a8601c172a1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39337
In #39031 we made fake quantize respect device affinity of the
original module. However, that PR only handled modules with parameters
or buffers, and did not work properly for `ReLU`.
Fixing the logic to also work for `ReLU` by passing the parent's
device when adding observers.
Test Plan:
```
python test/test_quantization.py TestDistributed.test_device_affinity
```
Imported from OSS
Differential Revision: D21821243
fbshipit-source-id: cc6abda3694b80ce8ba0440dc6c1b5b58f3c0066
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39441
This is the last test suite to be enabled for TensorPipe.
ghstack-source-id: 105166757
Test Plan: Ran the tests, hundreds of times each, in different build modes.
Differential Revision: D21858975
fbshipit-source-id: ee0a7e64b77b4b1974f031207031cc14afb3a8c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39440
After the RPC tests, re-enable the second test suite: dist autograd.
ghstack-source-id: 105165393
Test Plan: Ran the tests, several times each, in different build configs.
Differential Revision: D21858974
fbshipit-source-id: 409377d564c36fecae51b9e4c776d94187b434a2
Summary:
Fixes gh-38966
If `THCTensor_(resizeAs)` fails to allocate, then these `free`s will never be reached. So, instead I use a wrapped tensor to do cleanup automatically.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39347
Differential Revision: D21838933
Pulled By: ezyang
fbshipit-source-id: 8c74ecdd720d6712a33ddef6126ea545761a269b
Summary:
ezyang,
I have added the changes to DispatchKey, DeviceType, Backend to support the out-of-tree FPGA.
cc. tataetae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38938
Differential Revision: D21748955
Pulled By: ezyang
fbshipit-source-id: fe76d9730818205961430d2a0e00727b5c547b32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39406
For now, just the RPC test (no dist autograd or dist optimizer).
I removed the skipping decorator from all the tests except those that explicitly use the ProcessGroup options.
Includes #39027.
ghstack-source-id: 105159974
Test Plan: Ran the tests several hundred times, in various build modes. Saw some flakes, but at a rate of about 0.1%
Differential Revision: D21716069
fbshipit-source-id: 9d2a99e112049a63745772c18e7a58266ed8e74e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32866, resubmit of https://github.com/pytorch/pytorch/issues/38970
The memory error in the issue is caused by int overflowing in col2vol. This version using mixed 32-bit and 64-bit indexing calculation lifts the maximum indexing possible without compromising the performance of ConvTranspose3d. vs 20-30% regression with pure 64-bit indexing.
This requires that input.numel() <= UINT_MAX, and channels * kernel.numel() <= UINT_MAX otherwise it raises an error. Previously, the code would crash or give incorrect results unless input.numel() * kernel.numel() <= INT_MAX.
Note that the test is a minimised reproducer for the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39198
Differential Revision: D21817836
Pulled By: ezyang
fbshipit-source-id: b9adfe9f9dd00f04435be132966b33ac6b9efbef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39216
The `rpc.functions.async_execution` decorator specifies that the
wrapped function is guaranteed to return a `torch.futures.Future`.
The decorator adds a `_wrapped_async_rpc_function` attribute to
the wrapper function. The caller retrieves this information and
then sets `isAsyncFunction` argument accordingly which is later
added to PythonCall RPC message as a field. On the callee side,
if the PythonCall carries an asynchronous function, it will cast
the function's return value to a jit::PythonFutureWrapper object,
and then install response creation and communication as a callback
on the that jit::PythonFutureWrapper.
For applications, this feature is useful when a function needs to
wait for IO or additional signaling. In those cases, marking the
user function as `rpc.functions.async_execution` will prevent it
from blocking one thread on the callee for too long.
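A minimal sketch of the decorator in use (the worker name and function are placeholders, following the pattern described above):
```python
import torch
import torch.distributed.rpc as rpc

@rpc.functions.async_execution
def async_add_chained(to, x, y, z):
    # Returns a Future immediately; the callee sends the response once it completes,
    # so no server thread is blocked while waiting for the nested RPC.
    return rpc.rpc_async(to, torch.add, args=(x, y)).then(
        lambda fut: fut.wait() + z
    )
```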
Test Plan: Imported from OSS
Reviewed By: rohan-varma
Differential Revision: D21779962
fbshipit-source-id: 6b6aa698bf6f91dad6ed2a7ee433df429b59e941
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39372
we only bump the submodule in oss to unblock some work
Test Plan: ci
Reviewed By: hl475
Differential Revision: D21830800
fbshipit-source-id: fb4a716992efcd71926f7bba24a7c24422c17e38
Summary:
fixes gh-32284
Move the non-parallel stanza out of the parallel context, and use `num_threads` to limit nesting `parallel for`s. The nesting caused a memory leak in the test script in the issue.
This should probably have a test somewhere: are there tests for ParallelOpenMP?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36479
Differential Revision: D21652452
Pulled By: ilia-cher
fbshipit-source-id: 2cda7777c0eafbe268550a82fed306e52fb6eb25
Summary:
If the size of a temporary buffer is reduced to zero via binding of a dynamic variable we still run the alloc, even though it is a no op. It's easy to strip these out during simplification, so the expr:
```
{
  Allocate(x, int, {0});
  // Stuff...
  Free(x);
}
```
becomes
```
{
  // Stuff...
}
```
I am assuming here that if the allocation size is zero then any usage of the buffer is also eliminated, since there's no safe way to refer to a zero-size buffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38794
Differential Revision: D21723656
Pulled By: nickgg
fbshipit-source-id: 3eaa8bd8974a13b0a351be04abe2348498b31b02
Summary:
According to
<https://gitlab.kitware.com/cmake/cmake/-/blob/master/Modules/Compiler/MSVC-C.cmake>,
the option simply has no effect for MSVC as of today. It is better to not impose
such an if condition as it is a bit misleading (the current code makes it look like we have compatibility issues with MSVC C11 support), and also it's better to
leave the judgment of MSVC C support to CMake devs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39304
Differential Revision: D21846032
Pulled By: malfet
fbshipit-source-id: 962e5721da3d7b9be4117b42bdc35df426b7da7b
Summary:
## Description
* Updated assert statement to remove check on 3rd dimension (features) for keys and values in MultiheadAttention / Transform
* The feature dimension for keys and values can now be of different sizes
* Refer to https://github.com/pytorch/pytorch/issues/27623
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39402
Reviewed By: zhangguanheng66
Differential Revision: D21841678
Pulled By: Nayef211
fbshipit-source-id: f0c9e5e0f33259ae2abb6bf9e7fb14e3aa9008eb
Summary:
It just depends on a single `torch_python` library.
The C library does not depend on the standard C++ library, and as a result it closes https://github.com/pytorch/pytorch/issues/36941
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39375
Reviewed By: orionr
Differential Revision: D21840645
Pulled By: malfet
fbshipit-source-id: 777c189feee9d6fc686816d92cb9f109b8aac7ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38433
Wherever applicable it would be better to call contiguous with appropriate
memory format.
Plus output should be allocated with the same memory format as input when
applicable. Otherwise convert to that format upon returning.
This helps with some perf where otherwise calls to contiguous may involve
allocation and memcpy.
Test Plan: quantization tests
Reviewed By: vkuzo
Differential Revision: D21559301
fbshipit-source-id: 2ed5de05fb627eef1bf5d76fba0387ba67370007
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39353
This test failed with TSAN since the shortened timeout prevented all
messages from being processed within the timeout during Phase 1 of
wait_all_workers during RPC shutdown. Phase 2 already had a longer timeout, so
we extend this to Phase 1 as well.
ghstack-source-id: 105045926
Test Plan: Ran the test_get_and_set_timeout with TSAN
Differential Revision: D21826783
fbshipit-source-id: 7edfdeb50169b31e997dd36a3fd8eea0e9ae7189
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39230
Pull Request resolved: https://github.com/pytorch/glow/pull/4555
With this, we now support cutting in the middle of the quantized domain for Onnxifi. This will allow us to observe intermediate quantized values during Onnxifi. The input still has to be a non-quantized tensor though; that will be a follow-up.
Test Plan:
```
buck test glow/fb/test/numerics:test_fc_nnpi_int8nnpi -- test_quantize
```
Reviewed By: hyuen
Differential Revision: D21783368
fbshipit-source-id: 51001246e9e0357d7ba90bf12279b644f5f30221
Summary:
Partial fix of: https://github.com/pytorch/pytorch/issues/39060
There are actually two bugs:
1. `TensorIterator::get_dim_to_split` is asserting on what it shouldn't be.
2. `min_kernel_impl` and `max_kernel_impl` are setting `out_scalar_t` wrongly. `out_scalar_t` is used to compute indices for accumulation buffer, which is only used when the tensor is large enough.
Both are tested in `test_argminmax_large_axis_cuda`, but unfortunately, this test does not run on CI.
This PR makes `test_argminmax_large_axis_cuda` green, but this test is still not run on CI. I suggest keeping https://github.com/pytorch/pytorch/issues/39060 open until we figure out a way to run it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39212
Differential Revision: D21834723
Pulled By: ngimel
fbshipit-source-id: e8272ac8552c3954ac486ba6e4129fedb545031e
Summary:
# What's this
Just a small bug fix related to typing stubs.
I haven't opened an issue. I will do so if I must, but this PR is very small (only a 6-line diff).
## What I encountered
pytorch 1.5.0 with mypy 0.770 behaves oddly. The code is the following:
```python
import torch
def f() -> int:  # Mypy says: `error: Missing return statement`
    with torch.no_grad():
        return 1
```
No mypy error is expected, but actually mypy 0.770 warns about `Missing return statement`.
## This is because
`mypy >= 0.730` with `--warn-unreachable` says it's unreachable because `torch.no_grad()` may "swallow" the error in the return statement.
http://mypy-lang.blogspot.com/2019/09/mypy-730-released.html
Here is a small "swallowing" example:
```python
from typing import Generator
from contextlib import contextmanager
@contextmanager
def swallow_zerodiv() -> Generator[None, None, None]:
    try:
        yield None
    except ZeroDivisionError:
        pass
    finally:
        pass

def div(a: int, b: int) -> float:  # This function seems `(int, int) -> float` but actually `(int, int) -> Optional[float]` because `return a / b` may be swallowed
    with swallow_zerodiv():
        return a / b

if __name__ == '__main__':
    result = div(1, 0)
    print(result, type(result))  # None <class 'NoneType'>
```
To suppress this behavior, one can tell mypy that the context manager does not swallow any exceptions by returning `Literal[False]` or `None` from the `__exit__` method of the context manager.
# What I did
Return `None` instead of `bool` to tell mypy that "I never swallow your exception".
I chose `None` because I cannot use `Literal[False]` without typing_extensions on `python <= 3.7`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39324
Differential Revision: D21833651
Pulled By: albanD
fbshipit-source-id: d5cad2e5e19068bd68dc773e997bf13f7e60f4de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39008
This commit adds a `torch.futures.Future` type and exposes its ctor,
`wait`, `then`, and `set_result` APIs. This type is currently a
wrapper of `c10::ivalue::Future` and mainly used by RPC for now. Later,
we could revamp c10d APIs to return this `Future` type as well. More
utils will be added into `torch.futures` package in followup PRs.
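A minimal sketch of the exposed API:
```python
import torch

fut = torch.futures.Future()

# `then` registers a callback and returns a new Future holding the callback's result
chained = fut.then(lambda f: f.wait() + 1)

fut.set_result(torch.tensor(1))
print(chained.wait())  # tensor(2)
```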
Test Plan: Imported from OSS
Differential Revision: D21723022
Pulled By: mrshenli
fbshipit-source-id: 92e56160544e9bf00d11db3e8347a1b9707882c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39277
This PR contains initial changes that makes PyTorch build with Ampere GPU, CUDA 11, and cuDNN 8.
TF32 related features will not be included in this PR.
Test Plan: Imported from OSS
Differential Revision: D21832814
Pulled By: malfet
fbshipit-source-id: 37f9c6827e0c26ae3e303580f666584230832d06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39398
The `timeoutMapMutex_` was only used to guard accesses in the timeout thread, but it should have been used also to guard accesses in the `send` method.
The way I found this bug is rather odd. A test was failing because a timeout of 0.5 seconds was firing when it wasn't supposed to. The test was built with TSAN enabled and the point where we were wasting those 500ms was precisely when accessing the `timeoutMap_` in the `send` method. There is of course no reason it would take so long, so I suspect that either such an access triggered a whole lot of lengthy checks in TSAN or, perhaps, that TSAN was delaying it on purpose because it thought it was smelly and wanted to see whether it could cause a race.
ghstack-source-id: 105088618
Test Plan: The test started passing.
Differential Revision: D21838465
fbshipit-source-id: 02cf2bf1fef2e97da99b9c4e77070fe35d2bcbb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39397
I said I'd do it in a previous diff, but then I forgot, so here it is.
ghstack-source-id: 105088619
Test Plan: No functional changes
Differential Revision: D21838464
fbshipit-source-id: 74fbe76c7ce879b28c50fd29feecd9f4d71fc44c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39156
TensorList is now supported for boxing, so we can remove
unboxed only from it. I didn't check if there were other
operators that were incorrectly classified.
Fixes https://github.com/pytorch/pytorch/issues/38958
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21819821
Pulled By: ezyang
fbshipit-source-id: 6dcf91bc196554e1721d2c704f3bf524f069534b
Summary:
I'm using CUDA 10.1 on Debian buster but I can still experience
compilation issues:
```
/usr/include/thrust/detail/complex/complex.inl(64): error: no suitable conversion function from "const c10::complex<float>" to "float" exists
detected during:
instantiation of "thrust::complex<T>::complex(const R &) [with T=float, R=c10::complex<float>]"
/home/hong/xusrc/pytorch/c10/util/complex_type.h(503): here
instantiation of "T std::abs(const c10::complex<T> &) [with T=float]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(17): here
instantiation of "c10::complex<T> at::native::abs_wrapper(c10::complex<T>) [with T=float]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(29): here
/usr/include/thrust/detail/complex/complex.inl(64): error: no suitable conversion function from "const c10::complex<double>" to "double" exists
detected during:
instantiation of "thrust::complex<T>::complex(const R &) [with T=double, R=c10::complex<double>]"
/home/hong/xusrc/pytorch/c10/util/complex_type.h(503): here
instantiation of "T std::abs(const c10::complex<T> &) [with T=double]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(17): here
instantiation of "c10::complex<T> at::native::abs_wrapper(c10::complex<T>) [with T=double]"
/home/hong/xusrc/pytorch/aten/src/ATen/native/cuda/AbsKernel.cu(29): here
2 errors detected in the compilation of "/tmp/hong/tmpxft_00005893_00000000-6_AbsKernel.cpp1.ii".
CMake Error at torch_cuda_generated_AbsKernel.cu.o.Debug.cmake:281 (message):
Error generating file
/home/hong/xusrc/pytorch/build/caffe2/CMakeFiles/torch_cuda.dir/__/aten/src/ATen/native/cuda/./torch_cuda_generated_AbsKernel.cu.o
```
`nvcc --version`:
```
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Apr_24_19:10:27_PDT_2019
Cuda compilation tools, release 10.1, V10.1.168
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38941
Differential Revision: D21818790
Pulled By: ezyang
fbshipit-source-id: a4bfcd8ae701f7c214bea0731c13a5f3587b7a98
Summary:
**Summary**
This commit adds support for seralization and deserialization of
`ScriptModules` that have been lowered to a specific backend. Nothing
special was required to accomplish this, other than removing some code
in `unpickler.cpp` that guarded against the deserialization of `Any`
type objects. Now that lists and dicts are tagged with their types
during serialization, this check is no longer necessary.
**Test Plan**
This commit adds a unit test for testing that a lowered module still
produces the same results as Python and regular JIT after saving and
loading.
**Fixes**
This pull request fixes part of https://github.com/pytorch/pytorch/issues/37841.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38893
Differential Revision: D21825813
Pulled By: SplitInfinity
fbshipit-source-id: 77a7b84504e0dddf14c89b3ed5dd6b438c086f66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39377
Previous diff D21781515 had a compilation error on OSS CI and got reverted.
Test Plan: net runner
Reviewed By: jfix71
Differential Revision: D21832199
fbshipit-source-id: 07c6b6fe3bb18dc4f4ecec82ba9b99028086f55c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39367
We shouldn't match `%alpha` argument since it could be used by multiple functions
Test Plan: Imported from OSS
Differential Revision: D21829295
fbshipit-source-id: 6daa320a4b56df4e142b8e02e04a3ecb36284d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39113
`setError` is overloaded - it can either take `FutureError` or an error message string as an argument. This PR replicates the same behavior for `setErrorIfNeeded`.
ghstack-source-id: 105038824
Test Plan: Sandcastle/CI
Differential Revision: D21753988
fbshipit-source-id: 0f413afd667f0416400aa95f0b2271b286326ac5
Summary:
Restores thrust path for computing prefix sums for tensors with a single non-degenerate dimension. Benchmark on P100 before:
```
import time
import torch
l = 4000
t=1000
for _ in range(6):
    for dtype in (torch.half, torch.float, torch.double):
        a = torch.randn(l, device="cuda", dtype=dtype)
        print(f'torch.cumsum(a) a.numel() == {l} for {t} times {dtype}')
        # dry run
        torch.cumsum(a, 0)
        torch.cuda.synchronize()
        # Iterate
        start = time.time()
        for _ in range(t):
            torch.cumsum(a, 0)
        # Final Synchronize Before Teardown
        torch.cuda.synchronize()
        end = time.time()
        elapsed = end - start
        bw = t * l * 2 * a.element_size() * 1e-9 / elapsed
        print(f'Time {elapsed} bandwidth {bw}')
    l *= 2
```
```
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float16
Time 0.29149866104125977 bandwidth 0.05488875984145705
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float32
Time 0.24511313438415527 bandwidth 0.130551959528402
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float64
Time 0.25238871574401855 bandwidth 0.25357710550304885
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float16
Time 0.5812790393829346 bandwidth 0.05505101307965633
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float32
Time 0.4885847568511963 bandwidth 0.13099057861007293
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float64
Time 0.5031211376190186 bandwidth 0.2544118909528429
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float16
Time 1.1607651710510254 bandwidth 0.05513604439220951
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float32
Time 0.9755356311798096 bandwidth 0.13120996907637011
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float64
Time 1.0045702457427979 bandwidth 0.25483533987283175
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float16
Time 2.3198938369750977 bandwidth 0.055174938594129294
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float32
Time 1.949366569519043 bandwidth 0.13132471029456586
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float64
Time 2.00749135017395 bandwidth 0.2550446854755488
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float16
Time 4.63812518119812 bandwidth 0.055194715536735495
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float32
Time 3.897014856338501 bandwidth 0.13138261435345344
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float64
Time 4.013219356536865 bandwidth 0.2551567479938705
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float16
Time 9.274584770202637 bandwidth 0.05520462777427539
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float32
Time 7.792156934738159 bandwidth 0.1314141910354645
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float64
Time 8.02474856376648 bandwidth 0.2552104883693396
```
after:
```
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float16
Time 0.033731937408447266 bandwidth 0.47432792864109924
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float32
Time 0.031197071075439453 bandwidth 1.025737317539167
torch.cumsum(a) a.numel() == 4000 for 1000 times torch.float64
Time 0.03245425224304199 bandwidth 1.972006611667389
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float16
Time 0.034340858459472656 bandwidth 0.931834596906329
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float32
Time 0.031183481216430664 bandwidth 2.0523686741645197
torch.cumsum(a) a.numel() == 8000 for 1000 times torch.float64
Time 0.031975507736206055 bandwidth 4.003063878015136
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float16
Time 0.032624006271362305 bandwidth 1.9617455767895642
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float32
Time 0.03129267692565918 bandwidth 4.0904138787514
torch.cumsum(a) a.numel() == 16000 for 1000 times torch.float64
Time 0.03260397911071777 bandwidth 7.851802356107085
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float16
Time 0.032918691635131836 bandwidth 3.888368390176069
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float32
Time 0.030851364135742188 bandwidth 8.29785026275116
torch.cumsum(a) a.numel() == 32000 for 1000 times torch.float64
Time 0.037447452545166016 bandwidth 13.6724921243299
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float16
Time 0.03391098976135254 bandwidth 7.549175114073387
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float32
Time 0.03214144706726074 bandwidth 15.929587704267457
torch.cumsum(a) a.numel() == 64000 for 1000 times torch.float64
Time 0.034329891204833984 bandwidth 29.828233182859922
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float16
Time 0.03589606285095215 bandwidth 14.263402705915954
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float32
Time 0.033178091049194336 bandwidth 30.863740728231736
torch.cumsum(a) a.numel() == 128000 for 1000 times torch.float64
Time 0.03487515449523926 bandwidth 58.72375419238841
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39180
Differential Revision: D21824498
Pulled By: ngimel
fbshipit-source-id: b50fadde598e9ce2871201cd6bb22fa6ac0d482e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39229
Previously we had an ad-hoc way of passing output shape/type hints, which was very limited and didn't support quantized output. We actually have all the shape_info/qshape_info, so we pass them as TensorProto and QTensorProto directly. This will pave the way for us to set the output to a quantized type in OnnxifiOp.
Test Plan:
```
buck test glow/fb/test:net_runner
```
Reviewed By: hyuen
Differential Revision: D21781515
fbshipit-source-id: dfae3276e8f158eed830f1244bea6420a9135aab
Summary:
Should be a no-op, just makes the intent a bit cleaner
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39068
Differential Revision: D21829464
Pulled By: malfet
fbshipit-source-id: dc174a3d7da3701bd9d31c366dfa9d24044ef27a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39080
This PR adds a function similar to setErrorIfNeeded for marking
futures complete. It only completes futures if they haven't been completed
already.
ghstack-source-id: 105038825
Test Plan: Sandcastle/CI
Differential Revision: D21746065
fbshipit-source-id: a7791a070f19e1f56aa5c2822edc4b60d8227c2c
Summary:
The test is currently only enabled for CPU, and it will be enabled for CUDA after the migration of `min` and `max` from THC to ATen is done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38850
Differential Revision: D21819388
Pulled By: ngimel
fbshipit-source-id: 406343e96bccbf9139eb1f8f2d49ed530dd83d62
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36831.
Instead of using `id()`, an arbitrary yet consistent order-based index is used. This results in a deterministic output between runs.
I am not the biggest fan of using `nonlocal` (it appears to be used sparingly in the codebase) to get `start_index` between calls to `pack_group()`, but the alternatives had larger issues:
- Using the last value added to `param_mappings` would be ideal, but that only works if `dict` iteration order is consistent, and PyTorch currently supports Python <3.7.
- Using the maximum value added to `param_mappings` wouldn't have that issue but would not be constant time.
For testing, I confirmed that `test_optim.py` works before and after these changes. Randomizing the indices in `param_mappings` causes the tests to fail, which is further evidence these changes work. I'm not 100% sure these tests are sufficient, but they're a start.
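For illustration, a minimal sketch of the order-based indexing idea (this is not the actual `torch.optim` code; the helper below is hypothetical):
```python
def build_param_mappings(param_groups):
    # Assign each parameter a consecutive index in iteration order instead of
    # keying the saved state on id(), so the output is deterministic across runs.
    param_mappings = {}
    start_index = 0
    for group in param_groups:
        for i, p in enumerate(group["params"], start_index):
            if id(p) not in param_mappings:
                param_mappings[id(p)] = i
        start_index += len(group["params"])
    return param_mappings
```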
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37347
Differential Revision: D21353820
Pulled By: vincentqb
fbshipit-source-id: e549f1f154833a461b1f4df6d07ad509aab34ea1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38574
Adding sparse L1 and L2 regularization operator to Caffe2. This doesn't work using run_on_loss, only run_after_optimize. Applying it to run_after_optimize rather than run_on_loss was easier to implement, particularly for the L1 norm which is preferable in some cases and is non-differentiable at zero.
Test Plan: Wrote and ran unit tests in operator_test:sparse_lp_regularizer_test.
Differential Revision: D21003029
fbshipit-source-id: 81070a621752560ce03e320d065ce27807a5d278
Summary:
This is an automated pull request to update the first-party submodule for [pytorch/FBGEMM](https://github.com/pytorch/FBGEMM).
New submodule commit: 7d673046a6
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39322
Test Plan: Ensure that CI jobs succeed on GitHub before landing.
Reviewed By: jianyuh
Differential Revision: D21814389
fbshipit-source-id: cec819a28f08915e2443f405d42efaa41a523bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38806
I'm trying to delete the Type wrapper code entirely, but I'm
trying to figure out exactly how many device guards I need to
preserve. For now, delete the guards that are known to be
useless.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21764403
Pulled By: ezyang
fbshipit-source-id: 9c3d18f209339dfe2adbe5866b31b03b55990b74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38739
Instead of codegenning the named tensor support checks into
CPUType/CUDAType, we instead add a new dispatch key that is put
into tensor whenever it has names. By default, the fallback
implementation says that named tensors are not supported, but
if they are supported, we register a fallthrough which lets
us through to the true backend implementation.
There are a bunch of small pieces which are necessary to make this
happen:
- NameMode now also excludes DispatchKey::Named from the dispatch set
- To avoid bad error messages, we add a teensy special case to
the dispatcher for named_not_supported_kernel: if we see that
the boxed kernel we need to invoke from unboxed is this kernel,
and we don't support boxing, but it's a kernel which is known not to need boxing, we just pass in nullptr for the stack.
The special case here is very nice: it doesn't affect the fast
path and only gets exercised when things are not supported.
- I need to add support for per operator fallthrough registration.
This is done similarly to how we support fallthrough fallback,
by just keeping track if the registered kernel for an operator
is a fallthrough.
It is possible we could go even further down this path, and move
the named tensor logic itself into this key. I leave this
up to future work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21662643
Pulled By: ezyang
fbshipit-source-id: 5bc6ae14a1f600189bd8bf865f74dd1700d932f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38847
See motivation and design in https://github.com/pytorch/pytorch/issues/38845.
Close https://github.com/pytorch/pytorch/issues/38845.
Changes,
- Add pre-request and post-response hooks to RPC "request_callback_impl.cpp". For a thread that executes the RPC handler, check if server-side global profiling is on. If it is, enable profiling on this thread and, after the response, merge the thread-local profiling result into the global profiling state.
- Add context-style Python API to parse the profiling Events into ranges represented by FunctionEvent.
- Add data-structures to work as global profiling state that support nesting and container for consolidating results from multiple threads.
Test,
- Add a test that uses a nested profiling range and inspects the profiling events.
ghstack-source-id: 104991517
Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_server_process_global_profiler
Differential Revision: D5665992
fbshipit-source-id: 07f3bef5efd33d1214ef3404284c3803f5deca26
Summary:
Enable new test config in .circleci/config.yml
Skip scanning several 3rd-party packages to work around https://bugs.python.org/issue40350
Remove pre python-3.5 checks from `test.sh` and update `scikit-learn` to python-3.8 compatible version
This is a reland of https://github.com/pytorch/pytorch/pull/39030
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39121
Differential Revision: D21820375
Pulled By: malfet
fbshipit-source-id: d0be79b7d204cf692e055d42b9be42402dc4c1c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39263
CPPTypeToScalarType is confusing because it doesn't handle the different complex types and it maps everything that it doesn't know about to Undefined, which is error prone.
Test Plan: Imported from OSS
Differential Revision: D21790515
Pulled By: gchanan
fbshipit-source-id: ec897fd50bd8f7548a34573e59eb57bf3c6383c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39031
Makes the eager mode QAT prepare logic respect device affinity.
This fixes the issue where a module is on `cuda:0`, and running
the QAT prepare script would add observers on `cpu`. Now it
will add them on the original device.
Test Plan:
```
python test/test_quantization.py TestDistributed.test_device_affinity
```
Imported from OSS
Differential Revision: D21729272
fbshipit-source-id: 5537bf3977ddc23412184941978bf0d1cc6fb479
Summary:
Continuation of issue gh-36064 and PR gh-38042, which removed the unmaintained javasphinx extension. The unknown sphinx directives cause warnings when building documentation.
Edit: link to PR as well as issue
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38920
Differential Revision: D21818297
Pulled By: ezyang
fbshipit-source-id: 2c1d007a7689b26653d7dee081b0b969b8a731a2
Summary:
All the uses of `AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES_AND2` are for CUDA.
Dispatch macro comes first, cleanup of remaining `c10::complex --> thrust::complex` will be done later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39285
Differential Revision: D21803978
Pulled By: anjali411
fbshipit-source-id: ec9837f121e3020dfa2d12c8bc9aede9fb01c375
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39258
On CUDA, we currently support casting loops dynamically (i.e. when the argument or return types of the lambda don't match the dtypes of the TensorIterator).
On CPU, before this change we would essentially reinterpret_cast, now we internal assert. We could add dynamic_casting support in the future on CPU.
Test Plan: Imported from OSS
Differential Revision: D21790020
Pulled By: gchanan
fbshipit-source-id: b52f4340a0553f0c1bd8fafaa58309bc110adecf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39255
We don't actually cast between these complex representations, but the prior implementation would indicate that we needed to dynamic_cast,
because we didn't have mappings for std::complex or thrust::complex.
This PR makes it so they all map to the same dtype.
Note that this has no functional change as all the use sites have already been changed to take this into account.
Test Plan: Imported from OSS
Differential Revision: D21789694
Pulled By: gchanan
fbshipit-source-id: 6127aab32c40e62bf1b60fe5ccaeffacc60e3b52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39254
dynamic_casting is meant to handle CUDA kernels when the operand dtypes don't match the C++ kernel function types.
This is made more complicated by the current state of complex, which uses thrust::complex, std::complex, c10::complex.
Currently, thrust::complex and std::complex are mapped as needing dynamic casting even though we don't actually cast them.
But, making them not need dynamic_cast doesn't work either because certain dynamic_casting optimizations don't work with thrust::complex and (maybe) std::complex.
So, we separate out these concerns so we can iterate on dynamic_casting checks, in particular by applying them to CPU.
This PR should have no functional change.
Test Plan: Imported from OSS
Differential Revision: D21788870
Pulled By: gchanan
fbshipit-source-id: 5d69c9851423dee2fbe789674f4306710378f4ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39315
As title
Test Plan: Just comment change
Reviewed By: jianyuh
Differential Revision: D21813196
fbshipit-source-id: 3ff6bcd3cc31a4820bf7c7a948123c9e968f5de2
Summary:
Fixes a bug in reorder axis where we append the new reordered loops to the enclosing block, even if there were statements after it. e.g. with 3 Computes:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m2 ...
  for (int n2 ...
    for (int k2 ...
      Body 2
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
```
If we reorder loops m2 and k2, we were also reordering the body statements like this:
```
for (int m1 ...
  for (int n1 ...
    for (int k1 ...
      Body 1
for (int m3 ...
  for (int n3 ...
    for (int k3 ...
      Body 3
for (int k2 ...
  for (int n2 ...
    for (int m2 ...
      Body 2
```
This is because we always append the new loops to their parent. This PR fixes the logic to replace the old loop root with the new loop, which keeps things consistent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38841
Differential Revision: D21723670
Pulled By: nickgg
fbshipit-source-id: 1dee8bb153182fcaa2cabd948197577e8e80acd7
Summary:
Fix https://github.com/pytorch/pytorch/issues/38336
Add %= support in TorchScript. It's now possible to do something like:
```py
@torch.jit.script
def mm(a, b):
    a %= b
    return a
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38983
Differential Revision: D21803523
Pulled By: SplitInfinity
fbshipit-source-id: 3437860d06d32e26ca9a5497099148c1f1616c5b
Summary:
**Main:**
- `c10::complex` is refactored: it no longer uses inheritance to specialize constructors, but uses SFINAE instead. This implementation is cleaner and avoids some compiler bugs.
- `c10::Scalar` is cleaned up: it no longer needs to store complex as `double z[2]`, `c10::complex<double>` will work.
**Other cleanups:**
- `numeric_limits` of `c10::complex` is moved to `complex_utils.h`
- the variable in `c10::complex` storing real and imag is changed from `storage[2]` to `real_` and `imag_`
- remove the `c10::` before `complex` when in `c10` namespace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38593
Differential Revision: D21769463
Pulled By: anjali411
fbshipit-source-id: 3cb5bcbb0ff304d137221e00fe481a08dba7bc12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39297
The histogram op doesn't have a GPU implementation and it's breaking the CI GPU test. Make the test run CPU-only.
Test Plan: CI
Reviewed By: hwangjeff
Differential Revision: D21800824
fbshipit-source-id: 9c835786f22bac7d420ce610397a6ee69084c19a
Summary:
This PR adds a new operator export type to exporter: ONNX_FALLTHROUGH
This new type allows ops that are not supported to pass through.
This PR also removes all aten ops in ONNX operator export type mode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37239
Reviewed By: hl475
Differential Revision: D21440509
Pulled By: houseroad
fbshipit-source-id: 38b826677cf3431ea44868efebefe1ff51c9aa75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39271
Caused a 10% NE loss. The bug is in the emulation itself; NNPI is fine.
Test Plan: mobile_cvr has no NE loss after this fix: https://fburl.com/mlhub/z6hd8rhn
Reviewed By: hyuen
Differential Revision: D21793205
fbshipit-source-id: a908e95c26c2353f982d05e0a20f02f3c724715d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39125
Switch to setting reduce_range to true for version > 3.
Models serialized with an older state_dict will have version <= 3 and so will run with reduce_range=false.
Verified with backward-compatibility tests (works with no changes to these tests).
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21769689
fbshipit-source-id: 131f2ae736e31705222e82bdc77480f2f1826fe8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39041
The reduce_range option restricts the activation tensor to 7 bits instead of 8.
This is necessary to enable per-channel quantization for RNNs and LSTMs.
Test Plan:
python test/test_quantization.py TestDynamicQuantizedLinear
Imported from OSS
Reviewed By: akinh
Differential Revision: D21769691
fbshipit-source-id: ef0e9873367f3c1b34091b0b3af788233ef60c6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39246
This was found by adding some error checking in https://github.com/pytorch/pytorch/pull/38817, but that needs more work to be able to merge, so we just do a one-off fix here.
Test Plan: Imported from OSS
Differential Revision: D21786761
Pulled By: gchanan
fbshipit-source-id: e4ecf6506c8649214d0fddfcca2ada6afa339d3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39265
In this PR we set the id of RecordFunction only when callbacks need it and when
there's at least one active callback.
Test Plan:
testRecordFunction unit test in test_misc.cpp
buck test mode/dev caffe2/test/cpp/jit:jit
https://our.intern.facebook.com/intern/testinfra/testrun/8725724291116413
Reviewed By: dzhulgakov
Differential Revision: D21790421
fbshipit-source-id: 016623d7f1a2a271921a71c0483061e232b40321
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39033
Added `real` and `imag` views as tensor attributes. Right now, tensor.imag is disabled for real tensors. This is because if we returned a new tensor of zeros, the user would be able to update the tensor returned by tensor.imag, which should not be allowed: numpy returns a read-only array, and pytorch doesn't support read-only tensors yet.
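For illustration, a minimal sketch of how the new attributes behave on a complex tensor (assuming a complex tensor built from Python complex literals):
```python
import torch

z = torch.tensor([1 + 2j, 3 - 4j])
print(z.real)  # real parts, as a view:      tensor([1., 3.])
print(z.imag)  # imaginary parts, as a view: tensor([ 2., -4.])
```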
TODO in follow-up PRs:
1. add a setter for `real` and `imag`
2. add special case in codegen for `real` and `imag` backward functions.
3. remove `copy_real` and `copy_imag` methods.
Test Plan: Imported from OSS
Differential Revision: D21767542
Pulled By: anjali411
fbshipit-source-id: 539febf01f01ff055e3fbc7e9ff01fd3fe729056
Summary:
If the Engine is created shortly before the application exits, a non-reentrant thread might not have a chance to spawn, which would result in an infinite wait in `Engine::~Engine()`.
Prevent this by actually waiting for threads to spawn before returning from `Engine::start_device_threads()`.
Make sure that the thread count is incremented before the GIL is acquired in PythonThread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39194
Differential Revision: D21789219
Pulled By: malfet
fbshipit-source-id: d9b5e74d5ddeb2474b575af2e4f33d022efcfe53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39112
Allow int8 packed weights in an int8 model to deserialize to the original format. Set the default deserialization behavior in eval workflows to the original format.
Test Plan: Tested with workflow: f192797187
Reviewed By: yinghai
Differential Revision: D21737940
fbshipit-source-id: 7afaf307b16cb4e85e61f019356f83fdab772c57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38527
This PR solves issue #37200.
The error is encountered during IR generation while trying to resolve the call to sum.
We should let the user know it inferred the value for argument 'dim' to be of type 'Tensor'
because it was not annotated with an explicit type.
Test Plan:
Add code to reproduce the issue (#37200)
`python test/test_jit.py TestJit.test_inferred_as_tensor`
Differential Revision: D21743876
Pulled By: superwizard2019
fbshipit-source-id: 370ca32afea4d53b44d454f650f7d3006f86bcc6
Summary:
The `msg` argument must be passed to `assertRaises`, because its exception is passed upstream (with a custom error message) if `assertEquals` succeeds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39217
Differential Revision: D21786141
Pulled By: malfet
fbshipit-source-id: f8c3d4f30f474fe269e50252a06eade76d575a68
Summary:
Adds complex support to `cumsum`, `cumprod` and relevant test update in `test_torch::tensor_op_tests`
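A small usage sketch of the newly supported complex reductions (values are illustrative):
```python
import torch

z = torch.tensor([1 + 1j, 2 - 1j, 0 + 3j])
print(torch.cumsum(z, dim=0))   # running complex sums
print(torch.cumprod(z, dim=0))  # running complex products
```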
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39063
Differential Revision: D21771186
Pulled By: anjali411
fbshipit-source-id: 632916d4bdbd1c0941001898ab8146be2b7884fc
Summary:
These warnings' goal is to show the user where to be careful in their code, so make them point to the user's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39143
Differential Revision: D21764201
Pulled By: albanD
fbshipit-source-id: f1369d1b0e71d93af892ad3b7b1b3030e6699c59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39185
The TP agent used the store for two things: mapping ranks to names, and mapping names to addresses. The former was prefixed, the latter wasn't. So, if a worker had a name which was `names/0` this would lead to a conflict. We should prefix both usages, and we can do so easily with the `PrefixStore`.
ghstack-source-id: 104837023
Test Plan: Unit tests
Differential Revision: D21767862
fbshipit-source-id: a256c0b9be349c7ffc11ac2790a2a682e3af84d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39184
TensorPipe has implemented some helpers to resolve the IP address of the hostname and to retrieve the IP address of a given interface using libuv, which means they are supposed to be portable across Linux, Mac, Windows... We can thus replace the version we had implemented inside the agent itself (which only resolved the hostname) with those helpers.
ghstack-source-id: 104837026
Test Plan: Unit tests
Differential Revision: D21494693
fbshipit-source-id: 4652dde6f7af3a90e15918506a103408f81ced0b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39183
I didn't see any reason behind it, and it seems to work even after removing the unique_ptrs. (Well, it compiles...)
ghstack-source-id: 104837027
Test Plan: None...
Differential Revision: D21767863
fbshipit-source-id: daebfae69d5b63f1d10345abd625b7e0ddce7e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39182
When the TensorPipe context is closed and joined, all pending callbacks are invoked with an error of type PipeClosedError. This is normal and expected, and should not be logged.
There is still one remaining log which I still need to address, namely when an incoming pipe from a remote worker dies after we have joined. That will require some type of "signal" from the remote worker that the shutdown is intentional, for example sending an empty packet?
ghstack-source-id: 104837024
Test Plan: Logs become less spammy.
Differential Revision: D21703036
fbshipit-source-id: 0a2f9985032b9f1aaf7d2b129ce6d577f13062a4
Summary:
Pick up a fix to SHM, which was crashing when writing to a full reactor ringbuffer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39189
Test Plan: Testing by CI.
Reviewed By: mrshenli
Differential Revision: D21769275
fbshipit-source-id: 1499f028d85de3a2facc79277ac5bdea73fd15cc
Summary: Fix operator perf observer index issue.
Test Plan:
make sure that the operator index is populated correctly, ran benchmarking for pytext_mobile_inference, see result:
https://www.internalfb.com/intern/aibench/details/598900068317693
Reviewed By: linbinyu
Differential Revision: D21779222
fbshipit-source-id: 0fc3561d83d10cfabd73e1e6b6ee240ce0bafd80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39213
This PR fixes the problem that [__expf/__logf/__tanf](https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__SINGLE.html) are "intrinsic functions that are only supported in device code", so nvcc doesn't recognize them when it compiles host code. Hence `__CUDACC__` should be replaced with `__CUDA_ARCH__`.
Test Plan: Imported from OSS
Differential Revision: D21779132
Pulled By: pbelevich
fbshipit-source-id: b326e2135525b6a1f2392f8d1c17b735d8ef431a
Summary:
The DCHECK is never triggered and the user error could lead to a crash.
I could make the error message even nicer by checking the shape in the constructor, but even this would do.
Reviewed By: m3rlin45
Differential Revision: D21778992
fbshipit-source-id: a8ec2faaf734746f6dc42879705245851dc99bed
Summary:
No special changes are needed for CPU kernels, some CUDA kernels are still doing `c10::complex -> thrust::complex` casting, this will be cleaned up later. But for now, it will be good to just keep it as is, and change the dispatch macro first.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39045
Differential Revision: D21741151
Pulled By: anjali411
fbshipit-source-id: 748f057f9f33338b8c9293aeaa228ad861172e71
Summary:
Invoke `Popen.communicate` with a `timeout` argument and kill the process in the `TimeoutExpired` handler.
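A minimal sketch of the pattern described above (the command is a placeholder):
```python
import subprocess

proc = subprocess.Popen(["some_test_binary"], stdout=subprocess.PIPE)
try:
    out, _ = proc.communicate(timeout=60)  # seconds
except subprocess.TimeoutExpired:
    proc.kill()                  # don't leave the child hanging around
    out, _ = proc.communicate()  # reap it after killing
```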
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39191
Differential Revision: D21773510
Pulled By: malfet
fbshipit-source-id: 52b94315f8aa4d6c330dd5c9a8936100e49aef2d
Summary:
- Gets rid of some in-kernel asserts where they can be replaced with static_asserts
- Replaces a bare in-kernel `assert` in one case with `CUDA_KERNEL_ASSERT` where necessary
- Replaces host code `assert`s with `TORCH_INTERNAL_ASSERT`
Another group of asserts is in fractional max pooling kernels which should be fixed regardless https://github.com/pytorch/pytorch/issues/39044, the problems there are not just asserts.
I've audited remaining cases of in-kernel asserts, and they are more like `TORCH_INTERNAL_ASSERT`, so they should not happen with invalid user data. I think it's ok to leave them as is.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39047
Differential Revision: D21750392
Pulled By: ngimel
fbshipit-source-id: e9417523a2c672284de3515933cb7ed166e56719
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38514
This diff introduces the `Histogram` caffe2 op, which computes a histogram tensor for a list of input tensors. The bin edges of the histogram are defined by the arg `bin_edges`.
Test Plan: tests
Reviewed By: chocjy
Differential Revision: D21553956
fbshipit-source-id: fc98c8db691d66d2dad57b6ad14867109913cb6f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39188
Extracting Vulkan_LIBS and Vulkan_INCLUDES setup from `cmake/Dependencies.cmake` to `cmake/VulkanDependencies.cmake` and reusing it in android/pytorch_android/CMakeLists.txt
Adding control to build with Vulkan by setting the env variable `USE_VULKAN` for `scripts/build_android.sh` and `scripts/build_pytorch_android.sh`
We do not use the Vulkan backend in pytorch_android, but with this build option we can track the android aar change with `USE_VULKAN` added.
Currently it is 88Kb.
Test Plan: Imported from OSS
Differential Revision: D21770892
Pulled By: IvanKobzarev
fbshipit-source-id: a39433505fdcf43d3b524e0fe08062d5ebe0d872
Summary:
The setup job isn't really what we need anymore so let's get rid of it
and remove the single point of failure from our build pipeline.
Should also resolve issues with CircleCI where re-run workflow from failed would trigger an entire re-run instead of only jobs that we actually want to re-run.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39081
Differential Revision: D21770380
Pulled By: seemethere
fbshipit-source-id: 92a239deb6f2908eb46d519c332dc34c6023da6d
Summary:
**BC-breaking note:**
In previous versions of PyTorch zero dimensional CUDA tensors could be moved across devices implicitly. For example,
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
would work, even though the tensors are on different CUDA devices. This is a frequent source of user confusion, however, and PyTorch generally does not move data across devices without it being explicit. This functionality is removed in PyTorch 1.6.
**PR Summary:**
Today in PyTorch we allow implicit data movement of zero dimensional CUDA tensors. For example, we allow:
```
torch.tensor(5, device='cuda:0') + torch.tensor((1, 1), device='cuda:1')
```
and
```
torch.tensor(2, device='cuda') + torch.tensor((3, 5))
```
In both of these cases TensorIterator would move the zero dim CUDA tensor to the device of the non-scalar tensor (cuda:1 in the first snippet, the CPU in the second snippet).
One of PyTorch's fundamental rules, however, is that it does not perform implicit data movement like this, and this change causes these cases to throw an error. New tests for this behavior are added to test_torch.py, and tests of the old behavior are removed in test_torch.py and test_autograd.py. A cpp test in tensor_iterator_test.cpp is modified to account for the new behavior.
This addresses https://github.com/pytorch/pytorch/issues/36722.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38998
Differential Revision: D21757617
Pulled By: mruberry
fbshipit-source-id: 2498f07f4938d6de691fdbd5155ad2e881ff7fdb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38798
This makes it more in-line with the other keys in the file
(DispatchKey.h).
Test Plan: Imported from OSS
Differential Revision: D21691789
Pulled By: zou3519
fbshipit-source-id: 8d8b902360c0238f67bd0e58f9d969cec4b63320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35137
bucket order is rebuilt dynamically in the first reduction backward pass when find_unused_parameters = false
ghstack-source-id: 104794018
Test Plan: unit test
Differential Revision: D20128537
fbshipit-source-id: fad73de965cdcb59a51c0a12b248271344584b9f
Summary:
See D21681838.
There are two "aten::eq" ops in the lite interpreter. Add an overload name for op eq.str.
Test Plan: CI
Reviewed By: iseeyuan
Differential Revision: D21729544
fbshipit-source-id: cf86f5eb101bb0530a3dca4051f8fe14ee184f9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39012
The `test_rref_context_debug_info` test was flaky with the TensorPipe agent, and I think the issue is the test itself.
What was happening is that on line 1826 the test was clearing a global variable on the remote side which was holding a rref. Even though the RPC call that unset the global variable was synchronous, the messages that the rref context needs to send around to delete that rref are asynchronous. Therefore, sometimes, when we reached line 1845 we saw the following check fail:
```
self.assertEqual(2, int(info["num_owner_rrefs"]))
```
because `num_owner_rrefs` was still 3, as the deletion hadn't yet been processed.
The only way I found to fix it is to add a synchronization step where we wait for all the futures from the rref context to complete. Since we must wait for this to happen on all workers, we synchronize with a barrier.
ghstack-source-id: 104810738
Test Plan: The test isn't flaky anymore.
Differential Revision: D21716070
fbshipit-source-id: e5a97e520c5b10b67c335abf2dc7187ee6227643
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39011
There's a test for this, so let's implement it. It's very easy.
ghstack-source-id: 104810739
Test Plan: The test now passes.
Differential Revision: D21716068
fbshipit-source-id: 1080040b12913ea0dcc4982182d6b3f6d9ac763c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39010
The initial version of the serialization for the TensorPipe RPC agent (i.e., the conversion from rpc::Message to tensorpipe::Message) worked around a limitation of TensorPipe of only allowing one payload per message by pickling each tensor separately and storing the pickles as metadata (which is a less efficient way of sending data over, as it goes through more copies). Having now lifted that limitation, we can improve the way we serialize. We now put the type and the id as their own payloads, we do a single pickling pass for all the tensors of the message (which allows us to deduplicate them) and store the pickle as a payload. My impression is that pickling is a somewhat costly operation, so reducing the number of times we do it should be beneficial for performance. For this same reason, another change I've done here is separate the allocation of the buffers from the deserialization. This will allow us (in the future) to perform the allocation on the I/O event loop but perform the unpickling in the worker thread, thus keeping the event loop more responsive.
ghstack-source-id: 104810740
Test Plan: RPC tests
Differential Revision: D21716067
fbshipit-source-id: c1475cc78afdcf0820a485ffd98c91abb35796c7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38934
The TensorPipe context contains all the threads and global state. It needs to be closed and joined upon shutdown (joining implicitly closes it). Destructing the context implicitly joins it, which is what was happening so far: we were waiting for the RPC agent to be destroyed for the TP context to be closed. However, I was seeing some TSAN errors that seemed to be happening during process termination, where the SHM reactor thread was trying to log something on GoogleLog while a static member of GoogleLog was being destructed. I suspect this means that the TP agent was being "leaked" (probably because the `RpcAgent::currentRpcAgent_` static field was still storing it) and thus was destroyed too late. The obvious solution seems to be to destroy it earlier, when GoogleLog is still active.
Test Plan:
I guess land this and see if the TSAN flakes keep happening?
testinprod
Differential Revision: D21703016
fbshipit-source-id: d117e619bb835192b1f3c8e2eb3cee94dbdb050f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38933
Based on what I could understand from how the RPC shutdown operates and from what the ProcessGroup agent does, the join method is supposed to act as a barrier among all workers that waits until they all have finished all their pending work, including work that may be triggered by nested calls or by callbacks.
ghstack-source-id: 104760684
Test Plan: Before this diff, the `test_user_rrefs_confirmed` test of the RPC suite was flakily deadlocking. After this, I haven't been able to repro that.
Differential Revision: D21703020
fbshipit-source-id: 3d36c6544f1ba8e17ce27ef520ecfd30552045dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38930
Any time we mark a future as complete or set an error on it we call its callbacks, which could be arbitrary user functions and could thus be slow or blocking. The safest behavior is to always defer to the loop.
ghstack-source-id: 104760682
Test Plan: None... :(
Differential Revision: D21703017
fbshipit-source-id: ad2bdc6be25844628ae6f318ef98b496f3d93ffd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38931
When requests time out they are not aborted, so they could in fact still complete successfully but, when they do so, they try to mark an errored future as complete, which causes an error. I don't see any atomic way of doing future->markCompleteIfNeeded, so we implement it on our side on top of the existing API.
ghstack-source-id: 104760689
Test Plan: Hit this error in the RPC test suite, and it disappeared after this fix.
Differential Revision: D21703015
fbshipit-source-id: af92f7819ed907efb9b068a4ca65420739fac8cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38929
Fixes a TSAN error that was reported by the internal tests.
Test Plan: None... :(
Differential Revision: D21703022
fbshipit-source-id: 54480d32d8c19db01d9608a52b7b906a622ca8b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38928
The original code was
```
steady_clock_time_point earliestTimeout = std::chrono::steady_clock::now() + kLargeTimeDuration;
if (std::chrono::steady_clock::now() >= earliestTimeout) {
  break;
}
if (!timeoutMap_.empty()) {
  earliestTimeout = timeoutMap_.begin()->first;
}
timeoutThreadCV_.wait_until(lock, earliestTimeout);
```
which meant we'd never break the loop, since `std::chrono::steady_clock::now()` is always *smaller* than `std::chrono::steady_clock::now() + kLargeTimeDuration`, so the break condition was never satisfied.
The fixed code looks like:
```
steady_clock_time_point earliestTimeout = std::chrono::steady_clock::now() + kLargeTimeDuration;
if (!timeoutMap_.empty()) {
  earliestTimeout = timeoutMap_.begin()->first;
}
if (std::chrono::steady_clock::now() >= earliestTimeout) {
  break;
}
timeoutThreadCV_.wait_until(lock, earliestTimeout);
```
but by staring at it for a second it becomes clear that the code behaves very differently based on whether `timeoutMap_.empty()`, so I think that for better readability we should reflect that in the code, making that `if` the main one. This then allows us to do a timeout-less wait if there are no messages, which avoids the hacky `kLargeTimeDuration`.
ghstack-source-id: 104760685
Test Plan: eyes
Differential Revision: D21703021
fbshipit-source-id: 0c5062b714c92b956376ae2a8372223fd0d9f871
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38927
Since the regexes weren't matching, the RPC tests would never confirm that the remote end had correctly shut down and were thus retrying in a loop forever.
ghstack-source-id: 104760686
Test Plan: Ran the RPC test suite after re-enabling some of the TensorPipe tests
Differential Revision: D21703018
fbshipit-source-id: 3e4b8d22810e58c9d72c4317dcf5ba68d6e0b258
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38926
TensorPipe allows the user to provide a meaningful name for each context and to specify what it believes to be the name of the endpoint it's connecting to, so that these names can be logged and matched to the otherwise not-very-informative ID of a pipe (given by the PID and some counters) for easier debugging.
ghstack-source-id: 104760688
Test Plan: Ran RPC tests with `TP_VERBOSE_LOGGING=1`.
Differential Revision: D21479799
fbshipit-source-id: 856d2ffac239a3f9b11318a92ba4534133865dc8
Summary:
Previously, dynamic LSTM modules weren't able to save/load from a state_dict since the PackedParameter used in RNNs isn't serializable from Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39105
Test Plan: python test/test_quantization.py TestSerialization
Reviewed By: jerryzh168
Differential Revision: D21752256
Pulled By: supriyar
fbshipit-source-id: ef82cf21ce21a3a1304d147ed0da538c639f952d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32866
The memory error in the issue is caused by `int` overflowing in `col2vol`. This version, using a mixed 32-bit and 64-bit indexing calculation, lifts the maximum indexing possible without compromising the performance of `ConvTranspose3d`, versus a 20-30% regression with pure 64-bit indexing.
This requires that `input.numel() <= UINT_MAX` and `channels * kernel.numel() <= UINT_MAX`, otherwise it raises an error. Previously, the code would crash or give incorrect results unless `input.numel() * kernel.numel() <= INT_MAX`.
Note that the test is a minimised reproducer for the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38970
Differential Revision: D21748644
Pulled By: ezyang
fbshipit-source-id: 95060423219dc647595e1a24b3dcac520d3aecba
Summary:
This PR fixes https://github.com/pytorch/pytorch/issues/39020 by requiring users to type-hint default arguments of TorchScript functions when using the C++ frontend (the Python frontend will insert those automatically).
Since this is a bit of a niche use case, I opted for the simpler solution of making type-hints mandatory for default arguments, as opposed to trying to type-infer them. I left a comment in the code justifying this choice.
Test is included.
/cc t-vi
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39021
Differential Revision: D21755317
Pulled By: suo
fbshipit-source-id: e007650d3bfb3a4c58c25ad2c3a17759898f303b
Summary:
`_TestTorchMixin` is a base class which is instantiated across multiple types.
It was inherited from `object` in order to hide it from the unittest test discovery mechanism.
But this approach makes it almost impossible to use a static code analyzer on the class.
This PR implements an alternative approach by hiding the base class inside an inner class, per https://stackoverflow.com/a/25695512
Change imported class access path in `test_cuda.py`
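A sketch of the pattern referenced above (class and method names below are illustrative, not the actual test code):
```python
import unittest

# Nesting the shared base class inside a plain container class hides it from
# unittest's test discovery (which only scans module-level attributes), while
# still letting concrete TestCase subclasses inherit from it.
class BaseTestCases:
    class SharedMixin(unittest.TestCase):
        def test_shared_behavior(self):
            self.assertTrue(True)

class TestConcrete(BaseTestCases.SharedMixin):
    pass  # discovered and run; the nested base itself is not collected

if __name__ == "__main__":
    unittest.main()
```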
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39110
Test Plan:
run `test_torch.py --discover-tests` and `test_cuda.py --discover-tests` before and after change:
```
$ python test_torch.py --discover-tests|md5sum
2ca437bb5d65700763ce04cdacf6de3e -
$ python test_cuda.py --discover-tests|md5sum
b17df916fb0eeb6f0dd7222d7dae392c -
```
Differential Revision: D21759265
Pulled By: malfet
fbshipit-source-id: b01b06111469e551f7b78387449975e5248f6b9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38577
We don't want to limit a timeout to 30 min since there could be no
operations within that time frame. Bump to 2^31 - 1 (int32 max)
ghstack-source-id: 104743727
Test Plan: CI
Differential Revision: D21602425
fbshipit-source-id: ab002262f01664b538761202b3bd7584fcee3c6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39100
The old shape_hints format has a few cons:
- ',' is used to separate <model_id>:<shape_hints> pairs, as well as delimiter for dims in the <shape_hints>, which is an obvious bug
- it cannot handle the case of having ':' in tensor names
The new shape_hints format uses '::' to delimit <model_id> and <shape_hints>, ';' to delimit <model_id>::<shape_hints> pairs. Inside <shape_hints>, '|' is used to separate <tensor>,<shape> pairs, and ',' is used to delimit <tensor> and <shape>, as well as the dimensions inside <shape>.
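For concreteness, a hypothetical string in the new format and a rough Python parse of it (model and tensor names are made up):
```python
hints = "model_a::input_ids,32,128|attention_mask,32,128;model_b::dense_features,32,64"

parsed = {}
for pair in hints.split(";"):                    # <model_id>::<shape_hints> pairs
    model_id, shape_hints = pair.split("::", 1)  # '::' delimits model id and hints
    parsed[model_id] = {
        entry.split(",", 1)[0]: [int(d) for d in entry.split(",")[1:]]
        for entry in shape_hints.split("|")      # '|' separates <tensor>,<shape> pairs
    }
print(parsed)
# {'model_a': {'input_ids': [32, 128], 'attention_mask': [32, 128]},
#  'model_b': {'dense_features': [32, 64]}}
```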
Test Plan:
```
buck test //caffe2/caffe2/fb/opt:shape_info_utils_test
```
AI/AF canary:
https://www.internalfb.com/intern/ads/canary/426980448937212687https://www.internalfb.com/intern/ads/canary/426980529105312403
Reviewed By: yinghai
Differential Revision: D21656832
fbshipit-source-id: 9dec4b5586d093ddb814c3f15041a57d45a3de76
Summary:
In `LoopNest::rfactor` we assume that there is only a single reduction below the insertion point, and when replacing the reduction we recursively replace all reductions below that point. This is not a safe assumption, as a number of transformations can introduce additional ReduceOps - most directly a `splitWithTail` on the innermost reduce axis.
This PR fixes that bug, and adds some unit tests covering the case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38733
Differential Revision: D21723634
Pulled By: nickgg
fbshipit-source-id: 3ed6ffcdc2c15aef7504f9b2b91e8d827e0b5d88
Summary:
**Summary**
This commit gets rid of the separate compilation unit that is currently
being created for every backend-specific module generated by
`jit::backend::generateToBackendFn` and mangles the name properly to
allow multiple backend-specific modules to coexist in the same
compilation unit.
**Test Plan**
`python test/test_jit.py TestBackends`
**Fixes**
This pull request fixes part of https://github.com/pytorch/pytorch/issues/37841.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38679
Differential Revision: D21744620
Pulled By: SplitInfinity
fbshipit-source-id: ac85b8ce0d179c057991e9299fd53a4e13ba02a9
Summary:
1.6 Deprecation Note:
In 1.6 attempting to perform integer division using addcdiv will throw a RuntimeError, and in 1.7 the behavior will change so that addcdiv always performs a true division of its tensor1 and tensor2 inputs. See the warning in torch.addcdiv's documentation for more information.
PR Summary:
This PR updates the warning that appears when addcdiv performs integer division to throw a RuntimeError. This is intended to prevent silent errors when torch.addcdiv's behavior is changed to always perform true division in 1.7. The documentation is updated (slightly) to reflect this, as are the addcdiv tests in test_torch and test_type_promotion.
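A small sketch of the 1.6 behavior described above (shapes and values are illustrative):
```python
import torch

base = torch.zeros(3, dtype=torch.int64)
t1 = torch.tensor([1, 2, 3])
t2 = torch.tensor([2, 2, 2])

try:
    torch.addcdiv(base, t1, t2)  # integer division: raises RuntimeError in 1.6
except RuntimeError as e:
    print("rejected:", e)

# Casting to a floating dtype performs true division and is unaffected.
print(torch.addcdiv(base.float(), t1.float(), t2.float()))
```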
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38762
Differential Revision: D21657585
Pulled By: mruberry
fbshipit-source-id: c514b44409706f2bcfeca4473424b30cc48aafbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38896
The current way of exposing qnnpack's maxpool2d only works if the max_pool2d op is
quantized::max_pool2d. This diff moves the function around to expose it via
aten::max_pool2d when the dispatch key is QuantizedCPU.
Test Plan: Quantized tests.
Reviewed By: supriyar
Differential Revision: D21690913
fbshipit-source-id: 75fb77329b915e3a3c3aac4d76359482976ca783
Summary:
Since the indexed dimension in `scatter/gather` is traversed inside the kernel, all the memory conflicts of writing to the same memory between the threads are actually mutually disjoint.
See [this comment](https://github.com/pytorch/pytorch/issues/33389#issuecomment-590017938) for a graphical explanation. More formal description:
Suppose we deal with 3D tensors and `dim=0`, hence the `scatter_add` operations are
```
self[index[i][j][k]][j][k] += src[i][j][k],
...
self[index[i'][j'][k']][j'][k'] += src[i'][j'][k'],
...
```
Clearly, a write/read to the same memory happens if and only if:
```
index[i][j][k] = index[i'][j'][k'],
j = j',
k = k'.
```
Since the reduction over `dim=0` happens inside the kernel, threads `i` and `i'` partition `dim=1,2`. It means that threads `i` and `i'` receive indices
```
I = {(*, i, k) sent to the thread i},
I' = {(*, i', k') sent to the thread i'},
I intersection with I' = the empty set.
```
This happens:
```
index[i][j][k] = index[i'][j'][k'],
j = j',
k = k',
```
if and only if there exists some thread k which receives indices K and
`(*,j,k),(*,j',k') in K`.
Therefore it is possible to make `scatter_add` parallel and remove `serial_exec` from the `scatter_gather_base_kernel`.
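For concreteness, a toy `scatter_add_` call following the indexing convention above (shapes and values are illustrative):
```python
import torch

self_t = torch.zeros(3, 2, 2)
index = torch.tensor([[[0, 1], [2, 0]],
                      [[0, 1], [2, 0]]])  # shape (2, 2, 2); values index dim 0 of self_t
src = torch.ones(2, 2, 2)

# self_t[index[i][j][k]][j][k] += src[i][j][k]; colliding writes only occur at
# equal (j, k), and the reduction over dim 0 is handled inside the kernel.
self_t.scatter_add_(0, index, src)
print(self_t)
```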
CC v0dro
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36181
Differential Revision: D21716167
Pulled By: ngimel
fbshipit-source-id: 49aee2de43779a1f0b359c22c8589c0702ee68a2
Summary: change the test default to test the version we care about
Test Plan: ran the test
Reviewed By: amylittleyang
Differential Revision: D21725194
fbshipit-source-id: 243fcdf1dd5784768f6ceb2b46f9f1c9e64341eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37181
Now that assertEquals considers dtypes in determining tolerance, most
tests don't need explicitly set precision.
Those that do are a few half precision tests on cuda. In this PR, those
are broken out to be handled explicitly, though we may also want to
consider further loosening the tolerance on half-precision.
Test Plan: Imported from OSS
Differential Revision: D21728402
Pulled By: nairbv
fbshipit-source-id: 85f3daf63f1bdbb5101e8dea8c125f13448ca228
Summary:
When building, my log was being spammed with:
```
warning: attribute "__visibility__" does not apply here
```
This, at least on gcc 7.4, isn't covered by silencing `-Wattribute`. The warning suggests `enum`s don't need to be exported on linux, so I just `ifdef` it out instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38988
Differential Revision: D21722032
Pulled By: ezyang
fbshipit-source-id: ed4cfebc187dceaa9e748d85f756611fd7eda4b4
Summary:
This PR adds the following changes:
1. It sets the default extension build to use ninja
2. Adds HIPCC flags to the host code compile string for ninja builds. This is needed when host code makes HIP API calls
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38939
Differential Revision: D21721905
Pulled By: ezyang
fbshipit-source-id: 75206838315a79850ecf86a78391a31ba5ee97cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38819
Logs a message when the agent is shutting down like the other RPC
Agents.
ghstack-source-id: 104673386
Test Plan: Sandcastle
Differential Revision: D21671061
fbshipit-source-id: a44f0e4976e3acc898645a2baf6f41f45a697166
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38818
Standardizes the error message when a response is attempted after the
agent has shut down.
ghstack-source-id: 104673115
Test Plan: Sandcastle - no functionality change, just error message
Differential Revision: D21670706
fbshipit-source-id: d26fcd7c76758c62d432d9c4e6ef2e3af7cbedff
Summary:
CC ezyang xw285cornell
HIP from ROCm 3.5 renames `hipOccupancyMaxActiveBlocksPerMultiprocessor` to `hipModuleOccupancyMaxActiveBlocksPerMultiprocessor`. In addition, the API parameter types now match CUDA. Add these changes in a backwards-compatible manner.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38551
Differential Revision: D21721832
Pulled By: ezyang
fbshipit-source-id: 6fc971845e363d7495d8be9550e76d0f082c3062
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38815
Some CPU kernels have void return types and the current implementation segfaults in these cases.
Test Plan: Imported from OSS
Differential Revision: D21670717
Pulled By: gchanan
fbshipit-source-id: bc17b8330195601ca231a985ee44319447ba6cf0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38813
We are going to apply this check to CPU (with some changes), so just moving this in preparation.
The code is just cut-pasted here, no behavioral change.
Test Plan: Imported from OSS
Differential Revision: D21670554
Pulled By: gchanan
fbshipit-source-id: c7e07f67bb4c6524fde12237e35892e42557103e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38810
Same change as was applied to CPU loops -- separate out checking of the inputs and outputs.
Test Plan: Imported from OSS
Differential Revision: D21670339
Pulled By: gchanan
fbshipit-source-id: 42f208538dce1a5598d14948d8d02a1c91ba152a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38809
This splits the asserts into separate input/output asserts and makes the numbers precise, instead of ranges.
This is an ongoing effort to improve the Loops assertion and to integrate dynamic cast checking into CPU loops.
Test Plan: Imported from OSS
Differential Revision: D21670263
Pulled By: gchanan
fbshipit-source-id: b1868db5255a69158045b759dc9171690a2dcd01
Summary:
This updates assertEqual and assertEqual-like functions to require that either both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21740237
Pulled By: mruberry
fbshipit-source-id: acbc027aa1d7877a49664d94db9a5fff91a07042
Summary:
**Summary**
This commit enables the use of `torch.jit.unused` on methods of TorchScript classes.
This attribute is honoured by replacing the body of any method
marked as unused in the parsed AST for the class with `raise Exception(...)`.
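A minimal sketch of what this enables (class and method names are illustrative):
```python
import torch

@torch.jit.script
class Accumulator(object):
    def __init__(self, total: int):
        self.total = total

    def add(self, value: int) -> int:
        self.total += value
        return self.total

    @torch.jit.unused
    def debug_repr(self) -> str:
        # The compiler swaps this body for `raise Exception(...)`, so it may
        # contain code that TorchScript cannot compile.
        return "Accumulator(total={})".format(self.total)
```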
**Test Plan**
This commit adds a unit test `TestClassType.test_unused_method` that
tests this feature.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38522
Differential Revision: D21733818
Pulled By: SplitInfinity
fbshipit-source-id: 771872359dad70fac4aae83b6b5f17abb6329890
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38401
* `torch.hub.load_state_dict_from_url()` now also downloads to `$TORCH_HOME/hub/checkpoints` instead of `$TORCH_HOME/checkpoints` like `torch.hub.load()` and others.
* Make `hub_dir` private, add and use `get_dir()` instead.
Also updated docs. Did not see a need for additional unit tests.
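A small usage sketch of the new layout and helper (the URL below is a placeholder):
```python
import torch

print(torch.hub.get_dir())  # e.g. ~/.cache/torch/hub

state_dict = torch.hub.load_state_dict_from_url(
    "https://example.com/weights.pth",  # hypothetical URL
    progress=True,
)  # now cached under <hub dir>/checkpoints instead of $TORCH_HOME/checkpoints
```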
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38969
Differential Revision: D21725880
Pulled By: ailzhang
fbshipit-source-id: 58cc6b32ddbda91e58c1c1433cc3916223556ea1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38736
The qconv2d and qlinear APIs were changed recently, so this updates the scale code accordingly.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py
Imported from OSS
Differential Revision: D21647724
fbshipit-source-id: 45d4b358ffb84f1e73da8ba3f702d5043bdb16d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38804
This is only needed in the process group agent implementation, and
removing it from the header file prevents other translation units that include
it from having this constant.
ghstack-source-id: 104666599
Test Plan: CI
Differential Revision: D21668514
fbshipit-source-id: 1c39cc98dea99518134c66dca3ca5b124a43de1b
Summary:
We do try to eliminate empty For loops, but missed a case where the body Block exists but is empty. In that case we can eliminate the loop as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38883
Differential Revision: D21723680
Pulled By: nickgg
fbshipit-source-id: 49610b0524af5b9ec30ef3b4cc0c8461838259c3
Summary:
* Disable the mode where PE can still run the old fuser.
* Clean up
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38591
Differential Revision: D21643664
Pulled By: Krovatkin
fbshipit-source-id: 6753ed6bdc544698a1340e59a624608ff3abf7f9
Summary:
Per title. https://github.com/pytorch/pytorch/issues/32719 essentially disabled asserts in cuda kernels in release build. Asserts in cuda kernels are typically used to prevent invalid reads/writes, so without asserts invalid read/writes are silent errors in most cases (sometimes they would still cause "illegal memory access" errors, but because of caching allocator this usually won't happen).
We don't need two macros, CUDA_ALWAYS_ASSERT and CUDA_KERNEL_ASSERT, because all current asserts in cuda kernels are important to prevent illegal memory accesses, and they should never be disabled.
This PR removes the macro CUDA_ALWAYS_ASSERT and instead makes CUDA_KERNEL_ASSERT (which is commonly used in the kernels) an assertion in both release and debug builds.
Fixes https://github.com/pytorch/pytorch/issues/38771
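As a hedged illustration of the user-visible effect (module sizes and the out-of-range index are arbitrary), an invalid index now fails loudly in release builds as well:
```python
import torch

emb = torch.nn.Embedding(10, 3).cuda()
bad_idx = torch.tensor([42], device="cuda")  # out of range for 10 embeddings

out = emb(bad_idx)        # the lookup kernel hits CUDA_KERNEL_ASSERT
torch.cuda.synchronize()  # surfaces "CUDA error: device-side assert triggered"
```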
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38943
Differential Revision: D21723767
Pulled By: ngimel
fbshipit-source-id: d88d8aa1b047b476d5340e69311e65aff4da5074
Summary:
As a follow up for https://github.com/pytorch/pytorch/pull/36491 and last comments on it.
Vulkan uses the Strided layout (strides are not supported at the moment, but support is planned).
empty_strided just forwards to empty_vulkan, ignoring the strides parameter.
This also removes explicit ifs in TensorConversions that were added before the decision to use the Strided layout and were never cleaned up afterwards :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39019
Differential Revision: D21726480
Pulled By: IvanKobzarev
fbshipit-source-id: d465456df248a118bfef441c85280aa0025860cd
Summary:
closes gh-32561 closes gh-38545. As part of the fallout from gh-36797, this PR
- replaces `producer_version: "1.6"` in onnx expect tests with `producer_version: "XXX"`
- adapts `testing/_internal/common_utils.py` with a regex to change the onnx producer_version so tests still pass
The consistency of the torch version and the onnx `producer_version` is tested in gh-36797, so there is no reason to test it again in the expect tests.
xref gh-38629 which documented how to run the onnx tests and at the same time refactored the Community documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39002
Differential Revision: D21723062
Pulled By: ezyang
fbshipit-source-id: 1bd6a8ed37d5383e69d017226dc09c0645a69aff
Summary:
- Resolves the feature request introduced in https://github.com/pytorch/pytorch/issues/38652
- Since iteration terminates once an error occurs, we only report the current index that caused the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38978
Differential Revision: D21722426
Pulled By: ezyang
fbshipit-source-id: edfc3f7a320584ba22d790f2b79c3726e99aae2a
Summary:
Fixes errors when importing the module. The import is used by sphinx in documentation builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38921
Differential Revision: D21722144
Pulled By: ezyang
fbshipit-source-id: 5f31d4750325f1753de93754a009006cbc13655e
Summary:
In PyTorch 1.6 integer division using torch.div will throw a runtime error. When PyTorch Master adopts this behavior some of our ONNX tests would break if we continued to import torchvision v0.5, since v0.5 uses torch.div to perform integer division. fmassa and I recently updated Torchvision to use torch.floor_divide for integer division (at least on paths covered by the PyTorch OSS CI tests), and this PR updates our torchvision test version to include those changes. This will prevent the PyTorch OSS CI from breaking when PyTorch Master adopts the 1.6 integer division behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38848
Differential Revision: D21679988
Pulled By: mruberry
fbshipit-source-id: 1333f6254c295909cf05b6f3e352e4a0c336e5af
Summary:
Fix https://github.com/pytorch/pytorch/issues/38764
The current problem is that the `top_diff` and `top_mask` pointers are shifted "accumulatively" by the for-n and for-c loops. This may cause overflow and illegal memory access when the loop counts are greater than one, that is n > 65535 or c > 65535 (the case in https://github.com/pytorch/pytorch/issues/38764). Since neither n > 65535 nor c > 65535 is common, this has not been seen before. The simple fix is to use new pointer variables for the n & c offsets instead of directly modifying `top_diff` or `top_mask`.
However, I think the current nchw max_pool2d GPU impl still has plenty of room for performance improvement. We can check that in a later PR if needed.
This also slightly cleans up the indentation and adds tests that use the CPU impl as a reference check.
cc skrah
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38953
Differential Revision: D21721930
Pulled By: ezyang
fbshipit-source-id: fef7d911d814f8ed9fd67c60cabe5d52f8fd3d57
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37558
Use a temporary file instead of `/dev/null` in `ReducerTest`, to prevent the chance of unintended deletion when running as root. It seemed that there were no strong side-effects (running time?) by fixing it at the test level, compared to other solutions that involved modifying the behaviour of `FileStore` (for example, adding an optional flag to avoid auto-deleting the file upon destruction).
Please note this is my first contribution - I have done my best to read the contributing guide and checked for duplicate PRs with no luck, but apologies in advance for any oversights and lack of familiarity with the procedures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/39004
Differential Revision: D21721966
Pulled By: mrshenli
fbshipit-source-id: 76fb81600fa08a91c35d0eb9a5aab179f5371422
Summary:
This updates assertEqual and assertEqual-like functions to require that either both or neither of atol and rtol be specified. This should improve clarity around handling precision in the test suite, and it allows us to remove the legacy positional atol argument from assertEqual. In addition, the "message" kwarg is replaced with a kwarg-only "msg" argument whose name is consistent with unittest's assertEqual argument.
In the future we could make "msg" an optional third positional argument to be more consistent with unittest's assertEqual, but requiring it be specified should be clear, and we can easily update the signature to make "msg" an optional positional argument in the future, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38872
Differential Revision: D21717199
Pulled By: mruberry
fbshipit-source-id: 9feb856f94eee911b44f6c7140a1d07c1b026d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38859
This error message indicates that aten::eq received argument types it did not expect:
```
RUNNING 379 OP 76, aten::eq
terminate called after throwing an instance of 'c10::Error'
what(): isInt() INTERNAL ASSERT FAILED at "buck-out/gen/68e83026/xplat/caffe2/aten_header#header-mode-symlink-tree-with-header-map,headers/ATen/core/ivalue.h":331, please report a bug to PyTorch.
```
It turns out that there are two aten::eq in lite interpreter (https://www.internalfb.com/intern/diffusion/FBS/browse/master/xplat/caffe2/torch/csrc/jit/runtime/register_prim_ops.cpp?lines=417)
aten::eq(int, int)
aten::eq(str, str)
This diff adds an overload name for the str variant, which fixes the problem.
Test Plan: local test
Reviewed By: pengtxiafb
Differential Revision: D21681838
fbshipit-source-id: 1f17ecdadb9bc1c16915a24c60fa57a6fc273865
Summary:
This PR contains the initial version of Vulkan (GPU) Backend integration.
The primary target environment is Android, but the desktop build is also supported.
## CMake
Introducing three cmake options:
USE_VULKAN:
The main switch; if it is off, the other options have no effect.
USE_VULKAN_WRAPPER:
ON - Vulkan is loaded at runtime as "libvulkan.so" using libdl; every function call is wrapped in vulkan_wrapper.h.
OFF - link against libvulkan.so directly.
USE_VULKAN_SHADERC_RUNTIME:
ON - The shader compilation library is linked, and shaders are compiled at runtime.
OFF - Shaders are precompiled and the shader compilation library is not included.
## Codegen
if `USE_VULKAN_SHADERC_RUNTIME` is ON:
Shader precompilation starts in cmake/VulkanCodegen.cmake, which calls `aten/src/ATen/native/vulkan/gen_glsl.py` or `aten/src/ATen/native/vulkan/gen_spv.py` to embed the shader source or SPIR-V bytecode in the binary as a uint32_t array in spv.h/spv.cpp.
if `USE_VULKAN_SHADERC_RUNTIME` is OFF:
The shader source is included as `glsl.h`/`glsl.cpp`.
All codegen output goes to the build directory.
## Build dependencies
cmake/Dependencies.cmake
If the target platform is Android, the Vulkan library, headers, and Vulkan wrapper are taken from the ANDROID_NDK.
The desktop build requires the VULKAN_SDK environment variable, and all Vulkan dependencies are taken from there.
(The desktop build was tested only on Linux.)
## Pytorch integration:
Adds `Vulkan` as a new Backend, DispatchKey, and DeviceType.
We are using Strided layout without supporting strides at the moment, but we plan to support them in the future.
Uses OpaqueTensorImpl, where the OpaqueHandle is a copyable VulkanTensor;
more details are in the comments in `aten/src/ATen/native/vulkan/Vulkan.h`.
Main code location: `aten/src/ATen/native/vulkan`
`aten/src/ATen/native/vulkan/VulkanAten.cpp` - the bridge between ATen and the Vulkan API (Vulkan.h); it converts at::Tensor to VulkanTensor.
`aten/src/ATen/native/vulkan/Vulkan.h` - the Vulkan API that contains the VulkanTensor representation and functions to work with it. The plan is to expose it so clients can write their own Vulkan ops.
`aten/src/ATen/native/vulkan/VulkanOps.cpp` - Vulkan operation implementations that use the Vulkan.h API.
## GLSL shaders
Located in `aten/src/ATen/native/vulkan/glsl` as *.glsl files.
All shaders use Vulkan specialization constants for workgroup sizes with ids 1, 2, 3.
## Supported operations
Supported at this point:
conv2d no-groups
conv2d depthwise
addmm
upsample nearest 2d
clamp
hardtanh
## Testing
`aten/src/ATen/test/vulkan_test.cpp` - contains tests for
copy from CPU to Vulkan and back
all supported operations
Desktop builds are supported, and testing can be done on a desktop with a Vulkan-capable GPU or with an installed software implementation of Vulkan, such as https://github.com/google/swiftshader
## Vulkan execution
The initial implementation is trivial and waits for every operator's execution to finish.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36491
Differential Revision: D21696709
Pulled By: IvanKobzarev
fbshipit-source-id: da3e5a770b1a1995e9465d7e81963e7de56217fa
Summary:
Inspired by malfet
> By the way, once we have build_artifacts property, can someone try if its faster to use it as mean of transferring images between build and test instead of using AWS (i.e. use artifacts instead of jenkins/pytorch/win-test-helpers/upload_image.py /download_image.py pair)
Use CircleCI to store intermediate binaries and make them available to be downloaded as artifacts instead of uploading to S3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38971
Differential Revision: D21717080
Pulled By: seemethere
fbshipit-source-id: e3498b058778d02ae2f38daefbc7118a1a2cbe76
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38839. Previously, if the magnitude of the input values was large, the `log(sum)` term was essentially ignored when computing `max + log(sum)`; now the result is computed as
`x - max - log(sum)`, which has a better chance of preserving accuracy.
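A small numerical sketch (the input value is chosen so that float32 rounding makes the difference visible):
```python
import torch

x = torch.full((2,), 1e8)          # large-magnitude float32 inputs
out = torch.log_softmax(x, dim=0)
# Grouping the computation as x - (max + log(sum)) loses log(2), because it is
# below the float32 rounding granularity at 1e8; computing x - max - log(sum)
# keeps it, giving roughly [-0.6931, -0.6931] instead of [0., 0.].
print(out)
```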
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38945
Differential Revision: D21712483
Pulled By: ngimel
fbshipit-source-id: c1a3599ed981ba7a7fd130cbd7040a706b7eace0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37098
### **Cherry-picked from another stack:**
Some code review already occurred here: https://github.com/pytorch/pytorch/pull/32582
### Summary:
Fixes: https://github.com/pytorch/pytorch/issues/32436
The issue caused incorrect handling of dtypes for scalar ** tensor.
e.g. before this change:
```
>>> 5.5 ** torch.ones(5, dtype=torch.int32)
tensor([5, 5, 5, 5, 5], dtype=torch.int32)
```
should return a float tensor.
Also fixes a number of incorrect cases:
* tensors to negative powers were giving incorrect results (1 instead
of 0 or error)
* Behavior wasn't consistent between cuda/cpu
* large_value ** 1 in some cases gave a result not equal
to large_value because of truncation in conversion to double and back.
BC-breaking:
Previously incorrect behavior (in 1.4):
```
>>> a
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
>>> a.pow_(.5)
tensor([1, 1, 1, 1, 1], dtype=torch.int32)
```
After this change:
`RuntimeError: result type Float can't be cast to the desired output type Int`
Test Plan: Imported from OSS
Differential Revision: D21686207
Pulled By: nairbv
fbshipit-source-id: e797e7b195d224fa46404f668bb714e312ea78ac
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/36900
Since I feel this PR is already large enough, I didn't migrate max in this PR. Legacy code is not cleaned up either. All this remaining work will be done in later PRs after this is merged.
Benchmark on an extreme case
```python
import torch
print(torch.__version__)
t = torch.randn(100000, 2, device='cuda')
warmup = torch.arange(100000000)
torch.cuda.synchronize()
%timeit t.min(dim=0); torch.cuda.synchronize()
```
Before: 4ms; After: 24.5us.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38440
Differential Revision: D21560691
Pulled By: ngimel
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38151
We need to expose this method to Clang unconditionally when building CUDA, otherwise it would error on device code calling `__ldg` with `Half*`.
Test Plan:
```
buck build -c fbcode.caffe2_use_mpi=1 -c fbcode.cuda_use_clang=true mode/opt //experimental/training_supercomputer/trainer/hpc_pt:trainer
```
Reviewed By: ngimel
Differential Revision: D21481297
fbshipit-source-id: aacfe7de2cdc8542908249081ddb58170b1e35ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38746
Factors out testing of op alias normalization so that there is a registry used for tests.
Test Plan: Imported from OSS
Differential Revision: D21673107
Pulled By: eellison
fbshipit-source-id: e06653cdf24f14a4253dd054e4d402d171d16a11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38735
Follow up to my comment https://github.com/pytorch/pytorch/pull/36597/#issuecomment-613674329
This adds a pass to convert op aliases into a normalized form. Having two ops in our IR that do the same thing makes the IR harder to handle for downstream consumers, such as TorchScript passes but also ONNX, glow, etc.
Another solution would have been to fix our code generation to only emit `aten::abs` from the start. This seems trickier, and doesn't really buy us much if we still have to expose `aten::absolute` in C++, as glaringlee of the C++ API thinks we should.
Bike shedding: maybe this should be `CanonicalizeOps` instead
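As a hedged sketch of the intended effect (`absolute`/`abs` is just one example of an alias pair, and when the pass runs depends on the pipeline):
```python
import torch

@torch.jit.script
def f(x):
    return torch.absolute(x)  # `absolute` is an alias of `abs`

# Before normalization the graph contains aten::absolute; once the pass runs
# (e.g. as part of graph optimization), it is rewritten to aten::abs.
print(f.graph)
```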
Test Plan: Imported from OSS
Differential Revision: D21673108
Pulled By: eellison
fbshipit-source-id: c328618907de1af22e07f57fd27fa619978c2817
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38734
As far as I can tell, this pass only exists to canonicalize ops that are generated in the graph fuser, so its name is a bit of a misnomer.
Test Plan: Imported from OSS
Differential Revision: D21673109
Pulled By: eellison
fbshipit-source-id: b7bedf34ccaf1fcd442bfb2bbb990e64915f51d4
Summary:
Add `store_artifacts` attribute to Windows build jobs
In `vs_install.ps1` add logic to download vscollect tool and upload collected results as build artifacts
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38902
Differential Revision: D21700598
Pulled By: malfet
fbshipit-source-id: b51c47ff44ac522ad5581624f5b9a9a86cf1e595
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38878
We need the Packing op and the shape extraction functions to make some of the FakeLowP tests run in OSS.
Test Plan: unittests
Reviewed By: hyuen
Differential Revision: D21682704
fbshipit-source-id: f36321b91acfd738e90543309b82ad87b9e5c156
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38587
Before this diff, scale+zp were initialized to tensors
with a single dimension and 1 element, and then switched
to scalar tensors after the first forward.
This diff makes the shape stay consistent. This should fix
an issue reported when saving/loading models, which crashes
on this inconsistent shape.
Test Plan:
```
python test/test_quantization.py TestFakeQuantizePerTensor.test_fake_quant_preserves_qparam_shapes_for_activations
```
Imported from OSS
Differential Revision: D21605532
fbshipit-source-id: e00cd268d6d3ded1006d18d6c6759c911b3a74ea
Summary:
Adds reduction support for the code generator. Reductions are fully supported with split/merge/reorder/rfactor/computeAt/unroll operators. There is also cross thread (intra-block) reduction support.
The two remaining pieces missing for reduction support is:
- Safety: If cross thread reduction was used, child operators shouldn't be able to bind that thread dim anymore
- Cross block reduction: we will want inter-block reduction support to match parity with tensor iterator
This PR also provides FP16 support for fusions: FP16 inputs are cast to FP32, and FP16 outputs are cast back to FP16.
Also working towards reductions and shape inference for reductions in the fusion pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38627
Reviewed By: albanD
Differential Revision: D21663196
Pulled By: soumith
fbshipit-source-id: 3ff2df563f86c39cd5821ab9c1148149e5172a9e
Summary:
These two macros only appear in `Dispatch.h`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37697
Differential Revision: D21666340
Pulled By: anjali411
fbshipit-source-id: 1f31ab46c08b77f1011367e471874d390ffa70fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38844
Enhances the error message in ProcessGroupGloo to log the unsupported
device. We have been seeing a few issues with this, and this will provide more debug
information.
Test Plan: CI
Differential Revision: D21676881
fbshipit-source-id: 1fd727162682e1a55003adff67c4358dab488455
Summary:
* Does a basic upload of release candidates to an extra folder within our
S3 bucket.
* Refactors AWS promotion to allow for easier development of restoration
of backups
Backup restoration usage:
```
RESTORE_FROM=v1.6.0-rc3 restore-backup.sh
```
Requires:
* AWS credentials to upload / download stuff
* Anaconda credentials to upload
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38690
Differential Revision: D21691033
Pulled By: seemethere
fbshipit-source-id: 31118814db1ca701c55a3cb0bc32caa1e77a833d
Summary:
See https://www.sphinx-doc.org/en/master/man/sphinx-build.html#cmdoption-sphinx-build-j
> Distribute the build over N processes in parallel, to make building on multiprocessor machines more effective. Note that not all parts and not all builders of Sphinx can be parallelized. If auto argument is given, Sphinx uses the number of CPUs as N.
- Timing results
- Python doc build on a 40-core machine: 9:34 down to 1:29
- pytorch_cpp_doc_push: ~1h 10m down to 47m
- pytorch_python_doc_push: 34m down to 32m
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38785
Differential Revision: D21691991
Pulled By: zou3519
fbshipit-source-id: cfc5e8cd13414640f82edfd2ad1ce4d9c7afce12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38801
NCCL specific tests that shouldn't be run on ROCm
ghstack-source-id: 104481245
Test Plan: waitforbuildbot
Differential Revision: D21667348
fbshipit-source-id: a3e558185d9b74e1eac5fae27d97d5d026baa0a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38725
Today, there are two equivalent representations: named_tensor_meta_ is
null, or named_tensor_meta_ is non-null but all of the dimension names
are wildcard. Let's reduce the opportunity for behavior divergence by
making the second representation illegal.
This will make it easier for me to add a dispatch key for named
tensor as I can rely on setters to always go through TensorImpl to
maintain invariants on DispatchKey.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21662641
Pulled By: ezyang
fbshipit-source-id: ccc6566d23ad2ba850f653364a86cc8db0428223
Summary:
This PR fixes the tolerance values for some of the bfloat16 div tests that were enabled on ROCm with incorrect tolerance values in the PR https://github.com/pytorch/pytorch/pull/38621
Also disabled(to unblock CI) `test_addcdiv*` for which the error is large when absolute values in the tensor are higher. This will have to be investigated further.
ezyang jeffdaily sunway513
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38823
Differential Revision: D21686290
Pulled By: ezyang
fbshipit-source-id: 85472680e1886bdc7c227ed2656e0b4fd5328e46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38846
We are beginning to see fp16 inputs/outputs. Adding this will help with debugging.
Test Plan: run.
Reviewed By: jfix71
Differential Revision: D21676805
fbshipit-source-id: 47788e631164d24aef0f659b281c59822b009e18
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37626
Did some rescheduling of the instructions to hide the latency of the loads.
Particularly at the start of the kernel we have latency-bound chains.
It seems to improve perf for aarch32.
Also did some instruction rescheduling for the aarch64 gemm kernel. It is not clear if
this actually helps with perf, especially on OOO CPUs, but it is worth a try.
Test Plan:
qnnpack tests
q8gemm-test
Imported from OSS
Differential Revision: D21339037
fbshipit-source-id: 0469581a0e3bd3fd04f15200c2171fc8c264722b
Summary:
This fixes a `can not cast between incompatible function types` error when the code is compiled with gcc-9.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38842
Differential Revision: D21676360
Pulled By: malfet
fbshipit-source-id: d8b05d8381bfc961e06981731ebca87a516c2811
Summary:
The failure was caused by cross-merge conflicts. A new use of `AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND2` at `ATen/native/cuda/TensorTransformations.cu` was added before the reverted PR was merged. See c73523a4c3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38814
Differential Revision: D21670650
Pulled By: malfet
fbshipit-source-id: 867636cdb0106cb1275617ad2e355736d5d77210
Summary:
Otherwise, I don't understand how those could have been invoked
Also, what is the benefit of importing the same module twice?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38832
Differential Revision: D21675081
Pulled By: malfet
fbshipit-source-id: fee5604c4c433161b6b1a999d505b5acbbc3b421
Summary:
I was so excited to take advantage of https://github.com/pytorch/pytorch/issues/36858 getting merged that I installed the nightly build, and I'm glad I did!
It turns out that there's a _very small_ chance that the current algorithm will return a negative value (I imagine only -1 is possible but not sure about that).
Basically the logic [here](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/Distributions.h#L198-L213) returns a value that passes certain checks before checking whether it is negative. I can't figure out the particular range that causes this but could reproduce it by taking a billion samples with `count` 1 and `prob` 0.9:
```python
(
torch.distributions.Binomial(
total_count=torch.tensor(1.0).cuda(), probs=torch.tensor(0.9).cuda()
).sample(torch.Size((1000000000,))) >= 0
).all()
```
Reliably evaluates to `tensor(False, device='cuda:0')` on my machine. 100M samples usually does it but not always, so that's around the rate at which this crops up (it took me most of a whole day to run into it!). Seems to be CUDA specific, I imagine due to some esoteric reason I cannot begin to guess.
This PR tries to solve it in the most obvious way: reject negative values _before_ testing the bounding box, not after. But a better solution is probably to figure out why this occurs at all, and stop it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38456
Differential Revision: D21664886
Pulled By: jerryzh168
fbshipit-source-id: 99b0eed980e214bede484c100388a74d8c40ca55
Summary:
A local run shows it improves the time to run 2000 guards from 0.00282s to 0.00187s (~30%). This is for the case when the tensor is contiguous: we don't have to recompute whether it's contiguous from the strides for each dimension.
We can further optimize other cases if there's a repro script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38732
Differential Revision: D21664191
Pulled By: ailzhang
fbshipit-source-id: 125950f20c8676afc447f1d27ce4d14bbd445918
Summary:
`pytorch-linux-bionic-py3.8-gcc9` is based on Ubuntu 18.04 using gcc-9 and python-3.8
`pytorch-linux-bionic-cuda10.2-cudnn7-py3.8-gcc9` adds CUDA-10.2 to the same configuration
Also in this PR:
- Updates valgrind to 3.15.0
- Fixes a bug where gcc-5.5 was used in gcc-5.4 configurations
- Do not install `typing` when installing Python-3.8 from Conda
- Install `llvmdev-8` so that `numba/llvmlite` package compilation succeeds
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38747
Differential Revision: D21670093
Pulled By: malfet
fbshipit-source-id: 995dfc20118a6945b55a81ef665a0b80dab97535
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37623
Follows the same strategy as static linear.
The same kernel now supports both per-channel and per-tensor linear.
Fixed fully connected test.
Test Plan:
qnnpack tests
q8gemm
fully-connected-test
Imported from OSS
Differential Revision: D21339040
fbshipit-source-id: 479d847c16b42c926acb67357dc3bdd2d0bd6ca4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37622
Enable channelwise quantized test on qlinear and qconv.
Dynamic linear to follow.
Test Plan:
pytest test/quantization/test_quantized.py
pytest test/quantization/test_quantized_module.py
Imported from OSS
Differential Revision: D21339046
fbshipit-source-id: 32377680d3a6424ca1e98d3707b82839eeb349a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37621
Due to potential perf issues with using the same depthwise conv kernels for
per-channel depthwise conv, we opt for replicating the kernels and
adding per-channel support to them.
Note that the large kernel files are largely duplicates of the original
kernels. The assembly kernels have a few more modifications than the intrinsics
ones.
Test Plan:
qnnpack tests.
q8dwconv-test
convolution-test
Differential Revision: D21339042
Pulled By: kimishpatel
fbshipit-source-id: f2c3413e1e1af0b1f89770b5e0f66f402d38aee8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37620
Now channel-wise quantization is supported for linear/conv.
Depthwise convs are still pending.
Tests are altered to generate per-channel zero points and requant scales.
All the kernels are fixed appropriately.
Added a per_channel member to the conv_param structure.
Also replicated the conv tests to exercise per-channel conv.
This was not strictly needed, since the conv kernels were changed such that they did per-channel anyway; when per-channel is not needed, the zero point and scale are simply the same across channels. This was done to minimize code duplication, as the perf impact is estimated (to be measured though) to be low.
However, this is likely not the case for depthwise convs. Thus they will have separate kernels, which required us to introduce the per_channel member in the conv_param structure, to know which kernels to apply for depthwise.
The ensuing modifications keep everything in sync for both regular and depthwise conv, so there is no caveat when reading the code about why depthwise has a separate per-channel test while non-depthwise conv does not.
Test Plan:
Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv test.
fully-connected-test, convolution-test.
Imported from OSS
Differential Revision: D21339041
fbshipit-source-id: 1b8fbd7fbd0fe0582a43996147171567b126d948
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37619
This PR introduces changes to add per channel zero point.
Modifies the kernels appropriately.
Includes some bug fixes in enabling the per-channel zero point.
Test Plan:
Via tests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv test.
fully-connected-test, convolution-test.
Imported from OSS
Differential Revision: D21339044
fbshipit-source-id: fb69488b2b04da109c69f3dd1e8a285babf2863d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37618
This does not make any functional changes. It just introduces API changes and
some data structure changes to hold a vector of data for zero point and
scale.
Test Plan:
Via unittests inside qnnpack, i.e., q8gemm-test, q8conv/dwconv test.
fully-connected-test, convolution-test.
PT's quantization tests.
Imported from OSS
Differential Revision: D21339039
fbshipit-source-id: 4a20cff9795a105ddd31482d1f1fe2b1dbe18997
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38766
We will most likely hit int32, float16, and float inputs as Onnxifi inputs.
Test Plan: runs
Reviewed By: ipiszy
Differential Revision: D21658148
fbshipit-source-id: c51917c29e223051c5dfa1c21788c6d620539562
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38308
This PR doesn't add any new functionality. The purpose of this PR
is to validate that reordering the tracing code in the variable kernel doesn't break
anything (which is a prerequisite for the stacked change that moves tracing
logic into a new dispatch backend).
It will also be easier to bisect in case it breaks something that is not
covered by tests.
Test Plan: Imported from OSS
Differential Revision: D21570685
Pulled By: ljk53
fbshipit-source-id: 616b6434326df8381fb6f07c7b9aadac86dd02b4
Summary:
CC ezyang xw285cornell sunway513
Skip new test until triage of ROCm CI can be completed.
Test added by a94fb71b126001630d3d1e350347c20977f14ec0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38790
Differential Revision: D21665404
Pulled By: xw285cornell
fbshipit-source-id: c03227a91c9d06f8c0ff50f4593baa9ecb507743
Summary:
This PR ports `masked_select` from TH to ATen and optimizes its performance on CPU with TensorIterator.
https://github.com/pytorch/pytorch/issues/33053
1. single socket run: up to **5.4x** speedup;
2. single core run: up to **1.16x** speedup.
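For reference, a minimal use of the ported op (shapes are arbitrary):
```python
import torch

x = torch.randn(3, 4)
mask = x > 0
# Returns a 1-D tensor with the elements of x where mask is True; on CPU this
# path now goes through ATen with TensorIterator.
selected = torch.masked_select(x, mask)
```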
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33269
Differential Revision: D20922288
Pulled By: ngimel
fbshipit-source-id: 38e183a4e3599bba29bbbebe36264026abe1c50e
Summary:
ghstack PRs have their target branch set to `gh/xxx/1234/base`, so the merge didn't work. Change it to `master` by default.
IIRC we don't use ghstack with release branches so this should be good? cc: ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38745
Differential Revision: D21663796
Pulled By: ailzhang
fbshipit-source-id: 3d2c7b91b0e355dc8261d8f1e7da76af8d3bcee4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38424
On the way to adding initial vmap support, this is the implementation
for BatchedTensorImpl. Vmap (in future PRs) leverages Tensors backed by
BatchedTensorImpl to do its work.
For more context, here is an overview of the plan to add initial vmap support.
- [this PR] Add BatchedTensorImpl
- Add one or two batching rules
- Add vmap Python API
- Add "slow" for-loop fallbacks for out-of-place functions via
dispatcher fallback mechanism.
- Add batching rules for "view" functions
- Add "slow" for-loop fallbacks for in-place functions
- Miscellaneous handling for failure cases
- And more
Test Plan: - `./build/bin/vmap_test`
Differential Revision: D21640917
Pulled By: zou3519
fbshipit-source-id: 969490a838cf2099ed80104e7d51ee8ff069e168
Summary:
Updates our tests in preparation for integer division using torch.div and torch.addcdiv throwing a runtime error, by avoiding integer division using torch.div. This creates a brief period where integer division using torch.div is untested, but that should be OK (since it will soon throw a runtime error).
These callsites were identified using https://github.com/pytorch/pytorch/issues/36897.
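A minimal sketch of the substitution the tests now use (values are arbitrary):
```python
import torch

a = torch.tensor([7, 8, 9])
b = torch.tensor([2, 3, 4])

# Integer division via torch.div is being turned into a runtime error, so the
# tests use floor_divide (or true_divide when a float result is wanted).
q = torch.floor_divide(a, b)  # tensor([3, 2, 2])
r = torch.true_divide(a, b)   # tensor([3.5000, 2.6667, 2.2500])
```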
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38621
Differential Revision: D21612823
Pulled By: mruberry
fbshipit-source-id: 749c03a69feae02590b4395335163d9bf047e162
Summary:
Update Argmin/Argmax ONNX export in opset 12 to export with "select_last_index", and correctly export cases where the same value appears multiple times in the input tensor.
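A hedged export sketch (the module and shapes are illustrative):
```python
import io
import torch

class ArgMaxModule(torch.nn.Module):
    def forward(self, x):
        return torch.argmax(x, dim=1)

# With opset_version=12 the exporter can emit ArgMax using the
# select_last_index attribute, so inputs where the max value appears more
# than once are exported correctly.
buf = io.BytesIO()
torch.onnx.export(ArgMaxModule(), torch.randn(2, 5), buf, opset_version=12)
```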
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38329
Reviewed By: hl475
Differential Revision: D21613799
Pulled By: houseroad
fbshipit-source-id: 4597e23561f444c4e56d30c735dae7e9a8a41c5e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38453
Two fixes:
- RecordFunction in JIT interpreter should exist during the execution
of the frame, and not just when we enter the frame
- When creating a JIT continuation in the wait instruction, we want to
preserve the original thread-local context; right now, when we resume
execution in the continuation, we preserve the thread-local state of the
thread that set the future value (i.e. executed a forked task)
Test Plan: unittest, CI
Reviewed By: ngimel
Differential Revision: D21565959
Pulled By: ilia-cher
fbshipit-source-id: 206b98e3bfb0052fc8e4031da778e372cc71afc1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38447
This PR modifies `run_tests.py` to enable running Tensorpipe Agent tests with the OSS CI.
ghstack-source-id: 104321881
Test Plan: CI
Differential Revision: D21560096
fbshipit-source-id: 7d61cc1c354e9353c4a586dd2b56690c28d51d10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38446
This PR enables the Distributed Optimizer tests for the Tensorpipe Agent - all of them are currently passing so there is no need to skip any tests.
ghstack-source-id: 104321883
Differential Revision: D21560097
fbshipit-source-id: 316971b96b632f12326872a51fd9124c9eae4720
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38445
This PR enables the Distributed Autograd tests for the Tensorpipe Agent. A decorator is used to skip all tests that are currently failing due to functionality lacking in the Tensorpipe RPC Agent (primarily timeouts and error handling).
ghstack-source-id: 104321884
Differential Revision: D21560098
fbshipit-source-id: 2564bfc96d196f35ef0dfb9de59791fcd29093cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38444
This enables the RPC/RRef test suites to run with the Tensorpipe RPC Agent. This creates a new fixture to ensure the backend/options used are Tensorpipe, as well as a decorator to skip tests that Tensorpipe currently cannot support due to missing functionality.
One small note: the decorator function is a class method of the test class so we can check whether `self.rpc_backend` is tensorpipe. In the class-scope, the `TEST_CONFIG.rpc_backend_name` string is set to Tensorpipe, but outside the class scope, it is PGA, possibly due to importing dist_utils which sets this config to PGA by default. The cleanest solution would be to refactor the backend selection to be more uniform (since currently every backend is set slightly differently), but that would be a longer-term fix.
ghstack-source-id: 104321885
Test Plan:
Note: A couple of these tests will fail right now due to missing features. I've skipped the ones that regularly fail, but there will be some flaky tests that still fail occasionally.
The decorator `@_skip_if_tensorpipe_agent` skips the tests that fail with the Tensorpipe Agent. Remove this decorator from above the tests once they are fixed.
Differential Revision: D21412016
fbshipit-source-id: 1e801ac5ccaf87974dd4df92d556895b01468bf3
Summary:
CC ezyang xw285cornell sunway513
Commit 59d92e442b88eae51b84adc4e902e36e8f12a4db (https://github.com/pytorch/pytorch/issues/38557) has caused this test to regularly fail on ROCm CI gfx900 hosts. Skipping test until root cause analysis can complete.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38724
Differential Revision: D21645815
Pulled By: xw285cornell
fbshipit-source-id: 4087e9565710c271ca5c026a5ae0c5132e56f44d
Summary:
The current `to_mkldnn` model conversion logic under `torch.utils.mkldnn` does not cover `nn.Conv1d`. This patch fills the gap, using logic similar to `nn.Conv2d`. The model conversion removes unnecessary memory format reorders of input/output tensors and thus speeds up the model.
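A hedged usage sketch (layer sizes are arbitrary; assumes a PyTorch build with MKL-DNN enabled):
```python
import torch
from torch.utils import mkldnn as mkldnn_utils

model = torch.nn.Sequential(torch.nn.Conv1d(16, 32, kernel_size=3)).eval()
# Conv1d modules are now converted as well, so the weight stays in MKL-DNN
# layout instead of being reordered on every call.
mkldnn_model = mkldnn_utils.to_mkldnn(model)

x = torch.randn(1, 16, 64).to_mkldnn()
y = mkldnn_model(x).to_dense()
```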
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38528
Differential Revision: D21640325
Pulled By: albanD
fbshipit-source-id: c3340153b5c524e020c097eb4b9e2ffcbde8896d
Summary:
floordiv was missing a couple of dunder registrations, which was causing __ifloordiv__ to not be called when it should. This adds the appropriate registrations and adds a test verifying that the in-place dunders are actually occurring in place.
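A small sketch of the behavior being verified (tensor values are arbitrary):
```python
import torch

a = torch.tensor([7.0, 8.0, 9.0])
ptr = a.data_ptr()

a //= 2  # now dispatches to the in-place floor-division dunder
assert a.data_ptr() == ptr                            # same storage: truly in place
assert torch.equal(a, torch.tensor([3.0, 4.0, 4.0]))
```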
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38695
Differential Revision: D21633980
Pulled By: mruberry
fbshipit-source-id: a423f5ec327cdc062fd6d9d56abd36fe44ac8198
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38570
We changed the rule for quantizing `aten::cat`. Previously `aten::cat` was considered to be
an op that should always be quantized, like `aten::conv2d`, but this is not ideal. A better
way is to quantize the output of `aten::cat` depending on whether the input is quantized: if it is,
then we'll quantize the output; if not, then we will not quantize the output, since `aten::cat` works on both
quantized and non-quantized tensors.
Test Plan: Imported from OSS
Differential Revision: D21600160
fbshipit-source-id: efa957e0eaa608fffefcdfefa7f442fab45605eb
Summary:
Previously we got a CI issue in the original submission (D21562485), so we backed out the original diff (D21588831). Resubmitting here to reproduce the CI issue and ask a caffe2 dev to take a look.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38566
Original commit changeset: 6dda4b71904d
Test Plan: buck test
Reviewed By: houseroad
Differential Revision: D21589352
fbshipit-source-id: de40ff2884019e14476e31c4c952f24d6e438f5f
Summary:
Per title.
We move all the individual gradient norms to a single device before stacking (a no-op if all the gradients are already on a single device). `clip_coef` is copied to the device of each gradient, which may be suboptimal as there could be multiple copies, but it is no worse than when we were synchronizing for each parameter. In the simple case of all gradients on a single device, there should be no synchronization.
Also, we no longer error out if the parameter list is empty or none of the parameters have gradients, and return a total_norm of 0 instead.
Fixes https://github.com/pytorch/pytorch/issues/38605
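A brief sketch of the new no-gradient behavior (the parameters are arbitrary):
```python
import torch

params = [torch.nn.Parameter(torch.randn(3)) for _ in range(2)]

# No backward pass has been run, so no parameter has a .grad; instead of
# erroring out, the call now reports a total norm of 0.
total_norm = torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
```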
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38615
Reviewed By: ailzhang
Differential Revision: D21634588
Pulled By: ngimel
fbshipit-source-id: ea4d08d4f3445438260052820c7ca285231a156b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37984
- `NumericUtils.h`
CUDA distribution kernels had two variants of transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...): one for double precision and one optimized for CUDA single precision. This was done by using `::log`/`__logf`, `::exp`/`__expf` and `::tan`/`__tanf`. I moved them to `NumericUtils.h` and called them `at::exp`, `at::log` and `at::tan`. This allowed unifying the CPU/CUDA transformation templates in `TransformationHelper.h`.
- `DistributionsHelper.h`
Made `normal_distribution`, `geometric_distribution`, `exponential_distribution`, `cauchy_distribution`, `lognormal_distribution` C10_HOST_DEVICE compatible to reuse them in CPU/CUDA distribution kernels.
Replaced explicit math with transformations from `TransformationHelper.h`
- `TransformationHelper.h`
Renamed `*_transformation` to `transformation::*`
Added clear unified host/device transformations templates `normal`, `cauchy`, `exponential`, `geometric`, `log_normal` which are used by both CPU and CUDA distribution kernels and custom PRNG distribution kernels.
- `cpu/DistributionTemplates.h`
Unified `normal_kernel`, `cauchy_kernel`, `log_normal_kernel`, `geometric_kernel`, `exponential_kernel`.
- `cuda/DistributionTemplates.h`
Extracted `UNIFORM_AND_TRANSFORM` and `NORMAL_AND_TRANSFORM` macros to reuse code between distribution kernel templates.
Unified the transformation lambdas (`uniform`/`normal` -> `lognormal`/`exponential`/`cauchy`/`geometric`...)
- `test_torch.py`
Added `scipy.stats.kstest` [Kolmogorov–Smirnov](https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test) tests for the `uniform`/`normal`/`lognormal`/`exponential`/`cauchy` distributions and a [Chi-squared](https://en.wikipedia.org/wiki/Chi-squared_test) test for the `geometric` one, to make sure that our distributions are correct (a minimal sketch follows this list).
- `cpu_rng_test.cpp`, `rng_test.h`
Fixed random_()'s from and to bounds issue for floating-point types, fixed cast/overflow warnings
- `THTensorRandom.h`, `THVector.h`
Moved unnecessary includes to `THTensorRandom.cpp`
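A minimal sketch of the kind of check added (the sample count and distribution are illustrative; requires scipy):
```python
import torch
from scipy import stats

samples = torch.empty(50000).exponential_().numpy()
# Kolmogorov-Smirnov test against the reference exponential CDF (rate 1);
# a large p-value means the samples are consistent with the distribution.
statistic, pvalue = stats.kstest(samples, "expon")
```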
Test Plan: Imported from OSS
Differential Revision: D21477955
Pulled By: pbelevich
fbshipit-source-id: 7b793d1761a7a921c4b4a4a7d21d5d6c48f03e72
Summary:
We have a bug where Function names are not uniqued, which produces bad printed output, e.g.:
```
{
for (int i0 = 0; i0 < 1024; i0++) {
input[i0] = t0[0 + i0 * 1];
}
for (int i0_1 = 0; i0_1 < 1024; i0_1++) {
input_1[i0_1] = t1[0 + i0_1 * 1];
}
for (int v = 0; v < 1024; v++) {
aten_add[v] = (input(v)) + float(1) * (input(v));
}
for (int v_1 = 0; v_1 < 1024; v_1++) {
aten_sub[v_1] = (aten_add(v_1)) - float(1) * (input(v_1));
}
}
```
Notice the names of the vars in the `aten_add` line which make it appear as though input_1 isn't used. This is because the Buf names are uniqued by the unique_name_manager but the FunctionCall names are not.
Not fixing this right now, but working around it by reducing the number of Tensors that are created with the same name ("input") in kernel.cpp. That example now looks like:
```
{
for (int i0 = 0; i0 < 1024; i0++) {
input1[i0] = t0[0 + i0 * 1];
}
for (int i0_1 = 0; i0_1 < 1024; i0_1++) {
input2[i0_1] = t1[0 + i0_1 * 1];
}
for (int v = 0; v < 1024; v++) {
aten_add[v] = (input1(v)) + float(1) * (input2(v));
}
for (int v_1 = 0; v_1 < 1024; v_1++) {
aten_sub[v_1] = (aten_add(v_1)) - float(1) * (input1(v_1));
}
}
```
To be clear, the bug still exists but it's not blocking what I'm trying to do right now 😄
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38678
Differential Revision: D21630276
Pulled By: nickgg
fbshipit-source-id: 39dec2178cf492302bc5a61e1e688ae81513858a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38478
Before this PR, the QAT ConvBN module inlined the batch normalization code
in order to reproduce Conv+BN folding.
This PR updates the module to use BN directly. This is mathematically
equivalent to previous behavior as long as we properly scale
and fake quant the conv weights, but allows us to reuse the BN code
instead of reimplementing it.
In particular, this should help with speed since we can use dedicated
BN kernels, and also with DDP since we can hook up SyncBatchNorm.
Test Plan:
```
python test/test_quantization.py TestQATModule
```
Imported from OSS
Differential Revision: D21603230
fbshipit-source-id: ecf8afdd833b67c2fbd21a8fd14366079fa55e64
Summary:
It seems like all this time this was accidentally doing a 3-way merge-base, oops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38661
Test Plan:
```
$ git checkout gh/mohammadmahdijavanmard/1/head
$ git merge-base origin master HEAD --all
8292742ba020fcff90f14418c18741ebf606103b
$ git merge-base origin/master HEAD --all
324dc1623e2f91892038fb1b151450a7c6529dd9
```
Differential Revision: D21640939
Pulled By: yns88
fbshipit-source-id: 0f59922e7c0fd046f48fec30e8aa25c244f6dd62
Summary:
Use a recursive glob to make the `aten_headers` and `torch_headers` declarations more compact
Use a list generator to define the torch_cpp_api tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38699
Differential Revision: D21635357
Pulled By: malfet
fbshipit-source-id: ecab437d471b6be0c3caf669d4f59fcda9409249
Summary:
`buffer` is also used to refer to `nn.Module`'s buffer. Wording is changed to reduce confusion between the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38625
Differential Revision: D21629396
Pulled By: albanD
fbshipit-source-id: acb5ef598739efabae7b388e1a4806c9caf0f589
Summary:
Fix https://github.com/pytorch/pytorch/issues/37500
I messed up with the old PR https://github.com/pytorch/pytorch/pull/37755 during rebasing and thus opened this one.
- Add a call to `populate_cpu_children` for `__str__` to make sure that the printed result is correctly populated (a minimal usage sketch follows this list).
- Add test `test_profiler_aggregation_table`
- Fix a minor typo
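A minimal usage sketch of the path this touches (the workload is arbitrary):
```python
import torch
from torch.autograd import profiler

with profiler.profile() as prof:
    torch.mm(torch.randn(128, 128), torch.randn(128, 128))

# str(prof) now populates CPU children before printing, matching table().
print(prof.table(sort_by="self_cpu_time_total"))
print(prof)
```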
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37816
Reviewed By: ilia-cher
Differential Revision: D21627502
Pulled By: ngimel
fbshipit-source-id: 9c908986b6a979ff08c2ad7e6f4afac1f5fbeebb
Summary:
This PR removes the deferred initializer field from ReduceOp in favour of eagerly initializing buffers when they are created (either in the constructor of `LoopNest`, or in `rfactor()`). This allows a pretty good simplification of reduction logic, removing almost all of the reduction expander and the ReduceInitCleaner & unpopular NoOp node added in the last fix.
Eager initialization is better for us anyway because it allows more opportunities to transform the initialization loop.
Added a few more tests, testReduceOverSplitWithTail failed before this change due to a bug in splitWithTail which now can't happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38585
Differential Revision: D21621551
Pulled By: nickgg
fbshipit-source-id: 378137e5723b4a6d6e390239efb12adce22a8215
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38584
All observers will support tensor lists in a future PR
Test Plan: Imported from OSS
Differential Revision: D21623464
fbshipit-source-id: c5c57ecfe14f7c3aa92b7c99d724e846132ae03b
Summary:
Fix issue https://github.com/pytorch/pytorch/issues/23141#
In the below example ```default_collate``` collates each element of the list. Since the second element isn't present in all samples, it is discarded:
```
from torch.utils.data import Dataset
from torch.utils.data import DataLoader
import numpy as np

class CustomDataset(Dataset):
    def __len__(self):
        return 2

    def __getitem__(self, idx):
        tmp = {
            "foo": np.array([1, 2, 3]),
            "bar": ["X"] * (idx + 1),
        }
        return tmp

training = CustomDataset()
for batch in DataLoader(training, batch_size=2):
    print(batch)
```
Yields
```
{
    'foo': tensor(
        [
            [1, 2, 3],
            [1, 2, 3]
        ]
    ),
    'bar': [
        ('X', 'X'),
    ]
}
```
Based on discussion in the issue, it seems the best course of action is to error out in this case. This seems consistent with what is done for tensor elements, as seen in [TensorShape.cpp line 1066](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorShape.cpp#L1060) which is called when ```torch.stack``` is called. In this PR, I introduce a similar message to error out for lists.
SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38492
Differential Revision: D21620396
Pulled By: ezyang
fbshipit-source-id: 17f59fbb1ed1f0d9b2185c95b9ebe55ece701b0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38416
This diff primarily changes the `debugInfoMap` to map from strings to ints, instead of strings to strings. We were basically just converting these back to ints in Python, so this avoids the extra conversions.
`arc lint` also exposed tons of linting issues, so those are fixed here as well.
Test Plan: Build Bot - the tests already check whether the debugInfoMap is correct.
Differential Revision: D21266522
fbshipit-source-id: e742dec272bb1bab1bee01542610802922abab6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38592
I'm not sure that using couldMoveAfter was incorrect, but using
couldMoveBefore is more consistent with other subgraph-extraction
passes (old fuser, create autodiff graphs, etc.), so it would make it
easier to unify their implementations after this change.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D21607856
Pulled By: ZolotukhinM
fbshipit-source-id: 970583af7859889d48aacf620ae028258e37a75f
Summary:
This replaces all "verbatim-sources" files comprising the workflow named 'build' in the CircleCI config with code generation. This shall facilitate an automated conversion to workflow-per-job.
Note that the '.circleci/config.yml' file has some strictly cosmetic changes in this PR: some keys are sorted and inline comments are removed (moved to the Python modules that generate the config).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38631
Differential Revision: D21623528
Pulled By: kostmo
fbshipit-source-id: d86bd7aea979f443db14b4a3898220faad6bd0da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38489
Remove module and operator observer macros.
ghstack-source-id: 104290763
Test Plan:
a. Verify that QPL is being sent while testing FB4A BI Cloaking:
{F236982877}
b. Verify that AI Benchmark is working on both module and operator level:
https://our.intern.facebook.com/intern/aibench/details/808056762618979
c. Verify the macosx segmentation effect by running buck run xplat/arfx/tracking/segmentation/tools:person_segmentation_demoAppleMac#macosx-x86_64:
{F236982853}
Reviewed By: ljk53
Differential Revision: D21540838
fbshipit-source-id: 516f84ef5673d4ceed38ae152440a5cbacc6ddaa
Summary:
**Summary**
This commit modifies `BUILD.bazel` to include all headers in
`jit/backends` in `torch_headers` so that they can be accessed
by external backend code that lives in a different repository.
**Test Plan**
Continuous integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38668
Differential Revision: D21623755
Pulled By: SplitInfinity
fbshipit-source-id: 7f77b70e056205444e5ae63b47d87d8791131c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38589
This PR creates a unified way of decrementing the active call count on the client side by attaching a callback to the future returned by `TensorPipeAgent::send`.
ghstack-source-id: 104227074
Test Plan: CI/Sandcastle once tests PR's are merged.
Differential Revision: D21605779
fbshipit-source-id: c82396de6984876b09ee032ab1aa0f68a87005be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38448
This PR implements timeout support for RPCs, and respects the new per-RPC timeout functionality.
A map containing RPC futures, keyed by an expiration time, is populated by the send function for each RPC.
A separate watchdog thread polls this map and sets all incomplete futures with errors.
Note: we cannot set errors to a future with the lock held (this will trigger callbacks immediately and, if one of the callback functions tries to acquire the lock that we held when setting the error, we have a lock order cycle). Thus we add all incomplete futures to a list, and then iterate through the list outside the lock to set errors on those futures if necessary.
ghstack-source-id: 104227075
Test Plan: Will patch the testing diff on top of this to run tests.
Differential Revision: D21468526
fbshipit-source-id: 4514484ece6fb6be673427d44c7f3164ab3d9d7c
Summary:
GreaterOrEqual and LessOrEqual were added in opset 12; this PR adds support for exporting these operators to ONNX instead of composing them from "not" and "less than" or "greater than".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38311
Reviewed By: hl475
Differential Revision: D21613795
Pulled By: houseroad
fbshipit-source-id: 121d936d9787876ecb19cf24d661261e4abc82ab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38594
By default, we don't have a partition name, so the previous impl would fail to rewire the input into the split-convert output. It's usually a hidden perf issue rather than a correctness issue.
Test Plan:
Enhanced
```
buck test glow/fb/test:test_merge_inputs_nnpi_fp16nnpi
```
Reviewed By: tracelogfb
Differential Revision: D21608439
fbshipit-source-id: d72b06500a3b84f6747aa77cf9fd8754a4ff1195
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38507
With `--merge_fp32_inputs_into_fp16` we added some ops to the net without a net_pos, which makes the cardinality of the blacklist pos smaller than the number of ops in the net. Previously, the updateInternalState() function of the minimizer would just enter an infinite loop. This diff fixes it by changing the loop condition.
Reviewed By: tracelogfb
Differential Revision: D21578777
fbshipit-source-id: 0d5373fa0a417ded1c80a2dc03248c07b1e0a320
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38521
In the RPC Retry Thread, we add retriable futures to a list under the lock, release the lock, add callbacks/set errors on those futures, then re-acquire the lock to clean up the retry map. We can simply clean up the retry map before releasing the lock and not acquire it again - this is cleaner and may result in better perf if it reduces context switching between threads looking to acquire the retryMapLock.
ghstack-source-id: 104062147
Test Plan: CI, there are thorough tests in the RPC framework to test errors with retries.
Differential Revision: D21563085
fbshipit-source-id: 35e620892da630d082c032f5f9ce16e8a9ffdfaa
Summary:
Enable tests in tests/onnx/test_pytorch_onnx_onnxruntime.py for:
- Einsum
- SoftmaxCrossEntropy
- NLLLoss
- normalize
- pixel_shuffle
- test_interpolate_no_shape
- test_arange_dynamic
- test_slice_neg_large_negone
since there is a support in ORT for these operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37868
Reviewed By: hl475
Differential Revision: D21440528
Pulled By: houseroad
fbshipit-source-id: 4e590c554d000981bb12d4ce3ff4c175ed73a274
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38368
There is a need for some customers to enable/disable these flags
in the middle of QAT. To make it work properly with DDP,
we need to implement them using buffers so that they are replicated
properly to all the nodes.
This should solve issue https://github.com/pytorch/pytorch/issues/38081
Test Plan:
CI
Imported from OSS
Differential Revision: D21537607
fbshipit-source-id: 8c9da022beb7aaa44c658268f02f99dd5aee93fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38565
Also note this turns on "-Wno-unused-local-typedefs" because we are using dispatch macros for error checking.
Test Plan: Imported from OSS
Differential Revision: D21598478
Pulled By: gchanan
fbshipit-source-id: 28f9ad01bd678df0601a10d0daf3ed31c47c4ab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38369
It seems we have a lot of variables in codegen that carry duplicate information.
This PR removes them and unifies all use sites to use the same instance.
ghstack-source-id: 104067031
Test Plan: waitforsandcastle
Differential Revision: D21537983
fbshipit-source-id: 8d3ce3d3f712f7ba355e8c192798dfefaf847dac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38512
As we are gradually making the RPC non-blocking on the server side, the
processing of the same request can yield-run on different threads.
Hence, we need to populate thread_local states (e.g., ctx id) in
the continuation thread.
Fixes #38439
Test Plan: Imported from OSS
Differential Revision: D21583642
Pulled By: mrshenli
fbshipit-source-id: a79bce1cb207fd11f1fa02b08465e49badda65fc
Summary:
Edit: this has been updated to reflect the PR's current status, which has changed after review.
This PR updates the behavior of the assertEqual, assertNotEqual, and assert_allclose to be consistent with each other and torch.isclose. It corrects several additional bugs in the current implementations and adds extensive testing and comments, too.
These updates follow from changes to assertEqual like https://github.com/pytorch/pytorch/pull/34258 and https://github.com/pytorch/pytorch/pull/37069, and from our discussion of torch.isclose for complex tensors (see https://github.com/pytorch/pytorch/issues/36462), where we decided to implement a NumPy-compatible mathematical notion of "closeness" for complex tensors that is not a great fit for our testing framework.
The detailed changelist is:
- New test framework functions for comparing tensors and scalars
- Tensors are compared using isclose; the real and imaginary parts of complex tensors are compared independently
- Scalars are compared using the same algorithm
- assertEqual and assert_allclose now use this common comparison function, instead of each implementing their own with divergent behavior
- assertEqual-like debug messages are now available for all tensor and scalar comparisons, with additional context when comparing the components of sparse, quantized, and complex tensors
- Extensive testing of the comparison behavior and debug messages
- Small updates:
- assertEqual now takes an "exact_device" argument, analogous to "exact_dtype", which should be useful in multidevice tests
- assertEqual now takes an "equal_nan" argument for argument consistency with torch.isclose
- assertEqual no longer takes the "allow_inf" keyword, which misleadingly only applied to scalar comparisons, was only ever set (rarely) to true, and is not supported by torch.isclose
- Bug fixes:
- the exact_dtype attribute has been removed (no longer needed after https://github.com/pytorch/pytorch/pull/38103)
- message arguments passed to assertEqual are now handled correctly
- bool x other dtype comparisons are now supported
- uint8 and int8 tensor comparisons now function properly
- rtol for integer comparisons is now supported (default is zero)
- rtol and atol for scalar comparisons are now supported
- complex scalar comparisons are now supported, analogous to complex tensor comparisons
- assertNotEqual is now equivalent to the logical negation of assertEqual
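A brief sketch of the closeness notion underlying these comparisons, using `torch.isclose` directly (values chosen only to illustrate rtol/atol and equal_nan):
```python
import torch

a = torch.tensor([1.0, float('nan')])
b = torch.tensor([1.0 + 1e-6, float('nan')])

# rtol/atol control the tolerance; equal_nan=True treats NaNs in matching
# positions as equal, mirroring the new equal_nan argument on assertEqual.
print(torch.isclose(a, b, rtol=1.3e-6, atol=1e-5, equal_nan=True))  # tensor([True, True])
print(torch.isclose(a, b))                                          # tensor([ True, False])
```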
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37294
Differential Revision: D21596830
Pulled By: mruberry
fbshipit-source-id: f2576669f7113a06f82581fc71883e6b772de19b
Summary:
**Summary**
This commit adjusts the `pybind` includes in `backend.h` so
that we can avoid exporting some unrelated headers during install (which
probably shouldn't be exposed anyway). In addition, the headers that this commit
removes are not used.
**Test Plan**
Continuous integration (includes tests for JIT backends).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38562
Differential Revision: D21601694
Pulled By: SplitInfinity
fbshipit-source-id: c8f8103d24cb4f10d9eb6b3657eed75878078945
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38477
A few specific uses (e.g. Thrift rpc parsing) don't need source thread
state to be copied over. In microbenchmarks, copying seems to add ~500ns,
so the code is split across functions so that such callers can use the cheaper path directly.
ghstack-source-id: 104190095
Test Plan:
- Existing code using at::launch exercises this codepath, so buck test mode/dev-nosan caffe2/test/...
- For the split version, primarily the Thrift-based change layered on top of this.
Differential Revision: D21573168
fbshipit-source-id: 2ef1f196b5177634d4ee7fdca7371d36906a69d6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38434
We insert a dequantize node for each use in order to produce quantization patterns that will
later be fused; after that we should also remove the extra dequantize nodes produced by this operation.
Test Plan: Imported from OSS
Differential Revision: D21597834
fbshipit-source-id: 18dfb2760bbb08932aa4e1d06f96cfc5fb37ed88
Summary:
**Summary**
This commit adds the headers required to define and use JIT backends to
`package_data` in `setup.py` so that they are exported and copied to the
same place as the rest of the headers when PyTorch is installed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38525
Differential Revision: D21601806
Pulled By: SplitInfinity
fbshipit-source-id: 1615dd4047777926e013d7dd14fe427d5ffb8b70
Summary:
Right now it is an unused alias to `torch_library` interface library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38408
Differential Revision: D21598250
Pulled By: malfet
fbshipit-source-id: ec9a2446b94e7ea68298831212005c2c80bbc95c
Summary:
After an early return, we conditionalize all further execution. This means that currently the pattern of
`if return elif return elif return` generates better code than `if return if return if return`. It's obviously not good to have semantically equivalent code generate worse IR, so we should rewrite the graph to handle this case. This came up in https://github.com/pytorch/pytorch/pull/37171
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  _0 = uninitialized(int)
  if x:
    _1, _2 = True, 1
  else:
    _1, _2 = False, _0
  if _1:
    _3 = _2
  else:
    _3 = 2
  return _3
```
while
```
@torch.jit.script
def test_foo(x: bool, y: bool):
    if x:
        return 1
    else:
        return 2
print(test_foo.code)
```
generates:
```
def test_foo(x: bool,
    y: bool) -> int:
  if x:
    _0 = 1
  else:
    _0 = 2
  return _0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38282
Differential Revision: D21576733
Pulled By: eellison
fbshipit-source-id: 80cf1ad7fbda6d8d58557abbfb21c90eafae7488
Summary:
**Summary**
This commit removes a print statement added in https://github.com/pytorch/pytorch/issues/37994 that appears to
be for debugging and was most likely not intended to be committed.
**Test Plan**
Continuous integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38524
Differential Revision: D21587268
Pulled By: SplitInfinity
fbshipit-source-id: 6bdcdce647c45f5c0a2ba179a3545a1c0cae1492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38505
This takes the testing of https://github.com/pytorch/pytorch/pull/38275, but doesn't include the kernel changes which are still being worked out.
Test Plan: Imported from OSS
Reviewed By: mruberry
Differential Revision: D21580574
Pulled By: gchanan
fbshipit-source-id: f12317259cb7373989f6c9ad345b19aaac524851
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38149
This is for (#21290) (#31894)
Instead of putting "PyTorch master documentation" in the header's HTML title, we now use "PyTorch 1.x.x documentation"; this is similar to the TensorFlow and NumPy doc pages.
In Google search, we will now see "PyTorch Documentation - PyTorch 1.x.x documentation" instead.
Test Plan: Imported from OSS
Differential Revision: D21586559
Pulled By: glaringlee
fbshipit-source-id: 2995709ac3c22dbb0183b5b4abfde7d795f1f8eb
Summary:
Replace hardcoded filelist in aten/src/ATen/CMakeLists.txt with one from `jit_source_sources`
Fix `append_filelist` to work independently of the location from which it was invoked
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38526
Differential Revision: D21594582
Pulled By: malfet
fbshipit-source-id: c7f216a460edd474a6258ba5ddafd4c4f59b02be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38400
* #38399 Added autograd tests, disabled jit autograd tests for complex and added a separate list for tests for complex dtype only
Test Plan: Imported from OSS
Differential Revision: D21572209
Pulled By: anjali411
fbshipit-source-id: 7036029e9f8336139f5d54e0dfff9759f3bf8376
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37727
Check if the file exists locally only for `log_file_db` db_type. Reader files in other `db_type` like `manifold_log_file_db` are excluded from this check.
Test Plan: Verified that files stored in manifold can be loaded using `DBFileReader`.
Reviewed By: hbjerry
Differential Revision: D21329671
fbshipit-source-id: bbc0e88851783ca3f78f7c61bfe84b480c09b5ac
Summary:
Fixes a bug in the following code:
```
Tensor* c = Reduce("sum", {{10, "m"}}, Sum(), b, {{10, "n"}, {10, "k"}});
// split N loop with tail:
loop.splitWithTail(loop.getLoopStmtsFor(c)[1], 8, &outer, &inner, &tail);
```
When this is expanded there are two ReduceOps:
```
for (int m = 0; m < 10; m++) {
  for (int n_outer = 0; n_outer < (10 - 0) / 8; n_outer++) {
    for (int n_inner = 0; n_inner < 8; n_inner++) {
      for (int k = 0; k < 10; k++) {
        sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_outer * 8 + n_inner, k]), out_args={m}, reduce_args={n_inner, n_outer, k});
      }
    }
  }
  for (int n_tail = 0; n_tail < (10 - 0) % 8; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = ReduceOp(sum, float(0), (sum[m]) + (b[m, n_tail + ((10 - 0) / 8) * 8, k]), out_args={m}, reduce_args={n_tail, k});
    }
  }
}
```
But each ReduceOp will expand its initializer, which in this case will overwrite the sum of the split loop:
```
for (int m = 0; m < 10; m++) {
  sum[m] = 0.f;
  for (int n_inner = 0; n_inner < 8; n_inner++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[(100 * m + k) + 10 * n_inner]);
    }
  }
  sum[m] = 0.f; <------- *HERE*
  for (int n_tail = 0; n_tail < 2; n_tail++) {
    for (int k = 0; k < 10; k++) {
      sum[m] = (sum[m]) + (b[((100 * m + k) + 10 * n_tail) + 80]);
    }
  }
}
```
The simplest fix is to remove the initializer from the tail loop, which requires adding support for Reductions without an initializer (I did this by adding a NoOp Expr rather than handling nullptr). Also moved the ReductionExpander from loopnest.cpp to reduction.h, as loopnest is getting a bit heavy.
Added tests for all kinds of splits on a simple 3D reduction to verify no more problems of this type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38420
Differential Revision: D21587583
Pulled By: nickgg
fbshipit-source-id: e0766934481917007119612eb60cc76c3242e44a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38422
This partially reverts #38021, due to the availability of #38418
Test Plan: Imported from OSS
Differential Revision: D21587201
Pulled By: malfet
fbshipit-source-id: c0717303c842ceb3a202986ec0e808ed45f682f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38418
This is useful in reducing verbosity in c10::complex's general usage, and potentially also offers
performance benefits.
This brings back #34506 (which was made for std::complex).
Differential Revision: D21587012
Test Plan: Imported from OSS
Pulled By: malfet
fbshipit-source-id: 6dd10c2f417d6f6d0935c9e1d8b457fd29c163af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38518
as title
Test Plan: buck test
Reviewed By: olittle
Differential Revision: D21562570
fbshipit-source-id: 3a2e8dea3d821a2bdb9f30db25816a2bfa6c5dcf
Summary:
closes https://github.com/pytorch/pytorch/issues/37855
Relies on https://github.com/pytorch/pytorch/pull/38483
Previous attempts to get this right:
* https://github.com/pytorch/pytorch/pull/38335
* https://github.com/pytorch/pytorch/pull/38279
* https://github.com/pytorch/pytorch/pull/37976
This reverts commit 80639604a82422e314890f154242202a43d264f9.
Improves the docker image build workflow: what used to take many manual steps is now
basically transparent from a user's perspective.
To update docker images now all one has to do is edit the
.circleci/docker folder and it will update automatically and also
dynamically add the tags to the list of tags to keep from the garbage
collector.
Adding a new image will currently stay the same but we can explore doing
that dynamically as well.
How the build workflow works:
- Docker tags are determined by the hash defined from git for the
.circleci/docker sub-directory (extracted using git rev-parse)
- Images are only built if the computed hash is not found in ecr and
the hash is different than the previously computed hash. The
previously computed hash is found using the same process as before
but subbing out HEAD for the merge base between HEAD and the base
git revision
- That tag is then passed through the jobs using a shared workspace
which is added to downstream jobs using the circleci ${BASH_ENV}
How the new garbage collection works:
- Tags to keep are generated by stepping through all of the commits in
  the .circleci/docker subdirectory
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38484
Differential Revision: D21585458
Pulled By: seemethere
fbshipit-source-id: 37792a1e0f5e5531438c4ae61507639c133aa76d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38517
as title
Test Plan: buck test
Reviewed By: olittle
Differential Revision: D21562485
fbshipit-source-id: 573419e5a8dae4121d99d5b72ed3960a92db7a54
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38449
Also update docs to reflect conv1d op support
Test Plan:
python test/test_quantization.py TestQuantizedFunctional.test_conv1d_api
Imported from OSS
Differential Revision: D21575921
fbshipit-source-id: 21c9f6b49ad456cd9d93e97f17cf5b8d87f0da6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38485
Python 2 has reached end-of-life and is no longer supported by PyTorch.
This class does nothing in Python 3.
Test Plan: CI
Reviewed By: ailzhang
Differential Revision: D21575260
Pulled By: dreiss
fbshipit-source-id: 184696c9fa501e8d2517950b47cdbc90b2ae8053
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35625
Python 2 has reached end-of-life and is no longer supported by PyTorch.
This function was already ifdef'ed out in Python 2.
Added a comment about when we might be able to remove this entire file.
Test Plan: CI
Differential Revision: D20842885
Pulled By: dreiss
fbshipit-source-id: 1fd3b1b2ff5a82caaf3bc11344dde2941427cfc0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35614
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well.
Test Plan: CI
Differential Revision: D20842876
Pulled By: dreiss
fbshipit-source-id: 18abf0d324ed2185ec6d27c864e935d856dcc6ad
Summary:
This will support another round of migration from hand-written configs to code generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38496
Differential Revision: D21581624
Pulled By: kostmo
fbshipit-source-id: aed814ef6d4fc6af9ce092727b2dacc99de14ae0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38352
Fixes the RPC profiling by using the `then()` API added in https://github.com/pytorch/pytorch/pull/37311. Instead of adding a regular callback, we return a new future that completes when the profiling callback is finished. This is transparent to the user as the future still completes with the value of the original future (i.e. the RPC's return value)
To make this work for RRef, we add a `_set_profiling_future` to set the profiling future, and `_get_profiling_future` to retrieve this future and wait on it in the tests.
Re-enabled profiling tests and stress tested them 1000 times to verify the fix
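A rough sketch of the `then()`-based shape of this change (the helper and `record_event` are hypothetical names, not the actual internals):
```python
def _attach_profiling_callback(fut, record_event):
    # then() returns a new future that completes only after the callback has
    # run; forwarding the original result keeps the change transparent to the
    # caller, who still sees the RPC's return value.
    def _cb(completed_fut):
        result = completed_fut.wait()
        record_event(result)  # hypothetical: finish recording the profiling event
        return result
    return fut.then(_cb)
```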
ghstack-source-id: 104086114
Test Plan: Re-enabled profiling tests
Differential Revision: D21506940
fbshipit-source-id: 35cde22f0551c825c9bc98ddc24cca412878a63a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37884
Adds support for using the rpc_timeout param in rpc_async calls from JIT, for
parity with eager mode. Done by:
1) Add timeout as an input in ir_emitter.cpp if it is specified
2) Parse float IValue from inputs in `prim::rpc_async` operator. Give the default if needed.
Added UTs in jit/rpc_test.
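A sketch of the intended usage from TorchScript, assuming the scripted call mirrors the eager `rpc_async(to, func, args, timeout=...)` signature (worker name and tensors are illustrative, and RPC must already be initialized):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def remote_add(to: str, x: torch.Tensor) -> torch.Tensor:
    # The timeout (in seconds) is now parsed from the inputs of prim::rpc_async,
    # falling back to the agent default when it is omitted.
    fut = rpc.rpc_async(to, torch.add, (x, x), timeout=2.0)
    return fut.wait()
```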
ghstack-source-id: 104083031
Test Plan: Added UTs in jit/rpc_test.
Differential Revision: D21268895
fbshipit-source-id: 34bb10a2ac08b67dd6b789121ab43e2c0e696229
Summary:
TorchScript currently doesn't support `*args, **kwargs` in method signatures, which is used extensively in DPER3 low-level modules' forward methods. In order to make DPER3 low-level modules scriptable, I was thinking about a solution of having a forward method *only* for TorchScript, and replacing the forward method when we are not in scripting mode.
This solution works today, and I would like to add a test to make sure it will always work in the future.
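A rough sketch of the pattern being tested (module and method names here are made up for illustration):
```python
import torch

class Wrapper(torch.nn.Module):
    # TorchScript-friendly forward with an explicit signature.
    def scriptable_forward(self, x: torch.Tensor) -> torch.Tensor:
        return x * 2

    # Flexible eager-mode entry point using *args/**kwargs.
    def flexible_forward(self, *args, **kwargs):
        return self.scriptable_forward(args[0])

    forward = scriptable_forward

m = Wrapper()
scripted = torch.jit.script(m)   # compiles the explicit-signature forward
m.forward = m.flexible_forward   # outside scripting mode, swap in the *args version
print(m(torch.ones(2)), scripted(torch.ones(2)))
```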
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38158
Differential Revision: D21485657
Pulled By: yf225
fbshipit-source-id: df7368e8a5265418be7c305e6666ffd76e595466
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38267
Assert that the rpcTimeout is positive in RpcBackendOptions
constructor
ghstack-source-id: 104029918
Test Plan: CI
Differential Revision: D21509850
fbshipit-source-id: c925490e3d8fa2ffa42b0ae1170ca2f740af11f7
Summary:
These commits fix a bug that was exposed when we took away the fallback path. The fix is to set the appropriate device before setting the CUDA stream.
The improvement is that, when compiling, we set the device to the new device only if it differs from the prior device, and we remove a redundant call to cudaFree.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38365
Reviewed By: zheng-xq
Differential Revision: D21537469
Pulled By: protonu
fbshipit-source-id: b9662dd623b5c7cfd23eb6894e992a43665641e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38414
`std::to_string` call is unnecessary when using glog.
ghstack-source-id: 104030161
Test Plan: Ran the retry tests and checked logs to ensure the correct message was printed upon message failure.
Differential Revision: D21266330
fbshipit-source-id: 53519287778d47d99b94ea34b7c551f910affda2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35624
Python 2 has reached end-of-life and is no longer supported by PyTorch.
This test case is valid syntax in Python 3.
Test Plan: CI
Differential Revision: D20842877
Pulled By: dreiss
fbshipit-source-id: 856e72171496aa1d517f2f27a8a5066462cf4f76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35623
Python 2 has reached end-of-life and is no longer supported by PyTorch.
This test case is valid syntax in Python 3.
Test Plan: CI
Differential Revision: D20842874
Pulled By: dreiss
fbshipit-source-id: 9f12e046f827d4f9d5eca99b0b0b46f73e06ff51
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35621
Python 2 has reached end-of-life and is no longer supported by PyTorch.
`func.__wrapped__` can be used directly in Python 3.
Test Plan: CI
Differential Revision: D20842875
Pulled By: dreiss
fbshipit-source-id: 26f71df12db6d5118c8f278b27d747d647d07900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35620
Python 2 has reached end-of-life and is no longer supported by PyTorch.
`self.subTest` can be used directly in Python 3.
Test Plan: CI
Differential Revision: D20842872
Pulled By: dreiss
fbshipit-source-id: 6ad42550c01e6959821ff07df767fc14b58c5a9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35618
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Python 3 always uses true division.
Test Plan: CI
Differential Revision: D20842884
Pulled By: dreiss
fbshipit-source-id: 522e34bb584d4bdb01c9c40eb267955062a57774
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35617
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up some cruft that we put in place to support it.
Test Plan: CI
Differential Revision: D20842883
Pulled By: dreiss
fbshipit-source-id: 18dc5219ba99658c0ca7e2f26863df008c420e6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38407
We can still run some quantized tests even when fbgemm/qnnpack isn't enabled
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21554257
fbshipit-source-id: e4fa8f61f6a6717881c00620ed7938c01ffbf958
Summary:
Together with https://github.com/pytorch/pytorch/issues/37758, this fixes https://github.com/pytorch/pytorch/issues/37743 and fixes https://github.com/pytorch/pytorch/issues/24861.
This follows the CUDA fix in https://github.com/pytorch/pytorch/issues/37758, vectorised using a `blendv` to replace the if conditionals.
Most of the complication is from `remainder` supporting `at::Half` where `fmod` doesn't. I've now got `fmod` working on `Vec256<at::Half>` as well as enabling half dispatch for `fmod` so it matches `remainder`.
I also added `fmod` support to `Vec256<at::BFloat16>` before realising that `remainder` doesn't support `BFloat16` anyway. I could also enable `BFloat16` if that's desirable. If not, I don't think `Vec256<BFloat16>` should be missing `fmod` anyway.
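A quick illustration of the semantic difference the vectorised path has to preserve, on `torch.half` inputs (CPU; half support for `fmod` is what this PR adds):
```python
import torch

a = torch.tensor([-3.0, 3.0, -2.5], dtype=torch.half)

# fmod takes the sign of the dividend, remainder takes the sign of the divisor.
print(torch.fmod(a, 2))       # tensor([-1.0000,  1.0000, -0.5000], dtype=torch.float16)
print(torch.remainder(a, 2))  # tensor([1.0000, 1.0000, 1.5000], dtype=torch.float16)
```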
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38293
Differential Revision: D21539801
Pulled By: ezyang
fbshipit-source-id: abac6a3ed2076932adc459174cd3d8d510f3e1d5
Summary:
Return unmodified type from decorator if fbgemm is present.
Fix `Tried to trace <__torch__.torch.classes.rnn.CellParamsBase object at 0x55f504c56b40> but it is not part of the active trace. Modules that are called during a trace must be registered as submodules of the thing being traced` thrown from `TestPostTrainingDynamic.test_quantized_rnn` by preserving modules in returned qRNNBase (i.e. by partially reverting https://github.com/pytorch/pytorch/pull/38134 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38432
Differential Revision: D21567333
Pulled By: malfet
fbshipit-source-id: 364fa2c8fc6e400b4f2e425b922a977756aec1d8
Summary:
Hi, I found a validation check that is unreachable in the `gradcheck` function :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37915
Differential Revision: D21551661
Pulled By: albanD
fbshipit-source-id: 8acadcc09cd2afb539061eda0ca5e98860e321eb
Summary:
This PR implements softmax support for sparse tensors.
The sparse softmax is related to the dense softmax when the values of unspecified sparse tensor entries are taken to be `-inf`, which has the effect of ignoring zero entries. This relation is used for testing the correctness of the results here; a usage sketch follows the checklist below.
Resolves https://github.com/pytorch/pytorch/issues/23651 for CPU.
- [x] sparse softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update softmax documentation
- [x] autograd support
- [x] sparse log_softmax
- [x] CPU C++ implementation
- [x] unittests
- [x] update log_softmax documentation
- [x] autograd support
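A small usage sketch, assuming the op is exposed as `torch.sparse.softmax` (analogous to the log_softmax variant listed above):
```python
import torch

i = torch.tensor([[0, 0, 1], [0, 2, 1]])
v = torch.tensor([1.0, 2.0, 3.0])
s = torch.sparse_coo_tensor(i, v, (2, 3))

# Unspecified entries behave as -inf, so they receive zero probability mass
# and only the stored values within each row compete with each other.
out = torch.sparse.softmax(s, dim=1)
print(out.to_dense())
```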
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36305
Differential Revision: D21566540
Pulled By: ezyang
fbshipit-source-id: a632ea69c38622f960721482e442efeb8d0a54fc
Summary:
Since the check was added in https://github.com/pytorch/pytorch/pull/6249, one can no longer pass an iterable as a sampler to the data loader, which was a very handy feature (e.g., https://github.com/pytorch/pytorch/issues/1337). I think the check should be removed for two reasons (a usage sketch follows this list):
1. It is too strict. There is no reason that it should not be a general iterable.
2. It is inconsistent. In `DataLoader` (the main place where people use samplers), you can pass a general iterable as `batch_sampler` but not `sampler` due to this check.
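A minimal sketch of what removing the check allows (dataset and indices are arbitrary):
```python
import torch
from torch.utils.data import DataLoader, TensorDataset

ds = TensorDataset(torch.arange(10, dtype=torch.float32))

# With the isinstance check removed, any iterable of indices can act as a
# sampler, just as batch_sampler already could.
loader = DataLoader(ds, sampler=[3, 1, 4, 1, 5], batch_size=2)
for (batch,) in loader:
    print(batch)
```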
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38403
Differential Revision: D21555958
Pulled By: soumith
fbshipit-source-id: c7267bb99a31edd8f2750689205d6edc5dab5cff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38430
Add `jit_core_[sources|headers]` to `build_variables.bzl`, use them from BUILD.bazel as wel as from internal build systems
Test Plan: CI
Reviewed By: suo
Differential Revision: D21555649
fbshipit-source-id: e78572465f36560806d646f147b2ef5a53ba1efe
Summary:
This file was separated from the main CMakeLists.txt to enable mobile builds, but at the moment it is only referenced from the CMakeLists.txt in the parent folder.
This is a preparatory step toward moving `jit_core_sources` and `jit_core_headers` to build_variables.bzl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38426
Test Plan: CI
Differential Revision: D21567389
Pulled By: malfet
fbshipit-source-id: e6340fad1da75aa3e24d6c340df0c3e1e1957595
Summary:
CC ezyang xw285cornell sunway513
Forcing MAX_JOBS=4 was done 2 years ago. We have tested up to MAX_JOBS=256. OOM issues are no longer observed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38425
Differential Revision: D21566747
Pulled By: ezyang
fbshipit-source-id: f7f50e44a287268f1b06bcea3cb4e11c80260cc3
Summary:
Previously, we weren't adding the location to implicit conversions, so the error message wouldn't show location when these ops failed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38442
Differential Revision: D21563500
Pulled By: eellison
fbshipit-source-id: 19dd786ab8580f11ed919aac669efeed0ef52dcb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37474
Previously we would segfault
Test Plan: Imported from OSS
Differential Revision: D21297542
Pulled By: suo
fbshipit-source-id: c7e2f828a250c490ec23fb51c6a4a642d3370e52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38283
Adds support for the modules and tests
Test Plan:
python test/test_quantization.py TestStaticQuantizedModule.test_conv1d_api
Imported from OSS
Differential Revision: D21553665
fbshipit-source-id: 7ea28da024bdf59f87f300d616c266f2b41f0bcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38428
use and log a randomly generated seed with each test
Test Plan: locally tested
Reviewed By: amylittleyang
Differential Revision: D21554466
fbshipit-source-id: 008185d13116ec8553b082150a355ba87682bf6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38274
UnarySignKernels is one of the longest files to compile and Abs is not a sign function.
Test Plan: Imported from OSS
Differential Revision: D21511831
Pulled By: gchanan
fbshipit-source-id: f8572ab21321a241c984c64f7df83e2cb5e757d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38256
Removes hypothesis to speed these tests up, as these tests were flagged as top slow
tests in CI. At the same time, combines the fbgemm and qnnpack test
cases together for better reuse.
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_hardswish
python test/test_quantization.py TestQuantizedOps.test_qhardsigmoid
```
Imported from OSS
Differential Revision: D21506831
fbshipit-source-id: 9ff70e4ec7ae30b6948fe808878f0187e631f4d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38355
The torch::utils::Future API, from which this API was copied last week,
intentionally does not throw. Harmonize the semantics and comment
appropriately.
ghstack-source-id: 104014210
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D21533016
fbshipit-source-id: db26af32656d7b9dacf4fad4e77c944a0087c9b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37948
The input JIT graph has all the information we need to perform the
entire compilation at construction time. We don't need to postpone
any steps until execution time. Also, from the graph we always know
what device we will be executing on and thus we don't need to have a
CodeGen cache in TensorExprKernel - we always have one and only one
CodeGen.
Test Plan: Imported from OSS
Reviewed By: protonu
Differential Revision: D21432145
Pulled By: ZolotukhinM
fbshipit-source-id: 8dc86b891713056b2c62f30170cd4a168912f027
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38247
Per-channel quantized tensor axis value is shifted based on the unsqueeze/squeeze dim
Test Plan:
python test/test_quantization.py TestQuantizedTensor.test_qtensor_unsqueze
Imported from OSS
Differential Revision: D21550293
fbshipit-source-id: 90ea4a1bd637588360b3228cb5af9176176eb033
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38263
On my machine, compilation went from 4m08s to 2m22s for the slowest of the files being compiled after the split.
Test Plan: Imported from OSS
Differential Revision: D21508985
Pulled By: gchanan
fbshipit-source-id: 2917cd5f30c6b31229053cada93c95e3a27ab29a
Summary:
Closes https://github.com/pytorch/pytorch/issues/24561
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.exp(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.exp(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.exp(a) a.numel() == 10000 for 20000 times torch.half
0.3001665159999902
torch.exp(a) a.numel() == 10000 for 20000 times torch.float
0.28265794499998265
torch.exp(a) a.numel() == 10000 for 20000 times torch.double
0.3432170909998149
torch.exp(a) a.numel() == 100000 for 20000 times torch.half
0.32273333800003456
torch.exp(a) a.numel() == 100000 for 20000 times torch.float
0.31498759600003723
torch.exp(a) a.numel() == 100000 for 20000 times torch.double
1.079708754999956
```
After:
```
torch.exp(a) a.numel() == 10000 for 20000 times torch.half
0.27996097300092515
torch.exp(a) a.numel() == 10000 for 20000 times torch.float
0.2774473429999489
torch.exp(a) a.numel() == 10000 for 20000 times torch.double
0.33066844799941464
torch.exp(a) a.numel() == 100000 for 20000 times torch.half
0.27641824200145493
torch.exp(a) a.numel() == 100000 for 20000 times torch.float
0.27805968599932385
torch.exp(a) a.numel() == 100000 for 20000 times torch.double
1.0644143180015817
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36652
Differential Revision: D21164653
Pulled By: VitalyFedyunin
fbshipit-source-id: 42c7b24b0d85ff1d390231f1457968a8869b8db3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38093
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136, Parallelization using OpenMP):
```
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
With AVX
========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0942596640015836
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.9209065200011537
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0520610109997506
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.9031864690005023
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.949299545998656
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.82629113800067
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.9547776939980395
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.8259895039991534
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.759497356000793
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
2.6285490109985403
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.3456633150017296
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.2031515989983745
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.559069258000818
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.378239962999942
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
0.8100852870011295
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.18943897200006177
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
0.6679975400002149
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.17846923400065862
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.1431112539976311
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.3336703610002587
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.157699686998967
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.32964968899977976
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.5379577429994242
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.4638638729993545
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.360489848000725
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.4033017760011717
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.4591587399991113
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.44132660000104806
```
Without AVX
===========
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
3.4967273879992717
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
3.330881046000286
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
2.176502857997548
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
2.023505228000431
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
2.117801246000454
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.9885458380013006
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
2.1057261179994384
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.9809251260012388
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
3.187070896001387
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
3.049615387000813
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
3.4874590049985272
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
3.33596555099939
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
4.256659758000751
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
4.100936053000623
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.9155298300029244
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.598213522000151
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.3183841649988608
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.40136947100108955
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.2191377319977619
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
0.35984685299990815
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.2153874989999167
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
0.35752785600197967
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.750796647000243
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
0.5376063230032742
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.9153429929974664
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
0.5952553579991218
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.281823589000851
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
0.7391443560009066
```
Differential Revision: D21528099
Test Plan: Imported from OSS
Pulled By: malfet
fbshipit-source-id: a6b3904e7860bb6d652a48b2056154509e73157d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37994
Before, reassigning a method in a module (like `forward = _forward`)
didn't work, because we look at the function object's name for our def
name when building the AST. Make that overrideable to handle cases like
reassignment.
Test Plan: Imported from OSS
Differential Revision: D21444535
Pulled By: suo
fbshipit-source-id: 4f045f18b5a146edc8005689af525d7d7ed8dd5f
Summary:
Before, multinomial kernels did not advance random states enough, which led to the same sequence being generated over and over with a shift of 4. This PR fixes that.
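An illustrative reproduction sketch (requires a CUDA device; before the fix, the sampled indices within a draw could repeat with a period of four):
```python
import torch

if torch.cuda.is_available():
    probs = torch.ones(16, device='cuda')
    # With the bug, large draws showed the same 4-sample block repeating;
    # after the fix each call advances the generator state correctly.
    for _ in range(3):
        print(torch.multinomial(probs, 8, replacement=True))
```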
Fixes https://github.com/pytorch/pytorch/issues/37403
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38046
Differential Revision: D21516542
Pulled By: ngimel
fbshipit-source-id: 23248a8c3a5c44316c4c35cd71a8c3b5f76c90f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37910
To resolve the issue: https://github.com/pytorch/pytorch/issues/36715
In the TensorPipe RPC agent, we currently hardcode localhost as the pipes' handshake IP address. This prevents us from setting up cross-host connections. As a first step, we start binding the IP address of a given network device. For now it defaults to eth0; we will provide options to let users configure it.
Test Plan: CI
Reviewed By: lw
Differential Revision: D21421094
fbshipit-source-id: 60f612cbaeddcef7bd285136ad75af20709a7d56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38298
Moved unnecessary includes to `THTensorRandom.cpp`
Test Plan: Imported from OSS
Differential Revision: D21534864
Pulled By: pbelevich
fbshipit-source-id: bfec9cf5ce7587b1bd1674bc47850c16446621e9
Summary:
Fix for https://github.com/pytorch/pytorch/issues/37986
Follows the stack in https://github.com/pytorch/pytorch/pull/33783 to make functions in `torch/functional.py` resolve to their Python implementations. Because the return type of `torch.unique` depends on `return_inverse` and `return_counts`, I had to refactor the implementation to use our boolean_dispatch mechanism.
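For context, a small example of the return-type dependence that motivates boolean_dispatch (scriptable after this change, per the description above):
```python
import torch

@torch.jit.script
def unique_with_inverse(x: torch.Tensor):
    # With return_inverse=True the call returns a tuple, without it a single
    # tensor; boolean_dispatch selects the right overload at script time.
    values, inverse = torch.unique(x, sorted=True, return_inverse=True)
    return values, inverse

print(unique_with_inverse(torch.tensor([2, 1, 2, 3])))
```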
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38156
Differential Revision: D21504449
Pulled By: eellison
fbshipit-source-id: 7efb1dff3b5c00655da10168403ac4817286ff59
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37981
This additional parameter may be helpful in parallelizing range factories
Differential Revision: D21506744
Test Plan: Imported from OSS
Pulled By: malfet
fbshipit-source-id: be9418216510ae600c555188971663fafb413fa0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/38018
when calling `eq_with_nan(v, kValue)` with `v` and `kValue` both `nan`, it returns `false` when it should return `true`.
https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/SortingKthValue.cu#L76
The implementation is using intrinsics such as `__double_as_longlong` and comparing their bit representations. But the values of the bits obtained for both nans are different.
`9221120237041090560` for `v`
`9223372036854775807` for `kValue`
two different nans have different bit representations, so we have to do additional comparisons to fix this.
I changed this comparison and it seems to be working now.
However, when compared to a CPU implementation, the returned indices for the values seem to be random but valid.
Probably this is an effect of the comparison order in the CUDA version.
I am not sure if this is ok since all the indices point to valid elements.
For the snippet in the issue I get the following:
```
# CUDA Values
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
device='cuda:0', dtype=torch.float64)
# CUDA indices
tensor([304, 400, 400, 528, 304, 304, 528, 336, 304, 432, 400, 280, 280, 336,
304, 336, 400, 304, 336, 560], device='cuda:0')
```
```
# CPU values
tensor([nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan],
dtype=torch.float64)
# CPU indices
tensor([515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515, 515,
515, 515, 515, 515, 515, 515])
```
Also, maybe its better to change the `eq_with_nan` implementations to address this instead?
I am not sure if this will cause code to break in other places though ...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38216
Differential Revision: D21517617
Pulled By: ngimel
fbshipit-source-id: deeb7bb0ac519a03aa0c5f365005a9150e6404e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38346
Given that qtensor stores scale as a double, this mismatch can cause us to
repack weights every time in QNNPACK. Worse, given that we release the
original weights, the runtime can crash.
Test Plan:
pytest test/quantization/test_quantized_module.py::TestStaticQuantizedModule::test_conv2d_api
Imported from OSS
Differential Revision: D21529384
fbshipit-source-id: 859b763dee5476e1554ebc278c5b95199a298eab
Summary:
Make it so that non-nn Module classes do not need to be annotated with `torch.jit.script`
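A small sketch of what this enables: a plain Python class used from scripted code without annotating the class itself:
```python
import torch

class Pair:  # no torch.jit.script annotation needed on the class anymore
    def __init__(self, a: torch.Tensor, b: torch.Tensor):
        self.a = a
        self.b = b

    def total(self) -> torch.Tensor:
        return self.a + self.b

@torch.jit.script
def use_pair(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return Pair(x, y).total()

print(use_pair(torch.ones(2), torch.ones(2)))
```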
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38050
Differential Revision: D21482654
Pulled By: eellison
fbshipit-source-id: 22689e4d7a33f6e1574b9495cff29a1fe6abb910
Summary:
This reverts commit 6e66e8562f276e2015af8ff76437a3f0277c4bcc.
Two things learned from the previous reland:
* `circleci-agent step halt` doesn't actually halt the step in place; you must explicitly exit the step after `step halt` is called
* Even though `circleci` uses `git` to check out repositories inside of docker images, that does not mean `git` is available after the fact.
<details>
<summary> Changes from previous reland </summary>
```patch
commit cc99a12c9029472bd73325876bc0e9dbb1746b05
Author: Eli Uriegas <eliuriegas@fb.com>
Date: Tue May 12 10:58:18 2020 -0700
.cirlceci: Install git for gc, exit step explicitly
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
diff --git a/.circleci/config.yml b/.circleci/config.yml
index 481d7889da..856a0fb10a 100644
--- a/.circleci/config.yml
+++ b/.circleci/config.yml
@@ -2018,13 +2018,15 @@ jobs:
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
eval $(aws ecr get-login --no-include-email --region us-east-1)
set -x
+ PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# Check if image already exists, if it does then skip building it
if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
circleci-agent step halt
+ # circleci-agent step halt doesn't actually halt the step so we need to
+ # explicitly exit the step here ourselves before it causes too much trouble
+ exit 0
fi
- PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
- # no stampeding herd effect plz.
if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
diff --git a/.circleci/ecr_gc_docker/Dockerfile b/.circleci/ecr_gc_docker/Dockerfile
index d0198acb86..36347d5e6d 100644
--- a/.circleci/ecr_gc_docker/Dockerfile
+++ b/.circleci/ecr_gc_docker/Dockerfile
@@ -1,6 +1,6 @@
FROM ubuntu:16.04
-RUN apt-get update && apt-get install -y python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
+RUN apt-get update && apt-get install -y git python-pip && rm -rf /var/lib/apt/lists/* /var/log/dpkg.log
ADD requirements.txt /requirements.txt
diff --git a/.circleci/verbatim-sources/docker_jobs.yml b/.circleci/verbatim-sources/docker_jobs.yml
index e04d11c5cd..3918cc04ae 100644
--- a/.circleci/verbatim-sources/docker_jobs.yml
+++ b/.circleci/verbatim-sources/docker_jobs.yml
@@ -35,13 +35,15 @@
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_DOCKER_BUILDER_V1}
eval $(aws ecr get-login --no-include-email --region us-east-1)
set -x
+ PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# Check if image already exists, if it does then skip building it
if docker manifest inspect "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/${IMAGE_NAME}:${DOCKER_TAG}"; then
circleci-agent step halt
+ # circleci-agent step halt doesn't actually halt the step so we need to
+ # explicitly exit the step here ourselves before it causes too much trouble
+ exit 0
fi
- PREVIOUS_DOCKER_TAG=$(git rev-parse "$(git merge-base HEAD << pipeline.git.base_revision >>):.circleci/docker")
# If no image exists but the hash is the same as the previous hash then we should error out here
- # no stampeding herd effect plz.
if [[ ${PREVIOUS_DOCKER_TAG} = ${DOCKER_TAG} ]]; then
echo "ERROR: Something has gone wrong and the previous image isn't available for the merge-base of your branch"
echo " contact the PyTorch team to restore the original images"
```
</details>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38335
Differential Revision: D21536269
Pulled By: seemethere
fbshipit-source-id: 5577f84fa49dd6e1e88fce461646fd68be3d417d
Summary:
**Summary**
This commit modifies the JIT frontend to handle `del` statements with
variables as targets by dropping the mapping corresponding to that
variable from the environment stack maintained by the IR emitter code.
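A minimal example of the newly supported statement:
```python
import torch

@torch.jit.script
def drop_temp(x: torch.Tensor) -> torch.Tensor:
    y = x + 1
    del y          # drops `y` from the environment; using it afterwards is a compile error
    return x * 2

print(drop_temp(torch.ones(2)))
```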
**Test Plan**
This commit adds test cases for deleting a variable, deleting a variable
and then using it, and deleting a variable in a if-statement, and then
using it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37608
Differential Revision: D21507239
Pulled By: SplitInfinity
fbshipit-source-id: ac7e353817dc76990ece294c95965cf585d6bdfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38266
Add the client/server active and async call counters to the
Tensorpipe Agent metrics.
ghstack-source-id: 103949985
Test Plan: CI
Reviewed By: lw
Differential Revision: D21509236
fbshipit-source-id: 66277f44d974c929a65e87bd270222d0ae27395e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38265
Tracking the active call counts in the TensorPipe agent:
* clientActiveCalls: running count of sent RPCs that have not yet been responded to or errored
* serverActiveCalls: running count of received RPCs that have not yet been responded to
* serverAsyncCallCount: running count of received RPCs set to be completed asynchronously
ghstack-source-id: 103949984
Test Plan: CI
Reviewed By: lw
Differential Revision: D21508957
fbshipit-source-id: 8be9dbf77ec06c138c8dd70443976d7bccee0f1e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38253
This pass removes dropout and dropout_ nodes when training is false. It
requires the freeze_module pass to have been run first, which does both inlining
and constant propagation; without it, the training variable remains an attribute
instead of a constant.
ghstack-source-id: 103939141
Test Plan: python test/test_jit.py TestScript.test_remove_dropout
Reviewed By: dreiss
Differential Revision: D21505863
fbshipit-source-id: 42ea45804e4653b625b6a254c8d8480757264aa8
Summary:
Reland of https://github.com/pytorch/pytorch/issues/38140. It got reverted since it broke slow tests which are only run on the master branch (thanks mruberry!). Enabling all CI tests in this PR to make sure they pass.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38288
Reviewed By: mruberry
Differential Revision: D21524923
Pulled By: ailzhang
fbshipit-source-id: 3a9ecc7461781066499c677249112434b08d2783
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37990
The code in `ddp.{h, cpp}` and the corresponding pybind implementations are no longer used. The pybinded calls were all private APIs and only ran in unittests, so we should remove these unused APIs.
https://github.com/pytorch/pytorch/pull/20234 from a year ago also mentioned that we should delete `_dist_broadcast_coalesced`
Verified that all tests pass with cuda by running `test_c10d` on a gpu-enabled machine.
ghstack-source-id: 103885383
Test Plan: CI
Differential Revision: D21443879
fbshipit-source-id: 764d8681ca629056bfe2c260ffab47fa5bdf07ff
Summary: Python 2 has reached end-of-life and is no longer supported by PyTorch. To avoid confusing behavior when trying to use PyTorch with Python 2, detect this case early and fail with a clear message in C++.
Test Plan: waitforsandcastle
Reviewed By: orionr
Differential Revision: D21043062
fbshipit-source-id: ab448d2888f5048a0180598b882adfc67e31d851
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38255
Now that the futures are consolidated after
https://github.com/pytorch/pytorch/pull/35154, there is no
`torch.distributed.rpc.Future` and we do not need a special path. All futures
can now be profiled through the use of the jit operator defined in
record_function_ops.cpp
As a result, we also get rid of the record_function_ops.h file.
RPC profiling tests are currently disabled, although I re-enabled them locally
to ensure that they still work with this change.
ghstack-source-id: 103869855
Test Plan: CI
Differential Revision: D21506091
fbshipit-source-id: ad68341c9f2eab2dadc72fe6a6c59b05693434f2
Summary: removing hard-coded dimensions
Test Plan: ran the test itself
Reviewed By: jspark1105, amylittleyang
Differential Revision: D21520255
fbshipit-source-id: a75043103c61b91b8f10f405abff4790292e92c4
Summary:
Add `max_numel` option to `hypothesis_utils.array_shapes`
Use it to limit tensor element count to 100K for tensors whose maximum number of elements can exceed 250K
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38304
Differential Revision: D21525483
Pulled By: malfet
fbshipit-source-id: fac132dc7274b9417141b708cc9535561a95fcb3
Summary:
Otherwise, the zero-point can be out of range if the selected type is torch.qint8
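For reference, a quick sketch of the ranges involved (the quantization parameters here are arbitrary):
```python
import torch

x = torch.randn(4)
# The zero-point must be representable in the chosen quantized dtype:
# torch.qint8 -> [-128, 127], torch.quint8 -> [0, 255].
q = torch.quantize_per_tensor(x, scale=0.1, zero_point=-10, dtype=torch.qint8)
print(q.int_repr(), q.q_zero_point())
```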
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38327
Differential Revision: D21525214
Pulled By: malfet
fbshipit-source-id: 989f58f79830ec7f616a68f0ab00661b15030062
Summary:
This reverts commit 1ab4f35499aa933677152aca6a1ba2cbe86639f8.
Without this PR, the OS tries to find the DLL in the following directories.
- The directory from which the application loaded.
- The system directory. Use the GetSystemDirectory function to get the path of this directory.
- The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
- The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
- The current directory.
- The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path.
If we use LoadLibraryEx with LOAD_LIBRARY_SEARCH_* flags, the directories are searched in the following order.
- The directory that contains the DLL (LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR). This directory is searched only for dependencies of the DLL to be loaded.
- The application directory (LOAD_LIBRARY_SEARCH_APPLICATION_DIR).
- Paths explicitly added to the application search path with the AddDllDirectory function (LOAD_LIBRARY_SEARCH_USER_DIRS) or the SetDllDirectory function. If more than one path has been added, the order in which the paths are searched is unspecified.
- The System32 directory (LOAD_LIBRARY_SEARCH_SYSTEM32).
Advantages:
1. The directory that contains the DLL comes first and it's desirable for us, because the dependencies in `lib` should always be preferred.
2. The system directory is considered last. According to some of the bug reports, the DLL load failures are caused by loading conflicting ones from systemroot.
Neutral:
1. The directories in `PATH` are not considered. Similar things happen as described in the previous point, so it may be beneficial for normal users. However, it may cause failures if there are new dependencies when building from source. (Resolved by falling back to `LoadLibraryW` if the error code is `126`)
Disadvantages:
1. LoadLibraryEx with LOAD_LIBRARY_SEARCH_* flags is only available for Win7/2008 R2 + KB2533623 and up. (Resolved by falling back to `LoadLibraryW` if it is not supported; a sketch of this fallback appears after this list.)
2. Failure during the call of `LoadLibraryEx` will lead to the OS to pop up a modal dialog, which can block the process if user is using a CLI-only interface. This can be switched off by calling `SetErrorMode`. (Resolved by calling `SetErrorMode`)
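A Windows-only sketch of the load-with-fallback logic described above (constants and error handling simplified; not the exact implementation shipped in torch):
```python
import ctypes
from ctypes import wintypes

kernel32 = ctypes.WinDLL('kernel32', use_last_error=True)
kernel32.LoadLibraryExW.restype = wintypes.HMODULE
kernel32.LoadLibraryExW.argtypes = [wintypes.LPCWSTR, wintypes.HANDLE, wintypes.DWORD]
kernel32.LoadLibraryW.restype = wintypes.HMODULE
kernel32.LoadLibraryW.argtypes = [wintypes.LPCWSTR]

LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000
ERROR_MOD_NOT_FOUND = 126

def load_dll(path):
    handle = kernel32.LoadLibraryExW(path, None, LOAD_LIBRARY_SEARCH_DEFAULT_DIRS)
    if not handle and ctypes.get_last_error() == ERROR_MOD_NOT_FOUND:
        # Fall back to the legacy search order so extra dependencies that only
        # live on PATH (e.g. from a source build) can still be resolved.
        handle = kernel32.LoadLibraryW(path)
    return handle
```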
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38302
Test Plan:
Test some common cases (in a new repo maybe) including
1. Python 3.6/3.7/3.8, conda python, conda install
2. Python 3.6/3.7/3.8, conda python, pip install
3. Python 3.6/3.7/3.8, official python, pip install
Plus some corner cases like
1. Conflicting DLLs in systemroot or `PATH`
2. Remove some local dependencies and use global ones
References:
1. https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-seterrormode
2. https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
3. https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#standard-search-order-for-desktop-applications
Differential Revision: D21524090
Pulled By: malfet
fbshipit-source-id: 0cf5e260c91759b0af8c7aa0950a488e3b653ef5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38073
Most of the generated return statements don't depend on the scalar type and it saves ~900 lines of generated code.
Test Plan: Imported from OSS
Differential Revision: D21476010
Pulled By: gchanan
fbshipit-source-id: 3fcc4db466d697c90abafb9da6c3f3644621810b
Summary:
This is a step toward re-automating most of the CircleCI `config.yml` generation so that it can be safely refactored into multiple `workflow`s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38292
Differential Revision: D21519337
Pulled By: kostmo
fbshipit-source-id: 09cc4f97ac52f37ef6d8a6fb8f49eeead052b446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38294
Optimize the reference int8 gemm using avx2 intrinsics
Test Plan:
Before this diff
7.72164 GF/s
After this diff
27.7731 GF/s
Reviewed By: amylittleyang
Differential Revision: D21516439
fbshipit-source-id: 2b596605eec6a338a295701a01cf2c8639204274
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30922
New c++14 feature we can use now
ghstack-source-id: 103767403
Test Plan: waitforsandcastle
Differential Revision: D18869644
fbshipit-source-id: 54541c8004b2116386668a31eb9b0410a603b7dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38284
Bias is used to calculate out channels
Test Plan: Imported from OSS
Differential Revision: D21515997
fbshipit-source-id: 5fe5ddd4c7ce5cc49d15c477b744994a3db5fc89
Summary:
Most test files have a ton of errors; there's not much point adding ignores for them though. The way of working is simply to run `mypy test/test_somefile.py`, fix up the errors, then add that file to the `files =` list in `mypy.ini`.
Can't add all of `test/*` by default, because the JIT test files have (on purpose) syntax errors that are meant to exercise the robustness of the JIT to bad annotations. Leave those alone for now.
_Depends on the ghstacked PRs in gh-38173, only the last 2 commits are new._
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38220
Differential Revision: D21503481
Pulled By: ezyang
fbshipit-source-id: 63026e73201c549d64647a03a20a4c6687720244
Summary:
There are now a `zmath.h` and a `zmath_std.h`: the latter is a copy of the original `zmath.h` supporting `std::complex`, while the new `zmath.h` supports `c10::complex`. `zmath_std.h` will be removed eventually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38037
Differential Revision: D21518177
Pulled By: anjali411
fbshipit-source-id: 18552e955dc31f95870f34962d709de0444804f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37797
This is slow (see comment in code).
Not fixing this yet, but at least adding a warning so people are aware and don't add new call sites.
ghstack-source-id: 103887226
Test Plan: waitforsandcastle
Differential Revision: D21390364
fbshipit-source-id: 7bff1c3b9756a16c9d9110f209c23bf557266dda
Summary:
Fixes https://github.com/pytorch/pytorch/issues/36594
In some cases, when using memory that was allocated in another process before doing any memory-related operation in PyTorch, there are errors because the GPU CUDA context is not completely initialized.
I guess there is an explicit reason to leave the context uninitialized at first and not initialize it in `THCudaInit`, where other CUDA calls are going on.
I'd like to discuss it in this PR.
Possible better solutions are:
- Initialize the device context in `fromDLPack` or `from_blob`, probably by creating a dummy array with one element. But this feels like a hack.
- Catch the exception in `getDeviceFromPtr`, check if the context was initialized, and if not, repeat the operation. But we would need to check every device.
This PR bypasses the `getDeviceFromPtr` call, which is the one causing the problem, if we already know the device. This allows us to create the Tensor from the shared-memory storage, but the context will not be initialized. However, it will be initialized when the tensor is accessed later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36714
Differential Revision: D21504557
Pulled By: ngimel
fbshipit-source-id: 173ccdeb7c2a2b0ece53dd50be97f2df577a5634
Summary:
Make the Linear layer work correctly when bias is False
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38002
Differential Revision: D21509679
Pulled By: malfet
fbshipit-source-id: c7077992cf414ecc557b39e5ed1e39ef01c8b347
Summary:
closes https://github.com/pytorch/pytorch/issues/37855
## .circleci: Improve docker image build workflow
Improves the docker image build workflow from many steps to basically
transparent from a user's perspective.
To update docker images now all one has to do is edit the
.circleci/docker folder and it will update automatically and also
dynamically add the tags to the list of tags to keep from the garbage
collector.
Adding a new image will currently stay the same but we can explore doing
that dynamically as well.
### How the build workflow works:
- Docker tags are determined by the hash defined from git for the
.circleci/docker sub-directory (extracted using git rev-parse)
- Images are only built if the computed hash is not found in ecr and
the hash is different than the previously computed hash. The
previously computed hash is found using the same process as before
but subbing out HEAD for the merge base between HEAD and the base
git revision
- That tag is then passed through the jobs using a shared workspace
which is added to downstream jobs using the circleci ${BASH_ENV}
### How the new garbage collection works:
- Tags to keep are generated by stepping through all of the commits in
the .circleci/docker subdirectory
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37976
Differential Revision: D21511048
Pulled By: seemethere
fbshipit-source-id: e4b153a6078e3875f6cfa03a903b2e951d803cce
Summary:
**Summary**
This commit detects and prohibits the case in which `typing.List` is
used as an annotation without a type argument (i.e. bare `typing.List`
rather than `typing.List[T]`).
At present, `typing.List` is always assumed to have one argument, and
when it is used without one, `typing.List.__args__[0]` is still present but
set to a `typing.TypeVar` instance, which has no JIT type equivalent.
Consequently, trying to convert `typing.List` to a JIT type results in
a `c10::ListType` with `nullptr` for its element type, which can cause
a segmentation fault.
This is fixed by returning a `ListType` from
`jit.annotations.try_ann_to_type` only if the element type is converted
successfully to a JIT type and returning `None` otherwise.
**Test Plan**
I ran the code from the issue (https://github.com/pytorch/pytorch/issues/37530) that reported this problem and also ran some unit tests.
*Before*
```
$ python3 segfault.py
Segmentation fault (core dumped)
```
*After*
```
$ python3 segfault.py
Traceback (most recent call last):
...
RuntimeError:
Unknown type name 'List':
File "segfault.py", line 9
classmethod
def cat(cls, box_lists: List):
~~~~ <--- HERE
return cls(torch.cat([x for x in box_lists]))
'Boxes.cat' is being compiled since it was called from 'Boxes'
File "segfault.py", line 13
def f(t: torch.Tensor):
b = Boxes(t)
~~~~~ <--- HERE
c = Boxes(torch.tensor([3, 4]))
return Boxes.cat([b, c])
'Boxes' is being compiled since it was called from 'f'
File "segfault.py", line 13
def f(t: torch.Tensor):
b = Boxes(t)
~~~~~~~~~~~ <--- HERE
c = Boxes(torch.tensor([3, 4]))
return Boxes.cat([b, c])
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/37530.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38130
Differential Revision: D21485284
Pulled By: SplitInfinity
fbshipit-source-id: 9b51ef6340485a24c8b7cfb85832d4668b8ac51a
Summary:
…n Windows
Without this PR, the OS tries to find the DLL in the following directories.
- The directory from which the application loaded.
- The system directory. Use the GetSystemDirectory function to get the path of this directory.
- The 16-bit system directory. There is no function that obtains the path of this directory, but it is searched.
- The Windows directory. Use the GetWindowsDirectory function to get the path of this directory.
- The current directory.
- The directories that are listed in the PATH environment variable. Note that this does not include the per-application path specified by the App Paths registry key. The App Paths key is not used when computing the DLL search path.
If we use LoadLibraryEx with LOAD_LIBRARY_SEARCH_* flags, the directories are searched in the following order.
- The directory that contains the DLL (LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR). This directory is searched only for dependencies of the DLL to be loaded.
- The application directory (LOAD_LIBRARY_SEARCH_APPLICATION_DIR).
- Paths explicitly added to the application search path with the AddDllDirectory function (LOAD_LIBRARY_SEARCH_USER_DIRS) or the SetDllDirectory function. If more than one path has been added, the order in which the paths are searched is unspecified.
- The System32 directory (LOAD_LIBRARY_SEARCH_SYSTEM32).
Advantages:
1. The directory that contains the DLL comes first and it's desirable for us, because the dependencies in `lib` should always be preferred.
2. The system directory is considered last. According to some of the bug reports, the DLL load failures are caused by loading conflicting ones in systemroot.
Neutral:
1. The directories in `PATH` are not considered. Similar things happen as described in the previous point, so it may be beneficial for normal users. However, it may cause failures if there are new dependencies when building from source. (Resolved by falling back to `LoadLibraryW` if the error code is `126`)
Disadvantages:
1. LoadLibraryEx with LOAD_LIBRARY_SEARCH_* flags is only available for Win7/2008 R2 + KB2533623 and up. (Resolved by falling back to `LoadLibraryW` if it is not supported)
2. Failure during the call to `LoadLibraryEx` leads the OS to pop up a modal dialog, which can block the process if the user is using a CLI-only interface. This can be switched off by calling `SetErrorMode`. (Resolved by calling `SetErrorMode`)
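A hedged ctypes sketch of the loading strategy described above (not the actual `torch/__init__.py` code; the constants are the documented Win32 values):
```
import ctypes

kernel32 = ctypes.WinDLL("kernel32.dll", use_last_error=True)
kernel32.LoadLibraryExW.restype = ctypes.c_void_p
kernel32.LoadLibraryW.restype = ctypes.c_void_p

LOAD_LIBRARY_SEARCH_DEFAULT_DIRS = 0x00001000
LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR = 0x00000100
ERROR_MOD_NOT_FOUND = 126        # a dependency could not be found
ERROR_INVALID_PARAMETER = 87     # LOAD_LIBRARY_SEARCH_* flags unsupported (pre-KB2533623)
SEM_FAILCRITICALERRORS = 0x0001  # suppress the modal error dialog

def load_dll(path):
    prev_mode = kernel32.SetErrorMode(SEM_FAILCRITICALERRORS)
    try:
        handle = kernel32.LoadLibraryExW(
            path, None,
            LOAD_LIBRARY_SEARCH_DEFAULT_DIRS | LOAD_LIBRARY_SEARCH_DLL_LOAD_DIR)
        if not handle and ctypes.get_last_error() in (ERROR_MOD_NOT_FOUND,
                                                      ERROR_INVALID_PARAMETER):
            # Fall back to the legacy search order (which includes PATH),
            # e.g. for extra dependencies of source builds or older Windows.
            handle = kernel32.LoadLibraryW(path)
        return handle
    finally:
        kernel32.SetErrorMode(prev_mode)
```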
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37763
Test Plan:
Test some common cases (in a new repo maybe) including
1. Python 3.6/3.7/3.8, conda python, conda install
2. Python 3.6/3.7/3.8, conda python, pip install
3. Python 3.6/3.7/3.8, official python, pip install
Plus some corner cases like
1. Conflicting DLLs in systemroot or `PATH`
2. Remove some local dependencies and use global ones
References:
1. https://docs.microsoft.com/en-us/windows/win32/api/errhandlingapi/nf-errhandlingapi-seterrormode
2. https://docs.microsoft.com/en-us/windows/win32/api/libloaderapi/nf-libloaderapi-loadlibraryexa
3. https://docs.microsoft.com/en-us/windows/win32/dlls/dynamic-link-library-search-order#standard-search-order-for-desktop-applications
What do you think, malfet ezyang ?
Differential Revision: D21496081
Pulled By: malfet
fbshipit-source-id: aa5e528e5134326b00ac98982f4db4b4bbb47a44
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37259, fixes https://github.com/pytorch/pytorch/issues/20156
This lazily calls `at::init_num_threads` once for each thread by adding a call to `lazy_init_num_threads` in `at::parallel_for` and `at::parallel_reduce`.
If this solution is okay, then we should add the same to guard other places that might use MKL or OpenMP.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37461
Reviewed By: ezyang
Differential Revision: D21472763
Pulled By: ilia-cher
fbshipit-source-id: 889d6664f5bd4080037ade02ee324b1233992915
Summary:
Related to gh-36318
Mention `bfloat16` dtype and `BFloat16Tensor` in documentation. The real fix would be to implement cpu operations on 16-bit float `half`, and I couldn't help but notice that `torch.finfo(torch.bfloat16).xxx` crashes for `xxx in ['max', 'min', 'eps']`
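For reference, the dtype the docs now mention can be used directly (a minimal sketch; individual CPU op coverage for bfloat16 still varies):
```
import torch

x = torch.ones(3, dtype=torch.bfloat16)
print(x.dtype)                   # torch.bfloat16
print((x.float() + 1.0).dtype)   # upcast to float32 for ops not yet implemented in bfloat16
```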
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37051
Differential Revision: D21476851
Pulled By: ngimel
fbshipit-source-id: fef601d3116d130d67cd3a5654077f31b699409b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38154
This should give better error messages and shorter stack traces on C++17 builds (e.g. fbcode)
ghstack-source-id: 103775564
Test Plan: waitforsandcastle
Differential Revision: D21483327
fbshipit-source-id: 184d1f9c0543bf43dc9713fa97fcc5955e7be319
Summary:
jit.ScriptModule deletes all the actual attributes but still uses the nn.Module implementation.
Since I don't know how to add this new set() to the ScriptModule, it is simpler to just raise a nice error for now.
I also inverted the logic so that an empty set() (which is always the case in a ScriptModule) means that everything is persistent.
cc zdevito should we open an issue to add this to the ScriptModule?
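A hedged sketch of the eager-mode behavior this interacts with (`persistent=False` buffers are excluded from the state_dict; on a ScriptModule the same call is expected to raise instead):
```
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        # Non-persistent: usable in forward, but not saved in the state_dict.
        self.register_buffer("scale", torch.ones(4), persistent=False)

    def forward(self, x):
        return x * self.scale

m = M()
print("scale" in m.state_dict())   # False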
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38131
Differential Revision: D21502183
Pulled By: albanD
fbshipit-source-id: 96f83098d9a2a9156e8af5bf5bd3526dd0fefc98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37958
All codegen invocations have been removed at this point, so this has no effect.
Test Plan: Imported from OSS
Reviewed By: mrshenli
Differential Revision: D21433215
Pulled By: gchanan
fbshipit-source-id: 1f58f3022fab6443e34f0201ae4b32b2a99725cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38157
This removes the error prone process of assembling `torch/__init__.pyi`
(and frequently forgetting to expose things), since now we can simply
rely on the true source file to get things done. Most of the old
codegen in gen_pyi.py is now rerouted to various files:
- `torch/_C/__init__.pyi` (the dumping pile of all misc bindings)
- `torch/_C/_nn.pyi` (NN function bindings)
- `torch/_C/_VariableFunctions.pyi` (torch function bindings)
`torch.types` grew a bunch more definitions that previously were
defined in `torch/__init__.pyi`
Some miscellaneous changes
- Fixed a bug where we treat single TensorList argument as implying
varargs are accepted. This is actually only supported on IntList.
This means we can correctly generate a stub for dequantize.
- Add missing manual stub for nonzero
- Switched torch/onnx/operators.py to directly refer to _C module,
since apparently mypy doesn't think that methods prefixed with
underscores get reexported. This may be a recurring theme; maybe
we need to find a better way to solve it.
Because I was really lazy, I dumped namedtuple definitions in both
`torch._C` and `torch._C._VariableFunctions`. This is definitely wrong.
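An illustrative, hand-written excerpt of what an entry in these stub files looks like (the exact signatures below are assumptions, not the generated output):
```
# torch/_C/_VariableFunctions.pyi (illustrative excerpt only)
from typing import Optional
from torch import Tensor

def add(input: Tensor, other: Tensor, *, alpha: float = ..., out: Optional[Tensor] = ...) -> Tensor: ...
def nonzero(input: Tensor, *, out: Optional[Tensor] = ...) -> Tensor: ...
```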
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21497400
Pulled By: ezyang
fbshipit-source-id: 07b126141c82efaca37be27c07255cb2b9b3f064
Summary:
Reduces lock contention and BlockPool management costs by tracking applicable state in per-device structures.
`THCCachingAllocator` now maintains a set of `DeviceCachingAllocator` objects (one per device) each of which maintains its own allocator state and operations.
Only global state remains in the top-level THCCachingAllocator object -- namely, `allocated_blocks`, the mapping between the raw storage pointers and the allocator's underlying Block structure. Global operations deal mostly with this translation and then pass the bulk of the work on to the device-specific allocator.
Conversely, device-specific state and operations are comprised mostly of managing the device's underlying blocks.
This has the following benefits:
- Performance: Access to the global pointer map is serialized independently of the per-device state -- reducing lock contention between operations on different devices.
- Simplicity: Managing the block pools in separate device-specific objects is conceptually more intuitive, simplifies the code and makes certain operations more efficient -- even in the absence of contention (e.g. free_cached_blocks, synchronize_and_free_events, emptyCache, get_all_blocks, etc.)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37567
Differential Revision: D21458556
Pulled By: colesbury
fbshipit-source-id: ef56cb373797b180df72f0998ebc35972c892288
Summary:
Add a comment because at first glance there doesn't seem to be any need to specify branch and tag filters, just to make them glob to everything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38169
Differential Revision: D21496261
Pulled By: kostmo
fbshipit-source-id: 7f75bb466ceffd6b17d4c97d711a8eb6e8b3143a
Summary:
Implementation of the less popular proposal for eliminating overlap between LetStmt and Let: removing both and storing a mapping between Var and value Expr in the Block.
This complicates some tests but simplifies the IR by restricting where variable binding can occur.
I used the unit tests & python integration tests to verify this is correct but I'm unsure of coverage, particularly around the dependency checker in loopnest - ZolotukhinM your review would be useful there.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37606
Differential Revision: D21467483
Pulled By: nickgg
fbshipit-source-id: b402d3fce4cacf35d75f300f0a7dca32a43b6688
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38078
`common_distributed` and `test_distributed` have some error codes that overlap but are for different reasons, for example, code 75 in `test_distributed` is "no cuda available" but in common_distributed it is "need at least 2 CUDA devices".
This is an issue because the tests in `test_distributed` now use the utils in `common_distributed`, so we could get the wrong reason for skipping tests.
It is also the source of test failures in https://github.com/pytorch/pytorch/pull/37990.
This diff makes it so that the test skipping logic is deduped and put into `common_distributed.py`, where it can be reused and then imported into `test_distributed`
ghstack-source-id: 103782583
Test Plan: CI
Differential Revision: D21466768
fbshipit-source-id: 53b5af36672ebd8b51ba8b42709d87e96cadef20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38147
We're seeing many warnings of the form:
```
/home/rvarm1/pytorch/torch/distributed/rpc/__init__.py:14: FutureWarning:
pybind11-bound class 'torch.distributed.rpc.RRef' is using an old-style
placement-new '__setstate__' which has been deprecated. See the upgrade
guide in pybind11's docs. This message is only visible when compiled in
debug mode.
```
in test logs, it turns out this is because pybind recommends using `py::pickle`
instead of manually defining getstate and setstate (see https://github.com/pybind/pybind11/blob/master/docs/upgrade.rst#id5). Changing to use pybind's
recommendation will silence these warnings.
Note that return types need to be added to the function to satisfy the contract
pybind expects, but they don't return anything since we TORCH_CHECK(false) in
all cases.
ghstack-source-id: 103769585
Test Plan: CI
Differential Revision: D21446260
fbshipit-source-id: a477e4937b1d6134992c57467cdbe10f54567b8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38136
This was a bit trickier than I expected, because modules have
to be importable to be pickleable, but adding a module to another
module in the C API isn't really the right way to make it importable.
We hack around it by manually adding the module to sys.modules.
Thanks Richard Zou for an extremely useful prior attempt which helped
me make this work.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21487840
Pulled By: ezyang
fbshipit-source-id: 368da9b9c50e5de4d7dd265e6f9f189a882d75c1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38080
Originally, my plan was to just delete the torch.autograd stub, but
this triggered a bunch of downstream errors relating to non-existent
_C modules, and so instead of ignoring those files, I decided to
add a minimal _C type stubs, where it was easy (cases which were
codegened I ignored).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21487841
Pulled By: ezyang
fbshipit-source-id: cfcc467ff1c146d242cb9ff33a46ba26b33b8213
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38175
Also, make Tensor derived from KernelScopedObject - we must have missed
that originally.
Test Plan: Imported from OSS
Reviewed By: resistor
Differential Revision: D21489136
Pulled By: ZolotukhinM
fbshipit-source-id: fe003f44ef1265629fd84befc2e9ec8f48d2fc4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35154
This is for issue https://github.com/pytorch/pytorch/issues/34999.
Closes https://github.com/pytorch/pytorch/issues/34999.
https://github.com/pytorch/pytorch/issues/34997 needs more work.
This will make a few work items easier, like 1) Dist autograd profiler, 2) JIT annotation for Future.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_rref_forward_chain --stress-runs 100
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_call_method_on_rref
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- 'test_rref_proxy_class \(fb\.test_rpc_fork\.RpcTestWithFork\)' --stress-runs 100
test_rref_proxy_reuse
test_handle_send_exceptions
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_script_call_python_return_future
```
Differential Revision: D7722184
fbshipit-source-id: bd92b855bfea4913d6672700590c57622fa86e0e
Summary:
I'm mostly done with cleaning up the test/ folder. There are a bunch of remaining callsites, but they're "valid" in that they test `type()` functionality; we cannot remove them until it's fully deprecated.
The next PR will mainly focus on moving some callsites to an internal API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38140
Differential Revision: D21483808
Pulled By: ailzhang
fbshipit-source-id: 12f5de6151bae59374cfa0372e827651de7e1c0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38145
Now that if_constexpr is landed, we can make this more readable
ghstack-source-id: 103765920
Test Plan: waitforsandcastle
Differential Revision: D21480798
fbshipit-source-id: 8181d4731036373cc3a1868fd6f4baeebb426081
Summary:
`del` in Python supports multiple operands, but the PyTorch C++ frontend doesn't. To be consistent across frontends, we decided to throw an exception when `del` with multiple operands is found inside TorchScript.
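A small sketch of the difference (the exact TorchScript error type and text are assumptions):
```
import torch

# Plain Python allows multiple targets in one del statement:
a, b = 1, 2
del a, b

# TorchScript now rejects the same form explicitly instead of mishandling it:
try:
    @torch.jit.script
    def f(x: torch.Tensor):
        y = x + 1
        del x, y          # multiple operands: not supported in TorchScript
        return 0
except Exception as e:
    print("rejected:", type(e).__name__)
```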
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38089
Test Plan: Unit tests in test/jit/test_builtins.py
Differential Revision: D21478900
Pulled By: SplitInfinity
fbshipit-source-id: 1cbd61301680c5d6652ef104996178cefcdd3716
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37999
Next step: make explicit type arguments less intrusive, or find
a way to eliminate them entirely.
Test Plan: Imported from OSS
Differential Revision: D21445646
Pulled By: bhosmer
fbshipit-source-id: 106b3381acea473ca686ab42b5ca610c89f5c531
Summary:
Followup of https://github.com/pytorch/pytorch/issues/37848 I realized that it's better to condition on `Value` type instead of token type. So now it also supports indexing through list variables (used to be list literal only).
Also, apparently our eager frontend accepts indexing with a float list as well, so I matched this edge-case behavior too.
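A minimal sketch of what now scripts (previously only a list literal in the subscript was handled):
```
import torch

@torch.jit.script
def pick_rows(x: torch.Tensor):
    idx = [0, 2]            # a list variable, not just a literal like x[[0, 2]]
    return x[idx]           # behaves like x[torch.tensor([0, 2])]

print(pick_rows(torch.arange(9).reshape(3, 3)))
```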
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37966
Reviewed By: suo
Differential Revision: D21439642
Pulled By: ailzhang
fbshipit-source-id: cedb8431ef38747d4aa9909a6bbf8e954dbe0e25
Summary:
Add read/write vectorization to non-persistent softmax kernels only. At this point launch logic has minimal changes, and `ILP=vectorization=2` is always used (the code can handle other values, but `ILP=2` has been the most consistent performer).
Dispatch to persistent / non-persistent kernels is unchanged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36485
Differential Revision: D21477775
Pulled By: ngimel
fbshipit-source-id: 9ff7fd243695d7bbf4121390085b64db0bbdef35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38143
It's a followup of https://github.com/pytorch/pytorch/pull/32556, where an error handling boilerplate code path was added to the FutureMessage callback.
However, I noticed that the FutureMessage could never be set with an error, because the FutureMessage is a member in OwnerRRef,
- OwnerRRef does not have a setError method yet.
- The FutureMessage is only used for signaling
- The value of the RRef is contained in the `value_` field.
With the Future being generalized, it could contain more value types, not limited to Message.
This PR migrates the OwnerRRef value from the `value_` field to the generic Future.
In a later PR, it will be super easy to add a `setError` method for OwnerRRef, which calls `future_.setError(..)`. (I decide to do it later. I think it's better to migrate the call sites together with adding the new `setError` method.)
Also, this fixes the issue pointed out by https://github.com/pytorch/pytorch/pull/31086/files#r422256916.
This PR was submitted as https://github.com/pytorch/pytorch/pull/32608.
ghstack-source-id: 103757743
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par \
-r test_call_method_on_rref
```
Differential Revision: D5707692
fbshipit-source-id: 83ce0e5e5e97acb9ce8230fce5e4a3d806478b02
Summary:
In the IR Simplifier, when doing partial factorization of Round+Mod patterns, we divide by the lower number, which could be zero. Add a quick check against zero to avoid the crash.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38055
Differential Revision: D21478486
Pulled By: nickgg
fbshipit-source-id: c5083f672e91662b7d1271d817cade7fa6c39967
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38017
Added a more comprehensive set of tests for int8fc.
There are some failures, but this emulation gets us much closer than the
existing one.
There is still more work coming in.
Test Plan: the test itself
Reviewed By: amylittleyang
Differential Revision: D21368530
fbshipit-source-id: 318722c030b2a1f8de37adb7c8633f75057edfab
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/38120
Test Plan: build docs locally and attach a screenshot to this PR.
Differential Revision: D21477815
Pulled By: zou3519
fbshipit-source-id: 420bbcfcbd191d1a8e33cdf4a90c95bf00a5d226
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38052
The initial version of the TensorPipe agent required the user to specify the full map between workers' names and their ids, on each worker. However it's enough for each worker to just specify their name and id, as these can then be exchanged using the store.
Addresses #37784, although I think we can go further and use the store to also automatically assign ranks to workers, so that the user only needs to specify a name.
ghstack-source-id: 103741595
(Note: this ignores all push blocking failures!)
Test Plan:
On worker 0:
```
In [1]: import os
...: import torch
...: import torch.distributed.rpc as rpc
...: os.environ["MASTER_ADDR"] = "127.0.0.1"
...: os.environ["MASTER_PORT"] = "8765"
In [2]: rpc.init_rpc(name="foo", rank=0, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2)
In [3]: rpc.rpc_sync("bar", torch.add, args=(torch.full((2,2), 1), torch.full((2,2), 2)))
Out[3]:
tensor([[3., 3.],
[3., 3.]])
In [4]: rpc.rpc_sync("bar", torch.add, args=(1, 2))
Out[4]: 3
```
On worker 1:
```
In [1]: import os
...: import torch
...: import torch.distributed.rpc as rpc
...: os.environ["MASTER_ADDR"] = "127.0.0.1"
...: os.environ["MASTER_PORT"] = "8765"
In [2]: rpc.init_rpc(name="bar", rank=1, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2)
```
Then also tested by adding `rpc_backend_options=rpc.TensorPipeRpcBackendOptions(init_method="file:///tmp/init/foo")` to `rpc_init`.
Differential Revision: D21463833
fbshipit-source-id: b53d7af6fc060789358ac845aa1898ddea6e8f31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37962
Temporarily re-enable RecordFunction in the lite interpreter when the profiler key is not set;
this allows the profiler to work without profiled wrappers in the build.
Test Plan: CI
Reviewed By: smessmer, linbinyu
Differential Revision: D21409120
fbshipit-source-id: 6f0311c8eb55537a03b8bdac69def18a496ec672
Summary:
`is_tensor` doesn't really have a reason to exist anymore (other than
backwards compatibility) and is worse for typechecking with mypy (see
gh-32824). Given that it may not be obvious what the fix is once mypy
gives an error, make the change in a number of places at once, and add
a note on this to the `is_tensor` docstring.
Recommending an isinstance check instead has been done for quite a
while, e.g. https://github.com/pytorch/pytorch/pull/7769#discussion_r190458971
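The recommended pattern, for reference:
```
import torch

x = torch.zeros(3)

# Both checks are equivalent today, but isinstance is preferred and is
# understood by type checkers like mypy:
assert torch.is_tensor(x)
assert isinstance(x, torch.Tensor)
```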
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38062
Differential Revision: D21470963
Pulled By: ezyang
fbshipit-source-id: 98dd60d32ca0650abd2de21910b541d32b0eea41
Summary:
The root cause of the incorrect rendering is that numbers are treated as strings if the data type is not specified. Therefore the data is sorted based on the first digit.
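A quick illustration of the underlying behavior:
```
# Numbers stored as strings sort lexicographically, not numerically:
print(sorted(["1", "2", "10", "21"]))   # ['1', '10', '2', '21']
print(sorted([1, 2, 10, 21]))           # [1, 2, 10, 21]
```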
closes https://github.com/pytorch/pytorch/issues/29906
cc orionr sanekmelnikov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31544
Differential Revision: D21105403
Pulled By: natalialunova
fbshipit-source-id: a676ff5ab94c5bdb653615d43219604e54747e56
Summary:
`qemu-x86_64 -cpu Haswell` JIT-compiles x86_64 code for the host OS but lacks support for AVX/AVX2 instruction set emulation, which makes it an ideal target for testing instruction set violations (especially via static initializers) even if it runs on a CPU physically capable of executing AVX2 instructions.
It's quite easy to validate that this is the case by invoking ATen's `basic` cpp test with dispatch set to AVX: `qemu-x86_64 -cpu Broadwell -E ATEN_CPU_CAPABILITY=avx ./bin/basic --gtest_filter=BasicTest.BasicTestCPU`
This PR adds an extra step to the CircleCI test suite that executes the `basic` test with default CPU capability for `pytorch-linux-[xenial|bionic]-py3.6-...-test` configurations using qemu and validates that it completes successfully. (And fails before https://github.com/pytorch/pytorch/pull/38088 is merged)
Closes https://github.com/pytorch/pytorch/issues/37786
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38094
Differential Revision: D21472278
Pulled By: malfet
fbshipit-source-id: 722d4eceac8ce6fbc336ab883819cf7fccea3a66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38066
Increasing priority for PinnedCPUAllocator to make sure it is set when CUDA is enabled.
Test Plan: buck test mode/dev-nosan //vision/fair/detectron2/tests:test_export_caffe2 -- 'testMaskRCNNGPU \(test_export_caffe2\.TestCaffe2Export\)'
Reviewed By: ppwwyyxx
Differential Revision: D21465835
fbshipit-source-id: 643cff30d35c174085e5fde5197ddb05885b2e99
Summary:
This pull request adds a check for the ROCm environment and skips adding CUDA-specific flags when a PyTorch extension is built on ROCm.
ezyang jeffdaily
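A hedged sketch of the kind of guard involved (the flag below is just an example; `torch.version.hip` is `None` on non-ROCm builds):
```
import torch

IS_ROCM = torch.version.hip is not None

def extra_compile_flags():
    if IS_ROCM:
        return []                          # skip CUDA-only nvcc flags under ROCm/HIP
    return ["--expt-relaxed-constexpr"]    # example CUDA-specific flag
```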
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38047
Differential Revision: D21470507
Pulled By: ezyang
fbshipit-source-id: 5af2d7235e306c7aa9a5f7fc8760025417383069
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38033
Pickles require class names to be actually accessible from the module
in question. _VariableFunction was not! This fixes it.
Fixes https://github.com/pytorch/pytorch/issues/37703
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21458068
Pulled By: ezyang
fbshipit-source-id: 2a5ac41f9d1972e300724981b9b4b84364ddc18c
Summary:
I think it would be nice to have these extra README links here so they're easier to find. There are even more READMEs throughout the source tree that I didn't include, but most of them seem to have pretty minimal information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38049
Differential Revision: D21470749
Pulled By: ezyang
fbshipit-source-id: aa164a3776ab90f2453634082eeae20c0dd002ce
Summary: Issue was introduced in D21258652. We need to make sure it compiles in opt mode. We may still have some leftover py2 packages. Let's just use a format that works with both.
Test Plan: ci
Reviewed By: xush6528
Differential Revision: D21457394
fbshipit-source-id: cde79a0fc6b4feba307bd9d45e1a1d4a42de9263
Summary:
It is currently broken due to a ninja bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37917
Differential Revision: D21470357
Pulled By: ezyang
fbshipit-source-id: c0ed858c63a7504bf2c4961dd7ed906fc3f4502a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38045
We are working on fixing these (e.g.
https://github.com/pytorch/pytorch/pull/37311) but a few PRs still need to land
before these tests are fixed. Disable them for now to avoid noise
ghstack-source-id: 103701518
Test Plan: CI
Differential Revision: D21461340
fbshipit-source-id: fbb029a19a93d439c9fce8424be0fb6409b52ff3
Summary:
**Summary**
This commit adds `torch::jit::RegisterBackend`, an API that allows
external backends to be registered for the execution of JIT subgraphs
outside the JIT interpreter. In order to register an external backend,
one must extend the provided abstract class `PyTorchBackendInterface` and provide
two additional functions: one that creates an instance of the aforementioned subclass
of `PyTorchBackendInterface`, and another that preprocesses a `ScriptModule` so that
it can run on the backend. Then, a `ScriptModule` that can compile and execute a given
JIT subgraph using the functions provided at registration time is generated
for each registered backend.
**Testing**
This commit adds a unit test that uses a minimal test backend
to make sure that the registration endpoint and generated
`ScriptModule` work.
```
$ python test/test_jit.py TestBackends
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 0.183s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35833
Differential Revision: D21231955
Pulled By: SplitInfinity
fbshipit-source-id: 452db1123d0e5d83f97fe5da8a00fdfdb50dbef9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37937
Sometimes traced models don't preserve aten::linear ops and they are decomposed
into addmm or mul + add. Adding this preprocessing step helps us catch more
lowerable linear nodes.
Please see the test for an example.
Test Plan: python test/test_xnnpack_integration.py
Reviewed By: xcheng16
Differential Revision: D21428069
fbshipit-source-id: 6c4ea3335eaf5722852c639fb4ee593746bb408f
Summary:
I picked the wrong revision when landing the diff; it should have had an actual check rather than `if True`:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/38058
Differential Revision: D21466152
Pulled By: malfet
fbshipit-source-id: 03fdc510562fab44b7d64a42284d4c3c1f8e940a
Summary:
The IR Simplifier early exits when working with dtypes that are not safe to reorder. There are some cases where we still want to simplify ops in these dtypes: x + 0, x - 0, x * 0 and x * 1. It's safe to eliminate the op here and it reduces clutter in the expr.
Also added a quick simplification of casts which do nothing (their type is the same as the underlying).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37960
Differential Revision: D21457736
Pulled By: nickgg
fbshipit-source-id: 40e20a3b55fc1afb2ec50071812238a08bded2ac
Summary:
Fix https://github.com/pytorch/pytorch/issues/37680
Makes two changes:
- Add `argmin`, `argmax` and `argsort` to the list of non-differentiable functions to prevent them from generating outputs that require grad (see the sketch after this list).
- Add a check to make sure we don't add such functions to the codegen by mistake.
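After this change, the following minimal sketch holds:
```
import torch

x = torch.randn(4, requires_grad=True)
print(torch.argmax(x).requires_grad)    # False: integer-valued indices are not differentiable
print(torch.argsort(x).requires_grad)   # False
```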
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37789
Differential Revision: D21389201
Pulled By: albanD
fbshipit-source-id: 6a7617e389e893f6f813d50f02700d32300b1386
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36291
Move profiler state to be a thread-local property and
reuse the existing thread-local propagation mechanism to ensure
correct profiling of async tasks. This also makes
push/pop callbacks thread safe and easier to use in e.g. the
distributed profiler.
Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
./build/bin/test_jit
python test/test_autograd.py
python test/test_jit.py
Differential Revision: D20938501
Pulled By: ilia-cher
fbshipit-source-id: c0c6c3eddcfea8fc7c14229534b7246a0ad25845
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37745
This PR makes it possible to set TLS callbacks and use
them transparently not only in the main thread but also
in any async tasks
Test Plan: Imported from OSS
Differential Revision: D21374873
Pulled By: ilia-cher
fbshipit-source-id: 3be2e121673b32d7694e17e794f3b474826dffe9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37548
Moving RecordFunction from torch::autograd::profiler into at namespace
Test Plan:
CI
Imported from OSS
Differential Revision: D21315852
fbshipit-source-id: 4a4dbabf116c162f9aef0da8606590ec3f3847aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37491
This PR modernizes RecordFunction API and adds thread local callbacks
in addition to the global ones
Changes:
- support for TLS callbacks, this is going to be the foundation of profiler and other tools
- modernize interface around simple set of functions (add|remove|has|clear)(Global|ThreadLocal)(Callback) and adding RecordFunctionCallback to easily construct callbacks to be passed
- we also add `.setShouldRun` into the callback interface to support cases when simple uniform sampling is not enough
- to properly support add/remove introduce the idea of callback handle returned by add
- internal implementation still uses SmallVector to store intermediate state (as before) - in this case these are vector of handles of callbacks that were picked to run
- to speed up runtime we keep these vectors sorted, this way we can quickly enumerate callbacks that need to be run
- added tests for new functionality
Test Plan:
BUILD_BINARY=1 USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py
develop install
./build/bin/test_jit
CI
record_function_benchmark: https://gist.github.com/ilia-cher/f1e094dae47fe23e55e7672ac4dcda2f
Imported from OSS
Differential Revision: D21300448
fbshipit-source-id: 6d55c26dbf20b33d35c3f1604dcc07bb063c8c43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37956
This is basically just doing what the CPU code already does, but keeping the kernel in THC, unlike on CPU where it has already moved to native.
Test Plan: Imported from OSS
Differential Revision: D21433211
Pulled By: gchanan
fbshipit-source-id: b7440aa50905b8c94b087eaa95f5b20a27b19d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37973
Fix the unexpected memory usage issue in model QRT for the OC model.
Test Plan:
```
buck test mode/opt caffe2/caffe2/quantization/server:fully_connected_dnnlowp_op_test
```
```
buck test mode/opt caffe2/caffe2/fb/fbgemm:int8_serializer_test
```
Reviewed By: hx89
Differential Revision: D21422257
fbshipit-source-id: cc586123b8bfe41c85c6f2f7e493954845ad18a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37980
This implements `TensorPipeAgent::getMetrics` with the metrics currently available. Will add other metrics such as Client/Server Active Calls once timeouts are implemented.
ghstack-source-id: 103624005
Test Plan: CI
Differential Revision: D21439184
fbshipit-source-id: 8a15df58cc23cdf954e604c0f806877ba111e0a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37852
This tracks network-related metrics in the TensorPipe RPC Agent, including the number of bytes sent and received on each node, number of errors, number of successful calls, etc.
ghstack-source-id: 103681018
Test Plan: CI
Differential Revision: D21340499
fbshipit-source-id: 5682a3351a6394de92a7430869b24fc56c08d793
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37850
Adding the base structs for tracking time-series metrics in the Tensorpipe RPC Agent
ghstack-source-id: 103528373
Test Plan: CI
Differential Revision: D21339520
fbshipit-source-id: 8334044cdded44a940800c1d1f14d07ffab1a7e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37970
This change makes the pass friendlier for users who try to invoke it
directly.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D21444832
Pulled By: ZolotukhinM
fbshipit-source-id: 8be4b5028b3bd84082874e16f38a70b245af5d19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37595
QNNPACK currently does not support an unpack function. So we store the original weights in the packed structure which is directly returned to the user when unpack is called.
However, for memory-constrained environments (like mobile), storing these extra weights in memory is expensive. We need to release these weights after packing on mobile to free up the memory. As a side effect, the user cannot call unpack on mobile once the model is run.
The change is gated by C10_MOBILE which is enabled for mobile builds.
The change saves 36MB on device for Speech Model.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21365495
fbshipit-source-id: 66465ea0b4a10d44187d150edfb90d989e872b65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37978
Faulty PGA tests now have messages delayed by default. Tests that were written prior to this addition should explicitly turn this off since they are not designed to work reliably with message delays.
ghstack-source-id: 103622888
Test Plan: Stress-running this test with TSAN. Also added a sanity check in the verify_backend_options test that verifies the default value of `messages_to_delay`.
Differential Revision: D21440043
fbshipit-source-id: 78151f07a3294c3dfcfaeacd6a5e5b77a0f34da1
Summary:
Miniconda repo has moved from continuum.io to anaconda.com
Also we should be specific about cudatoolkit version so that it installs
the right CUDA version.
Resolves https://github.com/pytorch/pytorch/issues/37047
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37186
Differential Revision: D21443147
Pulled By: seemethere
fbshipit-source-id: 856718822bdd3ce51bbc6e59b0609fe6af77bd79
Summary:
This PR added more supported operations in the CUDA fuser. We are covering the major point-wise operations supported in the legacy fuser.
In an attempt to adapt to the legacy executor:
1. added a naive shape propagation pass on PyTorch JIT IR;
2. small refactor on graph partitioning;
3. fallback interpreter execution of fusion group;
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37849
Reviewed By: yf225
Differential Revision: D21444320
Pulled By: soumith
fbshipit-source-id: 712e18ab8497f8d58a07e6f8d200cdab52cf0d74
Summary:
Also move the ignores for imports to the bottom in `mypy.ini`, those are much less interesting - start with the stuff people want to work on.
Second commit tests the instructions: remove an ignore, fix the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37594
Differential Revision: D21434858
Pulled By: ezyang
fbshipit-source-id: 4f1a6868cdb4cb59d072bcf105f48c3a5ba3ff98
Summary:
clamp_min is used in `torch.nn.functional.normalize`. Update symbolic_opset11 to support it via the updated Clip in ONNX opset 11.
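A minimal export sketch exercising this path (assuming an environment with ONNX export support; module and shapes are arbitrary):
```
import io
import torch
import torch.nn.functional as F

class Normalize(torch.nn.Module):
    def forward(self, x):
        return F.normalize(x, p=2.0, dim=1)   # uses clamp_min internally

buf = io.BytesIO()
torch.onnx.export(Normalize(), torch.randn(2, 3), buf, opset_version=11)
```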
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37872
Reviewed By: hl475
Differential Revision: D21440450
Pulled By: houseroad
fbshipit-source-id: a59cbec3f4d00c3f6654da6a747fbfca59d618f1
Summary:
In D21209901 TensorPipe added support for a vector of payloads inside each message, instead of a single one, so that users with multiple payloads can send them separately as they are instead of having to copy them into a new block of contiguous memory. The PyTorch agent is using the old API, which is preventing us from deleting it. This change has no effects on over-the-wire format and thus on performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37919
ghstack-source-id: 103572164
Test Plan:
On both workers
```
import os
import torch
import torch.distributed.rpc as rpc
os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "8765"
```
On worker 0
```
rpc.init_rpc(name="foo", rank=0, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0}))
```
On worker 1
```
rpc.init_rpc(name="bar", rank=1, backend=rpc.backend_registry.BackendType.TENSORPIPE, world_size=2, rpc_backend_options=rpc.TensorPipeRpcBackendOptions(worker_name_to_id={"foo": 0, "bar": 0}))
```
On worker 0
```
In [15]: rpc.rpc_sync("bar", torch.add, args=(torch.full((2,2), 1), torch.full((2,2), 2)))
Out[15]:
tensor([[3., 3.],
[3., 3.]])
In [16]: rpc.rpc_sync("bar", torch.add, args=(1, 2))
Out[16]: 3
```
Differential Revision: D21425536
fbshipit-source-id: a0ec2be825556b39aff018a2834baf815a6d8fa5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37963
This function is still widely used in the codebase, so we don't want
to add noise to builds with a bunch of warnings. Seems like the
comment + macro are already pretty good indications that this
functionality is considered legacy
Test Plan: Imported from OSS
Differential Revision: D21434447
Pulled By: suo
fbshipit-source-id: 08162ed6502894ea5d3ccb92dfa0183232cc2ab5
Summary:
* Add error message when onnx model file path is not a string.
* Add error message when the model size exceeds 2GB and large model export is not turned on.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37798
Reviewed By: hl475
Differential Revision: D21440571
Pulled By: houseroad
fbshipit-source-id: 054aaa25ab0cffc229f9b487a2c160623c89b741
Summary:
Skip the tests if the network is inaccessible and the model cannot be downloaded
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37972
Differential Revision: D21441996
Pulled By: malfet
fbshipit-source-id: 5ce59764584974aee9195572338ada1fa0351a75
Summary:
So far results look quite promising: test_nn is purely sequential and can be accelerated 3x
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37180
Differential Revision: D21437871
Pulled By: malfet
fbshipit-source-id: 8679a8af355f839f2c9dae3bf36d2e102af05425
Summary:
There is no reason to put complex utilities in the half header.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37676
Differential Revision: D21440270
Pulled By: anjali411
fbshipit-source-id: bbed5fcb5be33f6a4aedcc9932595d43d97672f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36847
Adds a quantized instancenorm operator, which can reuse most of
groupnorm's logic.
Benchmarking shows that the quantized version is about 10x faster than
floating point for equivalent input sizes
(https://gist.github.com/vkuzo/2f230e84d26f26cc6030afdbfbc8e7f0)
Test Plan:
```
python test/quantization/test_quantized.py TestQuantizedOps.test_instance_norm
```
Imported from OSS
Differential Revision: D21107925
fbshipit-source-id: 6bacda402f0eb9857bc8f9a5cf8ef306150613d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36835
Adds a quantized groupnorm operator. We reuse most of the layernorm
kernel, modifying it to be able to perform channel-wise scaling.
Benchmark results: the quantized layer is between 6x and 15x faster
than the floating-point one, depending on input shapes
(full results:
https://gist.github.com/vkuzo/db67623232415382dabff6c8923124e9)
Test Plan:
```
python test/quantization/test_quantized.py TestQuantizedOps.test_group_norm
python test/quantization/test_quantized.py TestQuantizedOps.test_qlayer_norm
```
Numerics are nearly equivalent, with the only difference documented
in the test case. The difference is the same type as with quantized
layernorm. Making numerics equivalent is possible but will sacrifice
speed.
Imported from OSS
Differential Revision: D21107926
fbshipit-source-id: 80e87e9e2c71310bc28c3d114c88de428819cb45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37947
The current test is flaky, removing two potential causes of flakiness.
Test Plan:
CI
Imported from OSS
Differential Revision: D21434861
fbshipit-source-id: 82ea5762f3bb07a12052cde29729d73e95da8ddd
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37157 on my machine.
This was annoying to track down. The essence is that cublas expects column major inputs and Pytorch tensors are usually row major. Cublas lets you request that it act on transposed data, and the erroring `gemv` calls in https://github.com/pytorch/pytorch/issues/37157 make that request. The problem is, [cublasSgemv and cublasDgemv](https://docs.nvidia.com/cuda/cublas/index.html#cublas-lt-t-gt-gemv) (called by [`gemv<float>`](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L318)) and `gemv<double>`) regard their `m, n` arguments values as _pre_-transpose sizes, while [cublasGemmEx](https://docs.nvidia.com/cuda/cublas/index.html#cublas-GemmEx) (called by `gemv<at::Half>`, see [here](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L342)) and [here](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L229))) regards its `m, k` argument values as _post_-transpose sizes. This is inconsistent. It turns out the `gemv<float>/<double>` calls are configured correctly and the `gemv<at::Half>` calls aren't.
Strikethrough text below is no longer accurate, ngimel suggested a better way to handle gemv->gemm forwarding. [Comments in code](https://github.com/pytorch/pytorch/pull/37569/files#diff-686aa86335f96b4ecb9b37f562feed12R323-R348) provide an up-to-date explanation.
Keeping out-of-date strikethrough text because I don't have the heart to delete it all and because it captures an intermediate state of my brain that will help orient me if i ever have to fix this again.
~~To convince myself this PR keeps `at::cuda::blas::gemv`'s external API consistent across dtypes, I need to think through what happens when a pytorch tensor input of size `(a,b)` multiples a vector of size `(b,)` for 4 cases:~~
### ~~1. input is row-major (needs cublas internal transpose)~~
#### ~~1a. input is float or double~~
~~`gemv<float>/<double>` call `cublasS/Dgemv`, forwarding `trans`,** `m`, and `n` directly.~~
~~`cublasS/Ggemv` expects "a m × n matrix stored in column-major format" (so m is the input's fast dim). Input has size `(a, b)` in row-major format. We can reinterpret it as a column-major matrix with size `(b, a)` without any memory movement. So the gemv call should supply `m=b`, `n=a`. However, we're not trying to multiply a matrix `(b, a)` x a vector `(b,)`, we're trying to sum across `b` for matrix and vector. So we also request that cublas transpose the matrix internally by supplying `trans='t'` to `blas::gemv`, which becomes `trans=CUBLAS_OP_T` to the `cublasS/Ggemv`.~~
~~As long as the code calling `blas::gemv` thinks carefully and passes `trans='t'`, `m=b`, `n=a`, cublas carries out `(a, b) x (b,)` and all is well.~~
#### ~~1b. input is half or bfloat16~~
~~`blas::gemv<at::Half>` takes a different code path, calling `gemm<at::Half>` which calls `cublasGemmEx`. The job of this PR is to make sure the exterior `blas::gemv` caller's carefully thought-out argument choices (`trans='t'`, `m=b`, `n=a`) remain correct.~~
~~`cublasGemmEx` takes args `transa, transb, m, n, k, ....others we don't care about` and carries out~~
```
C = α op ( A ) op ( B ) + β C
where α and β are scalars, and A , B and C are matrices stored in column-major format with
dimensions op ( A ) m × k , op ( B ) k × n and C m × n Also, for matrix A
A if transa == CUBLAS_OP_N
op ( A ) = A^T if transa == CUBLAS_OP_T ...
```
~~`gemv<at::Half>` hacks a gemv by calling gemm such that the raw gemm's `m` is the output dim, `k` is the summed dim, and `n=1`, . Reasonable, as long as we get the values right, given that we also need to transpose the input.~~
~~To conform with cublas docs we interpret input as column-major with size `(b, a)`. As for the `<float>/<double>` gemv we want cublas to carry out input (interpreted as column major), internally transposed, times vector of size `(b,)`. In other words we want cublas to apply `op(A) x B`, where op is transpose and `A` is input interpreted as column major. Docs define `m` and `k` by saying `op(A)` has dims `m x k` **(`m` and `k` are _post_-`op` sizes)**. `A` was `(b, a)`, `op(A)` is `(a, b)`, so the correct thing is to supply `m=a`, `k=b` to the underlying gemm. **For the `<float>/<double>` gemv, we passed `m=b`, not `m=a`, to the raw `cublasS/Dgemv`.**~~
~~The exterior `blas::gemv` must have been called with `trans='t'`, `m=b`, `n=a` (as required by the `<float>/<double>` versions). So when gemv is about to call gemm, **we [swap](https://github.com/pytorch/pytorch/pull/37569/files#diff-686aa86335f96b4ecb9b37f562feed12R330) the local values of `m` and `n` so that `m=a`, `n=b`,** then put `m (=a)` in the gemm's `m` spot, 1 in the gemm's `n` spot, and `n (=b)` in the gemm's `k` spot. All is well (we made the right gemm call after ingesting the same arg values as `blas::gemv<float>/<double>`).~~
### ~~2. input is column-major (doesn't need cublas transpose)~~
#### ~~2a. input is float or double~~
~~input is `(a,b)`, already column-major with strides `(1,a)`. Code calling `blas::gemv` supplies `trans='n'` (which becomes `CUBLAS_OP_N`, no internal transpose), `m=a`, `n=b`.~~
#### ~~2b. input is half or bfloat16~~
~~`blas::gemv` should pass `transa='n'`, `m=a`, `n=1`, `k=b` to the underlying gemm. The exterior `blas::gemv` must have been called with `trans='t'`, `m=a`, `n=b` (as required by the `<float>/<double>` versions). So **in this case we _don't_ swap `blas::gemv`'s local values of `m` and `n`.** We directly put `m (=a)` in the gemm's `m` spot, 1 in the gemm's `n` spot, and `n (=b)` in the gemm's `k` spot. All is well (we made the right gemm call after ingesting the same arg values as `blas::gemv<float>/<double>`).~~
~~** `trans` is a string `t` or `n` in the `at::cuda::blas::gemv` API, which gets [converted](091a1192d7/aten/src/ATen/cuda/CUDABlas.cpp (L314)) to a corresponding cublas enum value `CUBLAS_OP_T` (do transpose internally) or `CUBLAS_OP_N` (don't transpose internally) just before the raw cublas call.~~
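For reference, a minimal sketch of the kind of call routed through `gemv<at::Half>` (requires a CUDA device; values are arbitrary):
```
import torch

# Row-major (5, 3) fp16 matrix: cuBLAS sees it as a (3, 5) column-major matrix
# and is asked to transpose it internally, which is the path this PR fixes.
A = torch.randn(5, 3, device="cuda", dtype=torch.half)
v = torch.randn(3, device="cuda", dtype=torch.half)
print(torch.mv(A, v))   # shape (5,)
```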
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37569
Differential Revision: D21405955
Pulled By: ngimel
fbshipit-source-id: e831414bbf54860fb7a4dd8d5666ef8081acd3ee
Summary:
We can implement this as a builtin instead of as a registered op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37886
Differential Revision: D21414329
Pulled By: eellison
fbshipit-source-id: 6e130fa83fbf7ba4d4601f509cb169a2fa804108
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37640
Enable an oversize arena to reduce memory fragmentation. Memory requests with large sizes (configurable with FLAGS_caffe2_oversize_threshold) are fulfilled from a dedicated arena, separate from the existing huge page arena.
Two additional parameters are introduced to configure the 2-phase decay of the memory arena:
- caffe2_dirty_decay_ms
- caffe2_muzzy_decay_ms
In the current JEMalloc implementation, oversized allocations are immediately purged regardless of whether they are put in an arena or not. Therefore we need to extend the decay time to indefinite. Currently we set the default for caffe2_muzzy_decay_ms to -1.
We now enable the arena allocator statically. To ensure it is correctly installed regardless of static initialization order, we add a priority flag in c10::SetAllocator, and only higher priority allocators can overwrite existing ones.
ghstack-source-id: 103276877
Test Plan:
buck test mode/dev //caffe2/caffe2/fb/init:huge_pages_allocator_test
Benchmarking known CV model that benefits from page arena:
```
PyTorchModelBench.cpp:183] test / base : 86.9532%
```
By adjusting ```dirty_decay_ms``` and ```muzzy_decay_ms```, we have the following plots:
https://pxl.cl/15SWW
https://pxl.cl/15TnL
From the figures above we can see performance does not change much until dirty decay time is indefinite (set to -1). Either setting muzzy decay or dirty decay time to -1 will reach best performance, regardless of which one it is. Even setting the decay time to very long (100s, which is longer than the run), does not change the performance by much.
## Observe performance difference in production with a variety of models (WIP)
Reviewed By: dzhulgakov
Differential Revision: D21258581
fbshipit-source-id: c006f8b94f28aef0666e52f48d4e82cf0d3a48af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37964
I thought it was because of flakiness that we didn't pass the conv2d_relu test, but it turns out to
be a typo in the implementation.
Also re-enabled the `use_fused` option in `test_conv2d_api`
Test Plan:
.
Imported from OSS
Differential Revision: D21434776
fbshipit-source-id: 7c24c13cde0a96807e8dfbd1deabf47e8280fdb7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37615
We probably missed a lot of these when we ported things from TH, but it's also probably not a huge deal. There is only one left with fmod.
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D21338030
Pulled By: gchanan
fbshipit-source-id: c133b4e37df87a53797939e9f757cea9446834e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37936
Previously we classified ops like average pool into the category that doesn't require observation, and
the quantization of these ops was done by swapping them with dequantize ops: https://github.com/pytorch/pytorch/pull/33481
However, this operation is done in finalize, which means finalize is a numerics-changing pass when we swap dequantize with
ops like average pool; this is not ideal since we want to restrict the scope of numerics-changing passes.
Although average pool doesn't require observation, quantized average pool = dequant + float32 average pool + quant,
so swapping average pool with dequantize is a numerics-changing operation.
This PR implements support for that. We'll classify ops like average pool into a new category and get quantized average pool through fusion, like we did for other quantized ops. The numerics-changing pass will then only happen in the insert quant-dequant pass, so the model will have the same numerics before and after finalize. With the new category, the debug-only option (the model before finalize) for quantize_script will actually produce a model that's numerically consistent with the finalized model.
Test Plan: python test/test_quantization.py TestQuantizeScriptJitPasses
Differential Revision: D21432871
Pulled By: jerryzh168
fbshipit-source-id: 4926890441e39af4e459376038563c3882cc4c46
Summary:
In a case like the one below, if x0 and x1 are both unaliased and only have a single use, then we can rewrite the mutation to x2 without breaking observable semantics. This PR makes torchvision.models.alexnet functionalizable.
```
if cond:
x0 = op()
else:
x1 = op()
x2.add_(1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37694
Differential Revision: D21428275
Pulled By: eellison
fbshipit-source-id: 1e2a39a8fb3819f1f225b7c345e986b3a3db253f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37472
Our convention is for `findX` to return an optional version and `getX`
to assert that the X is there. Fix up `getMethod` to be consistent with
this convention.
Test Plan: Imported from OSS
Differential Revision: D21297543
Pulled By: suo
fbshipit-source-id: b40f56231cc8183e61bbb01fe5c0c113bcb6464d
Summary:
this is failing in the profiling_executor job
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37961
Differential Revision: D21434341
Pulled By: eellison
fbshipit-source-id: b34f94b1595ef6f6edee76cd200f951a2ef21f22
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304
Test procedure:
With ninja:
[x] Build a clean checkout
[x] Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Modify DispatchTable.h. Build again. Result: `.cu` files are rebuilt, as well as many `.cpp` files
[x] Build for the fifth time. Result: Virtually instantaneous, with no extra rebuilding.
[x] Touch one of the `.depend` files. Build again. Result: Only 10 libraries are (needlessly) linked again, the extra delay on a 24-core machine is <10s.
Without ninja:
[x] Build a clean checkout
[x] Build again. Result: There is some unnecessary rebuilding. But it was also happening before this change.
[x] Build for the third time. Result: Virtually instantaneous, with no extra rebuilding.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37661
Differential Revision: D21434624
Pulled By: ezyang
fbshipit-source-id: 379d2315486b8bb5972c184f9b8da8e00d38c338
Summary:
This makes it a proper python package, therefore `ModuleFinder` will parse dependencies from this module. (see https://docs.python.org/3/tutorial/modules.html )
As a result, changes to `torch/testing/_internal/common_quantization` or `test/quantization/*.py` would be considered as affecting `test_quantization.py`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37934
Test Plan: CI
Differential Revision: D21432413
Pulled By: malfet
fbshipit-source-id: acff6cee69a1dfd5535e33978f826ed1f6a70821
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37169
This allows some cleanup of the code below by making lines shorter.
ghstack-source-id: 102773593
Test Plan: Existing tests for interpolate.
Reviewed By: kimishpatel
Differential Revision: D21209988
fbshipit-source-id: cffcdf9a580b15c4f1fa83e3f27b5a69f66bf6f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37168
It looks like this was made a separate function because of the `dim` argument,
but that argument is always equal to `input.dim() - 2`. Remove the argument
and consolidate all call sites into one. This also means that this will be
called on paths that previously didn't call it, but all those cases throw
exceptions anyway.
ghstack-source-id: 102773596
Test Plan: Existing tests for interpolate.
Reviewed By: kimishpatel
Differential Revision: D21209993
fbshipit-source-id: 2c274a3a6900ebfdb8d60b311a4c3bd956fa7c37
Summary:
Remove the requirement for the axes provided to reorderAxis to come from a Tensor. We were using that to determine the relevant loops, but we can alternatively determine it by traversing the parents of each provided For.
resistor does this work for you?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37873
Differential Revision: D21428016
Pulled By: nickgg
fbshipit-source-id: b16b2f41cb443dfc2c6548b7980731d1e7d89a35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37834
Ported all use sites of the old registration API to use new Integer operator registration API.
Test Plan: Imported from OSS
Differential Revision: D21415700
Pulled By: MohammadMahdiJavanmard
fbshipit-source-id: 34f18757bad1642e1c485bb30c9771f7b7102230
Summary:
The existing context manager only conditionally enabled profiling mode, which was counterintuitive. When we changed the default executor, it broke internal benchmarking as a result.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37825
Differential Revision: D21404611
Pulled By: eellison
fbshipit-source-id: 306b3c333ef4eb44ab6a6e5ab4e0682e5ce312ce
Summary:
We used to only support indexing through
- numbers like `x[0, 1]`
- tuple like `x[(0, 1)]`
- tensor like `x[torch.tensor([0, 1])]`
This PR adds support for indexing through list which is equivalent to tensor.
- `x[[0, 1, 5]]`
- `x[[0, 1], [0, 1]]`
- `x[[[0, 1], [0, 1]], [[0, 1], [0, 1]]]`
Note for `x[[0, 1, 5]]` we had a bug in the AST conversion code, so we used to treat it like `x[0, 1, 5]`, which means it might accidentally run and produce a wrong result (fixes https://github.com/pytorch/pytorch/issues/37286, fixes https://github.com/pytorch/pytorch/issues/18616). Now that it's fixed we probably want to mark it as BC-breaking.
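A hedged sketch of the new behavior (the function below is illustrative): inside TorchScript, indexing with a list literal now matches indexing with a tensor of the same values in eager mode.
```python
import torch

@torch.jit.script
def pick(x):
    # List indexing is now treated like tensor indexing, not x[0, 1, 5].
    return x[[0, 1, 5]]

x = torch.arange(10, dtype=torch.float)
print(pick(x))                      # tensor([0., 1., 5.])
print(x[torch.tensor([0, 1, 5])])   # same values in eager mode
```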
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37848
Reviewed By: suo
Differential Revision: D21409840
Pulled By: ailzhang
fbshipit-source-id: 6f2d962885c6dc009cb384d98be1822f5ca7a189
Summary:
Now that we landed float requantization for conv/linear, we do not need
the constraint for requant_scale < 1.
Removing that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37683
Test Plan: Quantization tests
Differential Revision: D21412536
Pulled By: kimishpatel
fbshipit-source-id: c932b5ab3aa40407e9d7f0c877e2fe7fd544f8a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37922
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21426425
Pulled By: ezyang
fbshipit-source-id: 9d0d997f608a742668f64e7529c41feb39bec24e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37700
Certain autograd functions can have optional Tensor arguments. For
this purpose it would be nice to support c10::optional<Tensor> as an argument
for C++ autograd functions.
I've added the appropriate overload to ExtractVariables to ensure this works.
For an example, you can look at D21272807 in terms of how this is used.
ghstack-source-id: 103541789
Test Plan: waitforbuildbot
Differential Revision: D21363491
fbshipit-source-id: 0c8665e9bfe279e6b9ab84a889524fea11fa971c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37882
Previously we checked if a node's inputs and outputs have shape
info only when we tried to merge this node into an existing fusion
group, but we didn't check it for the first node in the group. This PR
fixes that. It was causing a failure on test_milstm_cuda, which is now
fixed.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D21412756
Pulled By: ZolotukhinM
fbshipit-source-id: 3ca30637ab8fe68443adb5fc03f1b8a11085a6a8
Summary:
This pull request enables ahead-of-time compilation of HIPExtensions with ninja by setting the appropriate compilation flags for the ROCm environment. It also enables the unit test for cuda_extensions on ROCm and removes the test for ahead-of-time compilation of extensions with ninja from ROCM_BLACKLIST.
ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37800
Differential Revision: D21408148
Pulled By: soumith
fbshipit-source-id: 146f4ffb3418f3534e6ce86805d3fe9c3eae84e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37815
Generated device-specific wrappers for Tensor factory ops now call
methods on `globalContext()` directly, rather than indirecting
through `globalLegacyTypeDispatch()`, which we can now delete.
Test Plan: Imported from OSS
Differential Revision: D21398294
Pulled By: bhosmer
fbshipit-source-id: b37bc67aa33bfda6f156d441df55ada40e9b814d
Summary:
Helps prevent following accidental failures:
```
..\caffe2\core\parallel_net_test.cc:303
The difference between ms and 350 is 41, which exceeds kTimeThreshold, where
ms evaluates to 391,
350 evaluates to 350, and
kTimeThreshold evaluates to 40.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37892
Differential Revision: D21417251
Pulled By: malfet
fbshipit-source-id: 300cff7042e466f014850cc7cc406c725d5d0c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37515
Previously we classified ops like average pool into the category that doesn't require observation, and
the quantization of these ops was done by swapping them with dequantize ops: https://github.com/pytorch/pytorch/pull/33481
However, that swap happened in finalize, which made finalize a numerics-changing pass: although average pool doesn't require observation, quantized average pool = dequant + float32 average pool + quant, so swapping average pool with dequantize changes numerics. This is not ideal since we want to restrict the scope of numerics-changing passes.
This PR implements the support for that. We'll classify ops like average pool into a new category and get the quantized op through fusion, like we do for other quantized ops. The numerics-changing pass will then only happen in the insert-quant-dequant pass, so the model will have the same numerics before and after finalize. With the new category, the debug-only option (the model before finalize) for quantize_script will actually produce a model that's numerically consistent with the finalized model.
Test Plan:
python test/test_quantization.py TestQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D21393512
fbshipit-source-id: 5632935fe1a7d76382fda22903d77586a08f0898
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37767
Fixes #37577
Needs tests, and maybe a lint.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21386704
Pulled By: ezyang
fbshipit-source-id: 082c69f9e1f40dc5ed7d371902a4c498f105d99f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37853
```
Uninitialized value was created by an allocation of 'acc_arr_next' in the stack frame of function '_ZN2at6vec25614vec_reduce_allIfZZNS_6native12_GLOBAL__N_124_vec_log_softmax_lastdimIfEEvPT_S6_llENKUlllE_clEllEUlRNS0_12_GLOBAL__N_16Vec256IfEESB_E_EES5_RKT0_NS9_IS5_EEl'
#0 0xa961530 in float at::vec256::vec_reduce_all<float, void at::native::(anonymous namespace)::_vec_log_softmax_lastdim<float>(float*, float*, long, long)::'lambda'(long, long)::operator()(long, long) const::'lambda'(at::vec256::(anonymous namespace)::Vec256<float>&, at::vec256::(anonymous namespace)::Vec256<float>&)>(void at::native::(anonymous namespace)::_vec_log_softmax_lastdim<float>(float*, float*, long, long)::'lambda'(long, long)::operator()(long, long) const::'lambda'(at::vec256::(anonymous namespace)::Vec256<float>&, at::vec256::(anonymous namespace)::Vec256<float>&) const&, at::vec256::(anonymous namespace)::Vec256<float>, long) xplat/caffe2/aten/src/ATen/cpu/vec256/functional.h:12
```
Test Plan:
passed sanitizer locally after change,
CI green
Differential Revision: D21408120
fbshipit-source-id: b9d058cedf42b3d1d34ce05a42049d402906cd13
Summary:
So we can import torch compiled with CUDA on a CPU-only machine.
Needs tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37811
Differential Revision: D21417082
Pulled By: ezyang
fbshipit-source-id: 7a521b651bca7cbe38269915bd1d1b1bb756b45b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35961
Weight quantization was done incorrectly for LSTMs, the statistics for all weights (across layers) were combined in the observer. This meant that weights for later layers in a LSTM would use sub-optimal scales impacting accuracy. The problem gets worse as the number of layers increases.
ghstack-source-id: 103511725
Test Plan: Will be updated
Differential Revision: D20842145
fbshipit-source-id: a622b012d393e0755970531583950b44f1964413
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37866
make sure not to check `CUDA_VERSION` if it is not defined
Test Plan: CI green
Reviewed By: anjali411
Differential Revision: D21408844
fbshipit-source-id: 5a9afe372b3f1fbaf08a7c43fa3e0e654a569d5f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37423
For now, see what breaks on CI
ghstack-source-id: 103508233
Test Plan:
CI
Imported from OSS
Differential Revision: D21310335
fbshipit-source-id: 99d22e61168fcb318b18a16522aabdc0115c1f39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37422
The test was failing because in fbcode the version of hypothesis was too old to know
about the width parameter, and it was trying to generate values larger than float32. The fix
is to explicitly set the defaults of the floats range for old versions of hypothesis.
For now, reenable the test and see what breaks in CI
ghstack-source-id: 103500358
Test Plan:
CI
```
buck test mode/dev-nosan //caffe2/test:quantization -- 'test_compare_tensor_scalar \(quantization\.test_quantized_op\.TestComparatorOps\)'
```
Imported from OSS
Reviewed By: raghuramank100
Differential Revision: D21310336
fbshipit-source-id: 1a59ab722daa28aab3d6d2d09bc527874942dc36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37867
This is to work around an internal issue we are hitting with nvcc in ovrsource.
It does not seem to pick the correct device overload of `isinf` and `isnan` without this fudging of the code.
Test Plan:
CI green,
internal builds pass
Reviewed By: malfet
Differential Revision: D21408263
fbshipit-source-id: 1ff44e088b5c885d729cc95f00cf8fa07e525f6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37839
Calling `RpcAgent::shutdown` from the TensorpipeAgent will ensure that parent class threads are joined and the atomic is set to False.
ghstack-source-id: 103496383
Test Plan: CI Build - no Tensorpipe Agent tests yet
Differential Revision: D21291974
fbshipit-source-id: 50cab929b021faf7f80e0e8139d0c7d1788a3a6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36611
Currently Buf represents underlying storage but it didn't have dtype.
That resulted in specifying dtypes in different places and there was no
mechanism to enforce its consistency: e.g. one could've created a kFloat
expression and use a kInt buffer to store its result. Now we're
centralizing where the logic regarding the storage is located and we can
start enforcing semantics rules.
Follow-ups: we can merge Buffer and BufHandle classes as the former is
now a mere wrapper over the latter.
Test Plan: Imported from OSS
Differential Revision: D21027356
Pulled By: ZolotukhinM
fbshipit-source-id: c06aa2c4077fdcde3bb4ca622d324aece79b5a9c
Summary:
Passing the `--save-xml` option to the common test runner has the same effect as setting the `IN_CIRCLECI` environment variable, but it also allows one to specify the folder in which to save results.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37840
Differential Revision: D21410250
Pulled By: malfet
fbshipit-source-id: ae5855fafdc8c66b550d42b683d547c88b4e55d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37854
Adding Tensorpipe contributors to the Codeowners file for Tensorpipe-related functionality in PyTorch.
ghstack-source-id: 103507371
Test Plan: CI
Differential Revision: D21408676
fbshipit-source-id: ea7cc1fd7ec069c83e67812e704d31492ef2a3cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37776
* Remove type-specific size tracking in favor of byte size tracking in Storage and StorageImpl
* Changed numel() and set_numel() to nbytes() and set_nbytes()
* Added enum argument to Storage/StorageImpl constructor to indicate new meaning of the size parameter
* Update all callers of the changed API
Part of issue https://github.com/pytorch/pytorch/issues/33950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37028
Differential Revision: D21171334
Pulled By: ezyang
fbshipit-source-id: 37329a379de9a3a83cc5e9007e455a3e1c2d10b8
Summary:
Closes https://github.com/pytorch/pytorch/issues/37154
Fixes a bug in `cdist` backward with `p=2`.
Under some circumstances, if the output has 0s, the gradient calculation of `sqrt` will be undefined. Leading to NaNs in the input gradients.
This PR defines a subgradient for this case.
A test is also added to verify this behavior; I was only able to reproduce it under certain shapes, so the shape is explicitly taken from the example in https://github.com/pytorch/pytorch/issues/37154.
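A hedged minimal sketch of the failure mode (shapes are illustrative, not the exact ones from the issue): coincident rows make some pairwise distances exactly zero, which is where the sqrt in the backward pass needs a subgradient.
```python
import torch

x = torch.randn(4, 3, requires_grad=True)
y = x.detach().clone()  # identical rows -> zero distances on the diagonal

d = torch.cdist(x, y, p=2)
d.sum().backward()

print(torch.isnan(x.grad).any())  # False once the subgradient at zero is defined
```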
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37337
Differential Revision: D21403178
Pulled By: albanD
fbshipit-source-id: deef9678c1958524b552504920f19617f9ad1da6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37842
Fixes https://github.com/pytorch/pytorch/issues/23993.
Previously our name lookup function for the tracer was looking in
f.globals for names. For example:
```
sample1 = torch.ones(1)
sample2 = torch.ones(1)
traced = torch.jit.trace(my_mod, ((sample1, sample2,),))
> produces a graph with something like:
> %sample1, %sample2 = prim::TupleUnpack(%input)
```
This is not great if you are, e.g. trace checking, because a non-local
bit of interpreter state is affected the graph produced:
```
traced = torch.jit.trace(my_mod, _clone_inputs((sample, sample,),))
> produces a graph with something like
> %0, %1 = prim::TupleUnpack(%input)
```
I have removed this functionality, as I don't think it provides huge
value. Things that look locally for names will still work, so e.g.
inputs, intermediate variables, and the like will be named correctly.
Test Plan: Imported from OSS
Differential Revision: D21406478
Pulled By: suo
fbshipit-source-id: 3c7066b95d4a6e9b528888309954b02dadbc1a07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37032
DataParallel requires all params and buffers of child modules to be updated
in place because of how it implements model replication during the
forward pass (see https://github.com/pytorch/pytorch/pull/12671 for
context). Any params or buffers not updated in place are lost and not
propagated back to the master.
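A hedged sketch of the distinction (the module and values are illustrative): mutating the existing parameter storage in place is the kind of update that survives DataParallel's replication, whereas rebinding the attribute to a fresh tensor is the pattern that gets lost.
```python
import torch
import torch.nn as nn

m = nn.Linear(4, 4)

# In-place: the existing storage is mutated, so the update is visible to
# the master copy after DataParallel's forward-pass replication.
with torch.no_grad():
    m.weight.copy_(torch.zeros_like(m.weight))

# Out-of-place (the pattern being fixed here): binding a brand-new tensor
# to the attribute, which is lost on the replicas.
# m.weight = nn.Parameter(torch.zeros(4, 4))
```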
This diff updates (some quantized modules) (TBD: all quantized modules? determine a good cut
point) to do their parameter update in-place. This will enable static
quant and QAT to work correctly with DataParallel.
TODO: https://github.com/pytorch/pytorch/pull/32684 needs to land before we can fix the graph mode test failures on this PR.
Test Plan:
script failed before and passes after the diff:
https://gist.github.com/vkuzo/78b06c01f23f98ee2aaaeb37e55f8d40
TODO before land: add integration testing
Imported from OSS
Differential Revision: D21206454
fbshipit-source-id: df6b4b04d0ae0f7ef582c82d81418163019e96f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37832
These tests were flaky and qint32 support is not a priority at
the moment, turning it off to improve test quality.
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_batch_norm2d
python test/test_quantization.py TestQuantizedOps.test_batch_norm2d_relu
python test/test_quantization.py TestQuantizedOps.test_batch_norm3d
```
Imported from OSS
Differential Revision: D21404980
fbshipit-source-id: 04f4308bc5d6e1a278c60985971d03c10a851915
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37813
This condition should never fire.
Test Plan: Imported from OSS
Reviewed By: jamesr66a
Differential Revision: D21398021
Pulled By: suo
fbshipit-source-id: 7f2213a020071b8eab80ef40ac6a9de669722548
Summary:
I think this helps compile PyTorch from source faster, without errors about an incompatible compiler (such as: unsupported GNU version! gcc versions later than 8 are not supported!)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37301
Differential Revision: D21396682
Pulled By: ngimel
fbshipit-source-id: 5e21c36ee550424e820f3aa6e6131ca858994ae4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37771
Adds in place handling for other activations in graph mode
Test Plan:
```
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_swap_dequantize
```
Imported from OSS
Differential Revision: D21382825
fbshipit-source-id: 6a4e64bae08fcbfb9bdab92aaac43da98207a1c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37592
Makes sure that all the standalone relu flavors are tested in
graph mode.
Test Plan:
```
python test/test_quantization.py TestQuantizeScriptPTSQOps.test_swap_dequantize
```
Imported from OSS
Differential Revision: D21366597
fbshipit-source-id: 103848b76a0c65b9adac5bae98b545aa1d30a9e2
Summary:
Closes https://github.com/pytorch/pytorch/issues/24558
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.erf(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.erf(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.erf(a) a.numel() == 10000 for 20000 times torch.half
0.29057903600187274
torch.erf(a) a.numel() == 10000 for 20000 times torch.float
0.2836507789979805
torch.erf(a) a.numel() == 10000 for 20000 times torch.double
0.44974555500084534
torch.erf(a) a.numel() == 100000 for 20000 times torch.half
0.31807255600142526
torch.erf(a) a.numel() == 100000 for 20000 times torch.float
0.3216503109979385
torch.erf(a) a.numel() == 100000 for 20000 times torch.double
2.0413486910001666
```
After:
```
torch.erf(a) a.numel() == 10000 for 20000 times torch.half
0.2867302739996376
torch.erf(a) a.numel() == 10000 for 20000 times torch.float
0.28851128199858067
torch.erf(a) a.numel() == 10000 for 20000 times torch.double
0.4592030350013374
torch.erf(a) a.numel() == 100000 for 20000 times torch.half
0.28704102400115517
torch.erf(a) a.numel() == 100000 for 20000 times torch.float
0.29036039400125446
torch.erf(a) a.numel() == 100000 for 20000 times torch.double
2.04035638699861
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36724
Differential Revision: D21164626
Pulled By: VitalyFedyunin
fbshipit-source-id: e6f3390b2bbb6e8d21e18ffe15f5d49a170fae83
Summary:
We were previously only looking at class attributes, so that didn't include methods etc, and would silently give wrong semantics. This makes hasAttr go through the same resolution as our other attribute lookups.
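A hedged sketch of the case this covers (the module is illustrative): after this change, `hasattr` inside a scripted module also finds methods, not just class attributes.
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def helper(self, x):
        return x * 2

    def forward(self, x):
        # Previously only class attributes were consulted here, so a plain
        # method like `helper` could be reported as missing.
        if hasattr(self, "helper"):
            return self.helper(x)
        return x

scripted = torch.jit.script(M())
print(scripted(torch.ones(2)))  # tensor([2., 2.])
```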
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37424
Differential Revision: D21282633
Pulled By: eellison
fbshipit-source-id: 8e970f365c2740d137a02331739c2ed93747b918
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37681
By passing by value, we can std::move, and avoid unnecessarily copying
args that are part of any std::function/lambda state (e.g. in the jit
interpreter, there is a std::vector<> stack passed in the
InterpreterContinuation)
This makes the api also consistent with e.g. folly and best practices.
Added a minor at::launch() benchmark to test/cpp/, the difference is
mostly noticeable when copying the std::function<> internal args is
non-trivial.
Benchmarks pre/post (min over ~5 runs)
NoData: 5.81 us -> 5.63 us (-3.2%)
WithData(0): 6.67 us -> 5.88 us (-11.8%)
WithData(4): 6.98 us -> 6.51 us (-6.7%)
WithData(256): 9.44 us -> 7.89 (-16.5%)
ghstack-source-id: 103322321
Test Plan:
- perf: buck run mode/opt caffe2/test/cpp/api:parallel_benchmark pre/post
- correctness buck test mode/dev-nosan caffe2/test/...
Reviewed By: dzhulgakov
Differential Revision: D21355148
fbshipit-source-id: 3567e730845106f1991091e4a892d093e00571c3
Summary:
Fix https://github.com/pytorch/pytorch/issues/37672
Make sure we only access fields that exist and handle python errors correctly.
Before the fix, the given test would throw:
```
AttributeError: 'MyHookClass' object has no attribute '__name__'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "test_autograd.py", line 432, in test_hook_with_no_name
x.sum().backward()
File "/Users/albandes/workspace/pytorch_dev/torch/tensor.py", line 184, in backward
torch.autograd.backward(self, gradient, retain_graph, create_graph)
File "/Users/albandes/workspace/pytorch_dev/torch/autograd/__init__.py", line 115, in backward
allow_unreachable=True) # allow_unreachable flag
SystemError: <built-in method run_backward of torch._C._EngineBase object at 0x112fd8100> returned a result with an error set
```
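A hedged sketch of the triggering pattern from the traceback above (the hook class name follows the test; the body is illustrative): a gradient hook that is a plain callable object, so it has no `__name__` attribute.
```python
import torch

class MyHookClass:
    # A callable without a __name__ attribute; before the fix, looking it
    # up during backward raised and left the Python error state set.
    def __call__(self, grad):
        return grad * 2

x = torch.ones(2, requires_grad=True)
x.register_hook(MyHookClass())
x.sum().backward()
print(x.grad)  # tensor([2., 2.]) once the hook runs cleanly
```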
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37785
Differential Revision: D21387946
Pulled By: albanD
fbshipit-source-id: dcb9afa37b3e10620dc9182d8aa410e7130ffb64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35483
Implement the initial version of TensorPipe RPC agent, and register to RPC registry to expose to Python interface. As a starter, it utilizes all available TensorPipe transports (shm, uv) and channels (basic, cma).
Test Plan:
https://our.intern.facebook.com/intern/diffusion/FBS/browse/master/fbcode/experimental/jiayisuse/tensorpipe_rpc
export MASTER_ADDR=127.0.0.1
export MASTER_PORT=28500
buck build mode/dev-nosan mode/no-gpu //experimental/jiayisuse/tensorpipe_rpc:main
./buck-out/gen/experimental/jiayisuse/tensorpipe_rpc/main.par
buck build mode/dev-nosan mode/no-gpu //experimental/jiayisuse/tensorpipe_rpc:benchmark
./buck-out/gen/experimental/jiayisuse/tensorpipe_rpc/benchmark.par
Multiple connections with async echo
./buck-out/gen/experimental/jiayisuse/tensorpipe_rpc/async_echo.par
Reviewed By: lw
Differential Revision: D20088366
fbshipit-source-id: 980f641af3321ca93583c62753e1c9174b7d4afc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36197
Create APIs to convert between rpc::message and tensorpipe::message
1. tensorpipeSerialize() - converts rpc::message to tensorpipe::message without memory copy (tensors).
2. tensorpipeAllocateMessage - allocates rpc::message based on received tensorpipe descriptor to prepare memory-copy-free receiving.
Test Plan: buck test caffe2/test/cpp/rpc:test_tensorpipe_serialization
Reviewed By: lw
Differential Revision: D20084125
fbshipit-source-id: ffbc310f93443e50261aed752be0fe176610dd2a
Summary:
First one is to download build artifacts
Second is to run tests
Third is to upload test metadata (runs always, even if `Run` step has failed)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37808
Differential Revision: D21398398
Pulled By: malfet
fbshipit-source-id: da23c499a84136e12e88adcc60206ea26bc843c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37801
D21215050 was reverted. Re do it.
Test Plan: build, CI
Reviewed By: iseeyuan
Differential Revision: D21393474
fbshipit-source-id: 2e86d5d1980a122a847e146dc6357627ec31d80d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37721
Even though we disabled caffe2 test configs in Python, the BUILD_TEST
option was still building caffe2 test cpp binaries and various CI
configurations were running them (since they just run every binary in
`torch/test`).
This PR adds a caffe2-specific BUILD_TEST option (BUILD_CAFFE2_TEST),
which defaults to OFF, and gates the compilation of caffe2 test cpp
binaries under it.
Test Plan: Imported from OSS
Differential Revision: D21369541
Pulled By: suo
fbshipit-source-id: 669cff70c5b53f016e8e016bcb3a99bf3617e1f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35779
Adds a QNNPack path for the clamp kernel, which is useful for
hardtanh.
Test Plan:
python test/test_quantized.py TestQNNPackOps.test_hardtanh
Imported from OSS
Differential Revision: D20778588
fbshipit-source-id: 537de42e795a9c67924e1acb1d33b370beb9dbf5
Summary:
This is no longer needed because cuda copy kernel now uses `c10::complex`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37574
Differential Revision: D21328501
Pulled By: ngimel
fbshipit-source-id: dd5226e8b6c54915fb6ee52240a446f0ca30a800
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37796
Shape inference is costly. In bad cases, if we have a lot of uneven tails, we are going to do quite amount of shape inference. This diff will enable each Onnxifi operator to cache the shape inference result for given batch size. In the worst case, we will occupy `num_inference_threads * max_batch_size` OutputReshapeInfo objects per model, where `num_inference_threads` and `max_batch_size` are smaller than 64.
Reviewed By: benjibc
Differential Revision: D21389946
fbshipit-source-id: 23473e64c338d64d15c70292cca0056205d980eb
Summary:
The purpose of this PR is to enable HgemmBatched for ROCm. Since the inconsistency between CUDA_VERSION and HIP_VERSION, resulting in THCudaBlas_HgemmStridedBatched() not to be called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37483
Differential Revision: D21395699
Pulled By: ngimel
fbshipit-source-id: c5c22d5f2041d4c9911558b2568fc9ce33ddeb5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37689
It has to be this way; otherwise we will not be able to use it in vec256, because the function pointers declared there use const references.
Test Plan: Imported from OSS
Differential Revision: D21394603
Pulled By: anjali411
fbshipit-source-id: daa075b86daaa694489c883d79950a41d6e996ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37799
Failure to do so will result in some workspace contention.
Test Plan: unittest
Reviewed By: amylittleyang
Differential Revision: D21390900
fbshipit-source-id: 9e837f0f7aae32230740604069308f35b73612b9
Summary:
Hello there,
I was going through the default initialization of some layers, and ended up on the `torch.nn.init` documentation. There was a slight issue with the docstrings of both `kaiming_normal_` and `kaiming_uniform_` that yielded a wrong list of function parameters.
This PR fixes the indentation in the corresponding docstrings.
Any feedback is welcome!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37739
Differential Revision: D21393728
Pulled By: ngimel
fbshipit-source-id: 64523cb328e72d2e51c2c42b20a4545c1ec5f478
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37402
Previously, BackendSelect kernels did just-in-time device type
initialization by calling `LegacyTypeDispatch.initForDispatchKey()`
with a computed dispatch key. Here we move the initialization to
the backend kernels themselves, where we can call the device-
specific initializer directly.
Putting this up to run tests on it, but a couple questions remain:
* why were only BackendSelect kernels doing this initialization?
Not all factory ops appear there, nor are all the ops that do
appear there factory ops. Currently we generate init code for
exactly the BackendSelect ops, but the choice should be better
motivated.
* the previous scheme maps HIP to its own legacy type dispatch
entry, but the logic assumes it's exclusive with CUDA, and no
ops appear to mention HIP explicitly, so the new logic doesn't
expose a static entry point for it. Needs to be verified.
Test Plan: Imported from OSS
Differential Revision: D21282974
Pulled By: bhosmer
fbshipit-source-id: cd46eb788596948e0572a15fac0f8b43feca5d75
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34585.
This PR improves the workaround for the problem of different semantics between ONNX softmax and Pytorch softmax.
In Pytorch the `dim` parameter specifies over which dimension normalize the values. ONNX on the other hand always coerces the input into a 2D tensor and the `axis` parameter specifies which dimensions represent rows and columns of the resulting tensor. As a result, only when we are normalizing the last dimension (`dim == ndim - 1`) semantics are the same.
Previously this was handled by recognizing the `dim == ndim - 1` case and using `softmax` for that. All other cases used a fallback path of explicit invocations of exp, reducesum and div operators to compute the result. Unfortunately, this results in numeric errors when input values are large: the result of exp will produce infinity on both numerator and denumerator and the division of that will result in NaN.
This can be improved by transposing the input tensor so that we can reuse ONNX softmax.
Similar approach has been applied to `logsoftmax` function in https://github.com/pytorch/pytorch/issues/30433.
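A hedged numeric sketch of the overflow described above (values are illustrative): the explicit exp/sum/div fallback produces inf/inf for large inputs, while reusing softmax via a transpose stays finite.
```python
import torch

x = torch.full((2, 3), 1000.0)

# Fallback path: exp overflows to inf, and inf / inf gives nan.
naive = torch.exp(x) / torch.exp(x).sum(dim=0, keepdim=True)

# Reusing softmax over the (transposed) last dimension keeps it stable.
stable = torch.softmax(x.transpose(0, -1), dim=-1).transpose(0, -1)

print(torch.isnan(naive).any())                          # True
print(torch.isnan(stable).any())                         # False
print(torch.allclose(stable, torch.softmax(x, dim=0)))   # True
```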
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37326
Reviewed By: hl475
Differential Revision: D21389712
Pulled By: houseroad
fbshipit-source-id: 554fd1b98231a28984c30c7e7abd3c0643386ff7
Summary:
When a subprocess terminates with an exception in a distributed test, log the process number as well
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37627
Differential Revision: D21366149
Pulled By: rohan-varma
fbshipit-source-id: 132c4b4c1eb336761c2be26d034d8b739ae19691
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37779
We should just return empty output
Test Plan: Imported from OSS
Differential Revision: D21385789
fbshipit-source-id: 4b42f5aaebabfa3f329ed74356bddb33daad98d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37547
This shouldn't be used anymore.
Test Plan: Imported from OSS
Differential Revision: D21315037
Pulled By: gchanan
fbshipit-source-id: 12728f1d0e1856bf3e8fe1bfcf36cddd305a4a76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37637
Insert dequant op at specific offset, rather than for all inputs of user
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21354931
fbshipit-source-id: 79a1dc63b0ed96c3d51d569116ed963106085d3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37635
replaceConvolutionWithConv2d incorrectly assumes that the size of padding is 2. For Conv1d it is 1, in which case we cannot replace with aten::conv2d
Test Plan: Imported from OSS
Differential Revision: D21354930
fbshipit-source-id: a2dbad856666b4bbb2d9015ade8e1704774f20dd
Summary:
If linking the same file multiple times, the trigger check becomes severe and crashes execution at startup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37772
Differential Revision: D21384072
Pulled By: bwasti
fbshipit-source-id: 3396e69cd361f65e50517970d23497804c76023e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37101
Fixes #36954.
The basic concept is to streamline the process of rethrowing
c10::Error with extra error information. This is in a few
steps:
- I completely remodeled the Error data type and the internal
invariants. Instead of manually adding in newlines, the
message stack formatting process is responsible for inserting
newlines and spacing as necessary. Call sites are then
modified to respect the new API model.
- TORCH_RETHROW macro is added, which adds context to an error
message and then rethrows it.
New internal assert failure looks like:
```
0 INTERNAL ASSERT FAILED at ../c10/test/util/exception_test.cpp:64, please report a bug to PyTorch.
Exception raised from TestBody at ../c10/test/util/exception_test.cpp:64 (most recent call first):
frame #0: <unknown function> + 0x6aab9 (0x7ff611d3aab9 in /data/users/ezyang/pytorch-tmp/build/lib/libc10.so)
frame #1: ...
```
Error message with context looks like:
```
This is an error
This is context 1
This is context 2
```
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202891
Pulled By: ezyang
fbshipit-source-id: 361cadd16bc52e5886dba08e79277771ada76169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37094
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21202892
Pulled By: ezyang
fbshipit-source-id: d59e6bffabd90cc734056bdce2cd1fe63262fab8
Summary:
Fixes the issue where a Caffe2 model with a Copy op could not be exported to an ONNX model.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37144
Reviewed By: houseroad
Differential Revision: D21252421
Pulled By: yinghai
fbshipit-source-id: 4f1077188f36b0691d199e418880bbb27f11032d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37704
If the input tensor cannot be chunked, run `parallel_apply` on fewer devices.
Modify the input tensor dimension in `DataParallelUsesAllAvailableCUDADevices_CUDA` to be chunkable by any number of available CUDA devices.
Test Plan: Run `test/cpp/api/parallel` on machine with 6 GPUs
Differential Revision: D21365416
fbshipit-source-id: 60fdfed4a0e6256b2c966c2ea3e8d0bfb298d9a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37537
The documentation states that `random_()` samples "from the discrete uniform distribution". Floating-point types can support _discrete_ _uniform_ distribution only within range [-(2^digits), 2^digits], where `digits = std::numeric_limits<fp_type>::digits`, or
- [-(2^53), 2^53] for double
- [-(2^24), 2^24] for float
- [-(2^11), 2^11] for half
- [-(2^8), 2^8] for bfloat16
The worst scenario is when the floating-point type can not represent numbers between `from` and `to`. E.g.
```
torch.empty(10, dtype=torch.float).random_(16777217, 16777218)
tensor([16777216., 16777216., 16777216., 16777216., 16777216., 16777216.,
16777216., 16777216., 16777216., 16777216.])
```
Because 16777217 can not be represented in float
Test Plan: Imported from OSS
Differential Revision: D21380387
Pulled By: pbelevich
fbshipit-source-id: 80d77a5b592fff9ab35155a63045b71dcc8db2fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37719
As title
Test Plan: Just updating doc
Reviewed By: hyuen
Differential Revision: D21369227
fbshipit-source-id: a45e5d0fa34aea8046eb4bb83e6c4df4d2654252
Summary:
Small change to allow MSVC build pass.
The error is
```
D:\pytorch-scripts\caffe2_builders\v141\pytorch\torch/csrc/jit/tensorexpr/stmt.h(370): error C4805: '!=': unsafe mix
of type 'bool' and type 'int' in operation (compiling source file D:\pytorch-scripts\caffe2_builders\v141\pytorch\torch
\csrc\jit\passes\tensorexpr_fuser.cpp) [D:\pytorch-scripts\caffe2_builders\v141\pytorch\build\RelWithDebInfo\caffe2\tor
ch_cpu.vcxproj]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37578
Differential Revision: D21348964
Pulled By: ezyang
fbshipit-source-id: 2c5f995e0adbeb681c18625b59250d7ee3e958ef
Summary:
xref gh-32838, gh-34032
This is a major refactor of parts of the documentation to split it up using sphinx's `autosummary` feature which will build out `autofuction` and `autoclass` stub files and link to them. The end result is that the top module pages like torch.nn.rst and torch.rst are now more like table-of-contents to the actual single-class or single-function documentations pages.
Along the way, I modified many of the docstrings to eliminate sphinx warnings when building. I think the only thing I changed from a non-documentation perspective is to add names to `__all__` when adding them to `globals()` in `torch.__init__.py`
I do not know the CI system: are the documentation build artifacts available after the build, so reviewers can preview before merging?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37419
Differential Revision: D21337640
Pulled By: ezyang
fbshipit-source-id: d4ad198780c3ae7a96a9f22651e00ff2d31a0c0f
Summary:
This pull request fixes and re-enables two of the tests disabled in https://github.com/pytorch/pytorch/issues/37427
1. `test_sparse_add_out_bfloat16` in test_sparse.py fixed to use updated `atol` argument instead of `prec` for `assertEqual`
2. The conversion of `flt_min` to `int64` is divergent on HIP compared to numpy. The change removes that conversion from the `test_float_to_int_conversion_finite` test case in test_torch.py
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37616
Differential Revision: D21379876
Pulled By: ezyang
fbshipit-source-id: 2bfb41d67874383a01330c5d540ee516b3b07dcc
Summary:
Since UnPackRecords is part of the graph, we need to add shape inference for it to make it work e2e with tvm_jit_op. Because the input is packed, shape inference is impossible without shape info of the packed tensors. Some context, the shape of the packed tensor is 1 X num_embeddings X embedding_size, with 1 being the batch_size. The shape of the corresponding output tensor is thus batch_size X num_embeddings X embedding_size after concatenating the packed tensors on the batch axis. Therefore two more gflags need to be added
- caffe2_predictor_num_embeddings
- caffe2_predictor_embedding_size
These gflags are then added to the UnPackRecordsOp in the predict_net as args to pass the info to c2_frontend so TVM can do its own shape inference.
Reviewed By: yinghai
Differential Revision: D21286983
fbshipit-source-id: e9a19cb6b564905282a771df2b9d211d5d37dd71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37192
Add ops used by portal NLU model to lite interpreter
Test Plan: local test
Reviewed By: iseeyuan
Differential Revision: D21215050
fbshipit-source-id: 874023c449e4c04b9f3f871450a7cf02e8f5f5c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37653
Following up D21320243 adding weight_decay to rowwise fused sparse adagrad. This is more involved because we can't reuse g_sq_avg multiple times.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D21335643
fbshipit-source-id: 491b385c5eb9c0d1e3d31a1cf50d7eb450c2d39d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37652
Add weight_decay to fused adagrad operators. This should be landed with the next diff together. Just separating out to make review easier.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D21320243
fbshipit-source-id: 1157471988dedd60ba9b62949055f651b1fa028f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37705
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37372
Posted note: [Regularizing SparseNN Against Over-fitting](https://fb.workplace.com/notes/taiqing-wang/regularizing-sparsenn-against-over-fitting/220306075902708/)
**Problem formulation**
L(w) = J(w) + lambda/2 * ||w||^2
J(w) is the empirical loss, and ||w||^2 is the squared L2 norm of the parameters, a.k.a. L2 regularizer.
dL(w)/dw_i = dJ(w)/dw_i + lambda * w_i
dL(w)/dw_i is the gradient of L(w) w.r.t. w_i.
To implement the L2 regularizer, the gradient of J(w) w.r.t. w_i is augmented with lambda * w_i. lambda is called weight decay in this implementation.
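A hedged sketch of the update rule described above (plain tensors and a hypothetical helper, not the Caffe2 operator): the empirical-loss gradient is augmented with weight_decay * w before the usual Adagrad accumulation and step.
```python
import torch

def adagrad_step(w, g, h, lr=0.01, weight_decay=0.0, eps=1e-10):
    g = g + weight_decay * w            # dL/dw = dJ/dw + lambda * w
    h = h + g * g                       # accumulate squared gradients
    w = w - lr * g / (h.sqrt() + eps)   # Adagrad parameter update
    return w, h

w, h = torch.randn(5), torch.zeros(5)
g = torch.randn(5)                      # gradient of the empirical loss J(w)
w, h = adagrad_step(w, g, h, weight_decay=0.01)
```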
**Code changes**
* In the initialization method of AdagradOptimizer, a new input argument, weight_decay, is added.
* In the _run function of AdagradOptimizer, the weight decay will be skipped for 1d bias vectors.
* In the parameter update functions of Adagrad, the gradient is updated by weight_decay * w_i. The default value for weight_decay is zero.
Test Plan:
`
buck build caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_weight_decay
`
`
./buck-out/gen/caffe2/caffe2/fb/dper/layer_models/tests/split_1/sparse_nn_test_weight_decay#binary.par
`
Reviewed By: jspark1105
Differential Revision: D21258652
fbshipit-source-id: d2366ddcd736a03205a2d16f914703b16d9fce8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37485
Adds arbitrary timeout injection to faulty RPC agent. This is to better test scenarios that need information about how long-running RPCs, such as properly testing RPC timeouts and the profiler in all scenarios.
This is done by overriding ProcessGroupAgent's `enqueueSend()` function to inject the timeout. Determining which messages to timeout is done similar to the existing `faulty_messages` by having the user specify a mapping of message to timeout.
Added unit tests that verify RPC timeouts work with builtin + TorchScript functions, which was not tested before.
ghstack-source-id: 103341662
Test Plan: Added unit tests in `FaultyRpcAgentTest`.
Differential Revision: D21296537
fbshipit-source-id: 1dbc21aee14e49780272634e9cbb2b5a448f2896
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37514
This is to constrain all numerics changing operations in insert quant dequant pass
Test Plan:
python test/test_quantization.py TestQuantizeScriptJitPasses
Imported from OSS
Differential Revision: D21364008
fbshipit-source-id: eb8774e9e4b1db8bf09560e7e4d69d28f9d954a5
Summary:
Update the requirements on input dimensions for torch.nn.SyncBatchNorm:
1. Checks the aggregated batch size `count_all` instead of batch size in every DDP process https://github.com/pytorch/pytorch/issues/36865
2. Added test function for SyncBatchNorm where every process only has 1 input
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37133
Differential Revision: D21331120
Pulled By: zhaojuanmao
fbshipit-source-id: ef3d1937990006609cfe4a68a64d90276c5085f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37655
Add override name for aten::tensor and aten::as_tensor.
These two ops are used in the NLU model, and they will be included in the lite interpreter.
Test Plan: verified model can be loaded correctly
Reviewed By: iseeyuan
Differential Revision: D21346142
fbshipit-source-id: 05ff4d9e0bcf7f4f9a30d95ca81aef9c3f6b0990
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37507
Replace `TORCH_WARN` with `TORCH_CHECK` if `Tensor.random_()`'s `from` or `to-1` is out of bounds for the tensor's dtype. Previously the warning said "This warning will become an error in version 1.6 release, please fix the code in advance", so the time has come.
Related to #33106
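A hedged sketch of the new behavior (the bounds are illustrative): out-of-range bounds now raise instead of warning.
```python
import torch

t = torch.empty(4, dtype=torch.uint8)
t.random_(0, 256)       # fine: to - 1 == 255 fits in uint8
try:
    t.random_(0, 1000)  # to - 1 == 999 is out of bounds for uint8
except RuntimeError as e:
    print(e)
```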
Test Plan: Imported from OSS
Differential Revision: D21349413
Pulled By: pbelevich
fbshipit-source-id: ac7c196a48fc58634611e427e65429a948119e40
Summary:
As part of moving to dynamic shapes we are now passing `frame_id` to each profiling callback. The implementation of that requires copying profiling callbacks into the Interpreter, so `first`s are actually different for every run. The dynamic-shapes merging algorithm won't be using `first`, but in the meantime, while we get there, this should be a good enough fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36806
Differential Revision: D21307173
Pulled By: Krovatkin
fbshipit-source-id: 7dade56ebcc72ebd40bb7f3d636c7b83c99b628f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37675
Original commit changeset: 2c2481e3d497
(Note: this ignores all push blocking failures!)
Test Plan: Back out D21262085 due to ASAN crash P130123493
Differential Revision: D21353550
fbshipit-source-id: c43c8764322f7e58aca0c1360b1d03966b1d9798
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37666
Add `:orphan:` to avoid "WARNING: document isn't included in any toctree".
Test Plan: Imported from OSS
Differential Revision: D21351053
Pulled By: mrshenli
fbshipit-source-id: 6ff67c418fc1de410c7dc39ad9a0be5c30d07122
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34964
Sparse cuda add was implemented by just concatenating the indices and values for the tensor. If called repeatedly in a tight loop this will let `nnz` grow unbounded. In the worst case of `x.add_(x)` it grows exponentially.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36030
Differential Revision: D20873504
Pulled By: zou3519
fbshipit-source-id: d90ed8dda0c89571fb89e358757b5dde299513df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37450
It doesn't seem like we could customize the retryable message types by
passing faulty_messages into dist_utils, as the `FaultyRpcAgentTestFixture`
overrode the `rpc_backend_options` function and provided the default list of
retryable message types. Needed to fix this as part of adding timeout injection
support as mentioned in https://github.com/pytorch/pytorch/issues/36272
ghstack-source-id: 103287164
Test Plan: `buck test mode/dev-nosan //caffe2/test/distributed/rpc/faulty_agent:rpc_spawn_faulty -- --print-passing-details`
Differential Revision: D21270127
fbshipit-source-id: e5dd847dcf92f14b490f84e9ee79291698b85ffa
Summary:
Following up on this: https://github.com/pytorch/pytorch/pull/35851 cross dtype storage copy is not being used internally, so I have not included cross dtype copy for complex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35771
Differential Revision: D21319650
Pulled By: anjali411
fbshipit-source-id: 07c72996ee598eba0cf401ad61534494d6f5b5b3
Summary:
We don't need to create `torch.Generator()` and seed it if we are not shuffling.
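A hedged sketch of the pattern after this change (a hypothetical helper, not the sampler code itself): the generator is only constructed and seeded when shuffling is actually requested.
```python
import torch

def make_indices(n: int, shuffle: bool, epoch: int):
    if shuffle:
        g = torch.Generator()
        g.manual_seed(epoch)
        return torch.randperm(n, generator=g).tolist()
    return list(range(n))

print(make_indices(5, shuffle=False, epoch=0))  # [0, 1, 2, 3, 4]
print(make_indices(5, shuffle=True, epoch=0))
```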
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37604
Differential Revision: D21346167
Pulled By: rohan-varma
fbshipit-source-id: 6ed560d236bc5c026a7d321755ddc02a29db1604
Summary:
This PR basically makes `c10::ComplexHalf` a template specialization of `c10::complex`. Since `c10::ComplexHalf` is not used much, this does not include much change.
Due to the fact that `c10::Half` does not have much `constexpr` methods, it is impossible to keep the same API. Currently, we are just completely reusing the old implementation. It is just the name getting changed from `c10::ComplexHalf` to `c10::complex<c10::Half>`. We can always change the implementation in the future when needed. But for now, I think this is OK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37426
Differential Revision: D21300754
Pulled By: anjali411
fbshipit-source-id: fc0f65adccf97025a727735096780ce8078675a1
Summary:
Closes https://github.com/pytorch/pytorch/issues/24641
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tan(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tan(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28325206200003095
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.28363607099998944
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.43924326799998425
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3754699589999859
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.38143782899999223
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7672172019999834
```
After:
```
torch.tan(a) a.numel() == 10000 for 20000 times torch.half
0.28982524599996395
torch.tan(a) a.numel() == 10000 for 20000 times torch.float
0.29121579000002384
torch.tan(a) a.numel() == 10000 for 20000 times torch.double
0.4599610559998837
torch.tan(a) a.numel() == 100000 for 20000 times torch.half
0.3557764019997194
torch.tan(a) a.numel() == 100000 for 20000 times torch.float
0.34793807599999127
torch.tan(a) a.numel() == 100000 for 20000 times torch.double
1.7564662459999454
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36906
Differential Revision: D21335320
Pulled By: VitalyFedyunin
fbshipit-source-id: efab9c175c60fb09223105380d48b93a81994fb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36815
PyTorch does not have a native channel shuffle op.
This diff adds one for both FP and quantized tensors.
For FP, the implementation is an inefficient one; for quantized tensors there is a native
QNNPACK op for this.
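A hedged pure-PyTorch reference of what the op computes (this reshape/transpose formulation is for illustration only, not the native or QNNPACK kernel added here):
```python
import torch

def channel_shuffle_ref(x, groups):
    n, c, h, w = x.shape
    # Split channels into groups, swap the group and channel dims, flatten back.
    return (x.reshape(n, groups, c // groups, h, w)
             .transpose(1, 2)
             .reshape(n, c, h, w))

x = torch.arange(16, dtype=torch.float).reshape(1, 4, 2, 2)
print(channel_shuffle_ref(x, groups=2)[0, :, 0, 0])  # channels reordered 0, 2, 1, 3
```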
ghstack-source-id: 103267234
Test Plan:
buck run caffe2/test:quantization --
quantization.test_quantized.TestQuantizedOps.test_channel_shuffle
The x86 implementation in QNNPACK is SSE2, so this may not be the most efficient for x86.
Reviewed By: dreiss
Differential Revision: D21093841
fbshipit-source-id: 5282945f352df43fdffaa8544fe34dba99a5b97e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36814
ghstack-source-id: 103218412
From the flamegraph it seems 40% the time we are spending going through the dispatch stack. I think in quantized model where compute can take less time, such overheads become noticeable
{F234432545}
Test Plan: Quantized op tests.
Reviewed By: jerryzh168
Differential Revision: D21093840
fbshipit-source-id: 1b98b57eae403353596fc31171069d2f43b13385
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36813
- Changes to q_avgpool to map special cases of adaptive avgpool to avgpool.
- Map special cases of adaptive avg pool to avgpool.
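A hedged illustration of the special case being mapped (sizes are illustrative): when the output size divides the input size evenly, adaptive average pooling equals a plain average pool with matching kernel and stride.
```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
a = F.adaptive_avg_pool2d(x, (4, 4))
b = F.avg_pool2d(x, kernel_size=2, stride=2)
print(torch.allclose(a, b))  # True
```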
ghstack-source-id: 103218410
Test Plan: QuantizedOps.test_adaptive_avgpool2d
Reviewed By: z-a-f
Differential Revision: D21093837
fbshipit-source-id: c45a03b597eaa59e1057561ee4e8e116ac138f8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37106
Recomputing the aliasdb on every fusion iteration + in every subblock
is hugely expensive. Instead, update it in-place when doing fusion.
The graph fuser pass operates by pushing nodes into a fusion group. So
we start with
```
x, y = f(a, b, c)
```
and end with:
```
x_out, y_out = prim::fusionGroup(a, b, c)
x_in, y_in = f(a_in, b_in, c_in)
-> x_in, y_in
```
We destroy the `x` and `y` `Value*`s in the process. This operation is
easy to express as an update to the aliasDb--`x_out` just takes on all
the aliasing information `x` used to have. In particular, since we know
`f` and `prim::fusionGroup` are purely functional, we don't have to mess
with any write information.
This PR is the bare minimum to get this working, in the interest of
unscrewing the compilation times ASAP.
Followups I want to do:
- We don't have a way of expressing deletion of values in AliasDb. In
`graph_fuser.cpp` we sometimes construct nodes that we end up throwing
away, and we are littering `MemoryDAG` with references to dangling
pointers. Because of the way the pass works, it's fine, but this is
fragile so I want to fix it.
- We should decouple alias analysis from write tracking, to simplify the
job of keeping the write caches consistent as we mutate the aliasing
information.
- the tensorexpr fuser doesn't do this and thus is incorrect today, we
need to update it to work.
Test Plan: Imported from OSS
Differential Revision: D21219179
Pulled By: suo
fbshipit-source-id: 8ae5397b3a0ad90edec2fbc555647091f1ad5284
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36345
During compilation, we spend a huge amount of time in alias analyis.
This PR does a few things to speed it up.
1. Separate the analysis into two phases: one where we build up the
necessary data structures, and the other where we service aliasing
queries. This allows us to defer building indices/maintaining index
consistency until after the "buildup" phase is done.
2. Properly memoize/dynamic program the memory locations lookups.
3. Done naively, setting wildcards invalidates the above memoization,
trigger costly recomputation. So I added a cache-aware `setWildcards`.
Sadly that means you need alias analysis to reach into the guts of
memorydag, but the speedup is worth it.
Sadly, these changes are kind of coupled for correctness reasons, so
they're all here at once.
I used this model (thanks IlyaOvodov) as a provisional benchmark. You
can get it here:
https://www.dropbox.com/s/jlyygn6yygj1jkx/yolov3.zip. Unzip at run
`python test_timing.py`.
Baseline: (752.076s) right before 6bc8ffe82462c77ac4f9b27452046cb1f8f07d92
After optimizing before inlining: (699.593s)
After deferring cache construction: (426.180s)
After cache-aware `setWildcards`: (193.678s)
So a nice 75% speedup to overall compilation. There's a lot more to do
in other places of the compilation pipeline though.
Followup to this PR specifically: Everything that fans out from the
`analyze` call is the "buildup" phase of AliasDB construction. This
should be factored into a separate analysis pass to statically
distinguish the two phases (right now we just null out stuff to
accomplish the same thing dynamically).
Test Plan: Imported from OSS
Differential Revision: D20952727
Pulled By: suo
fbshipit-source-id: 099f797222d7e71e5c04991584adc2c7eab5a70f
Summary:
Changelog:
- The magma implementation of small singular square batch matrices had a bug that resulted in nan values in the LU factorization result. This has been fixed in MAGMA 2.5.2. This PR removes the existing patch that was a temporary workaround for this bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35973
Test Plan:
- Existing tests for det and lu should pass
This is a re-submit of https://github.com/pytorch/pytorch/issues/34357
Differential Revision: D21336552
Pulled By: seemethere
fbshipit-source-id: 9c3b350966913147f1d5811927f3cae10fe620f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37591
skip the tests since gluster is gone.
Test Plan: ci
Reviewed By: ezyang
Differential Revision: D21330359
fbshipit-source-id: a4e158fb72eddb08ba49fcfa9541569a150f8481
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27957
Benchmark (gcc 8.3, Debian Buster, turbo off, Release build, Intel(R) Xeon(R) E-2136):
```python
import timeit
for dtype in ('torch.double', 'torch.float', 'torch.uint8', 'torch.int8', 'torch.int16', 'torch.int32', 'torch.int64'):
    for n, t in [(40_000, 50000),
                 (400_000, 5000)]:
        print(f'torch.linspace(0, 10, {n}, dtype={dtype}) for {t} times')
        print(timeit.timeit(f'torch.linspace(0, 10, {n}, dtype={dtype})', setup=f'import torch', number=t))
```
Before:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.3964195849839598
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
1.2374563289922662
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.8631796519621275
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
1.6991038109990768
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.8358083459897898
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7214750979910605
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.8356257299892604
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.706238206999842
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
1.7463878280250356
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.6172360889613628
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
1.8656846070080064
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
1.714238062966615
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
1.8272205490502529
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
1.6409171230043285
```
After:
```
torch.linspace(0, 10, 40000, dtype=torch.double) for 50000 times
1.0077099470072426
torch.linspace(0, 10, 400000, dtype=torch.double) for 5000 times
0.8227124120458029
torch.linspace(0, 10, 40000, dtype=torch.float) for 50000 times
1.0058343949494883
torch.linspace(0, 10, 400000, dtype=torch.float) for 5000 times
0.8376779520185664
torch.linspace(0, 10, 40000, dtype=torch.uint8) for 50000 times
1.903041019977536
torch.linspace(0, 10, 400000, dtype=torch.uint8) for 5000 times
1.7576498500420712
torch.linspace(0, 10, 40000, dtype=torch.int8) for 50000 times
1.7628699769848026
torch.linspace(0, 10, 400000, dtype=torch.int8) for 5000 times
1.6204477970022708
torch.linspace(0, 10, 40000, dtype=torch.int16) for 50000 times
2.0970272019621916
torch.linspace(0, 10, 400000, dtype=torch.int16) for 5000 times
1.9493417189805768
torch.linspace(0, 10, 40000, dtype=torch.int32) for 50000 times
2.29020385700278
torch.linspace(0, 10, 400000, dtype=torch.int32) for 5000 times
2.1212510910118
torch.linspace(0, 10, 40000, dtype=torch.int64) for 50000 times
2.3479344319785014
torch.linspace(0, 10, 400000, dtype=torch.int64) for 5000 times
2.156775983981788
```
Test Plan: Imported from OSS
Differential Revision: D20773454
Pulled By: VitalyFedyunin
fbshipit-source-id: ebeef59a90edde581669cc2afcc3d65929c8ac79
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37527
This is yet another place that needs to be updated for adding a new "Backend" and is unnecessary. Instead, just use layout_from_backend and have a map from Layout -> THPLayout.
Other changes:
- rename torch::getDtype and torch::getLayout to torch::getTHPDtype and torch::getTHPLayout since e.g. for layout you are both passing in and returning a "layout" type.
- add NumOptions to Layout to match the dtype/ScalarType formulation.
Test Plan: Imported from OSS
Differential Revision: D21309836
Pulled By: gchanan
fbshipit-source-id: ede0e4f3bf7ff2cd04a9b17df020f0d4fd654ba3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37535
Fuse ClipRanges + GatherRanges + SigridHash -> ClipRangesGatherSigridHash
dpa_product_ctr model's dper2 to dper3 migration is blocked by 3.6% higher prospector cpu usage. Root cause is traced down to sigrid transforms, where ClipRanges, GatherRanges, SigridHash are separately called, instead of fused, as is the case in dper2.
Further context:
https://fb.quip.com/GijaAZtX5mav
https://fb.quip.com/pIDdAjJP2uiG
Test Plan:
Local benchmarking with small model 181513584_0
(Dper3 full model is 178772812, dper2 refresh is 178770392)
Transform turned on: P129799373
Iters per second: 609.291
Transform turned off: P129799397
Iters per second: 519.088
We also want to confirm this performance on the full model in canary and in qrt.
`buck build mode/opt-clang mode/no-gpu caffe2/caffe2/fb/predictor:ptvsc2_predictor_bench`
`MKL_NUM_THREADS=1 OMP_NUM_THREADS=1 ./buck-out/opt/gen/caffe2/caffe2/fb/predictor/ptvsc2_predictor_bench --pred_net=/data/users/ansha/tmp/dpa/small_pred_net.pb --c2_model=/data/users/ansha/tmp/dpa/181513584_0.predictor --c2_inputs=/data/users/ansha/tmp/dpa/c2_inputs_small.pb --iters=3000 --warmup_iters=100 --num_threads=32 --c2_apply_nomnigraph_passes=1 --caffe2_predictor_enable_preproc_fusion=1`
Prospector canary:
https://our.intern.facebook.com/intern/ads/canary/426280288521552095/
Check that ClipRangesGatherSigridHash is used: https://fburl.com/scuba/caffe2_operator_stats_canary/e6qfdsat
Reviewed By: yinghai
Differential Revision: D21262085
fbshipit-source-id: 2c2481e3d4977abb8abe6e9ef0c9999382320ab2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37420
The quantized BN unit tests were disabled because they took too long.
This diff removes hypothesis from these test cases and instead generates
the cases manually. The run time is ~7 seconds per test on my devgpu.
Test Plan:
```
python test/test_quantization.py TestQuantizedOps.test_batch_norm2d_relu
python test/test_quantization.py TestQuantizedOps.test_batch_norm3d
```
Imported from OSS
Differential Revision: D21310333
fbshipit-source-id: 2499f7a3d6a87c0278d012ae65132f148cee6d2e
Summary:
This is useful for linux distributions when the ABI/API of libtorch has
been changed. The default SOVERSION is set to
"${TORCH_VERSION_MAJOR}.${TORCH_VERSION_MINOR}".
ezyang
But if the release strategy of pytorch/caffe2 involves avoiding breaking API/ABI changes to libtorch for minor/patch releases, then we can set `TORCH_SOVERSION` to simply `TORCH_VERSION_MAJOR`. Please confirm that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37502
Differential Revision: D21303565
Pulled By: ezyang
fbshipit-source-id: 798f5ec7fc5f0431ff1a7f9e8e5d3a0d3b25bb22
Summary:
- Add debug mode to include debug information.
- Move codegen comment to FB shell script (as it's only checked in to the FB repo).
- Analyze lite-predictor instead of full-JIT, as the full-JIT BUCK target contains variable kernels and thus pulls in a lot more dependencies.
- Use pre-opt bitcode instead of pre-codegen bitcode - there is one special `callOp()` case in RNN.cpp where optimized bitcode has the opname string and API body inlined together: https://fburl.com/diffusion/8rz6u4rg; pre-optimization bitcode should give a more stable result.
Test Plan: - Tested the bash script with stacked diff.
Reviewed By: iseeyuan
Differential Revision: D21298837
fbshipit-source-id: be33e2db5d8cb0f804460c503e52beb0dcb4857f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37562
The model has a LinearLayer which needs fbgemm. Fixes failing windows test.
Test Plan:
python test/test_quantization.py TestPostTrainingStatic
Imported from OSS
Differential Revision: D21321032
fbshipit-source-id: 1671fdef5d0a1b43e2a4e703a8852d522af32288
Summary:
**Summary**
Converting a float `Tensor` to a Python list is not supported because
Python's float is actually a double. This commit modifies the
implementation of `prim::tolist` so that it converts an input argument
that is a float Tensor into a double Tensor and emits a warning.
**Test Plan**
Modified and ran the corresponding unit test.
*Before*
```
======================================================================
ERROR: test_to_list (jit.test_list_dict.TestList)
Unit tests for Tensor.tolist() function.
----------------------------------------------------------------------
...
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: Output annotation element type and runtime tensor element type must match for tolist()
----------------------------------------------------------------------
Ran 1 test in 0.151s
FAILED (errors=1)
```
*After*
```
UserWarning: Converting float Tensor to double because tolist is only supported for double type Tensors (Triggered internally at ../torch/csrc/jit/runtime/register_prim_ops_fulljit.cpp:626.)
return callable(*args, **kwargs)
.
----------------------------------------------------------------------
Ran 1 test in 0.210s
OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37465
Differential Revision: D21311829
Pulled By: SplitInfinity
fbshipit-source-id: a0c1796013e35baf8d7641af271424a10e26f161
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37185
Previously observer instances shared the same Tensor attributes. That is
OK as long as we don't do inplace operations on these attributes, but it becomes a problem
when people make inplace changes.
This PR uses deepcopy instead of clone_instance which will copy the tensor for each instance.
Test Plan:
.
Imported from OSS
Differential Revision: D21309084
fbshipit-source-id: afd974b0c97886fbab815e9c711c126379fe3e17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37523
Makes the quantized hardswish function API more suited to graph mode
handling, which will come in the next PR.
Test Plan:
CI
Imported from OSS
Differential Revision: D21310364
fbshipit-source-id: 0d438dce5b87481d558c07bcccd9fe717200b4dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37522
Adds hardsigmoid op to graph mode handling.
Test Plan:
CI
Imported from OSS
Differential Revision: D21310363
fbshipit-source-id: 4d9f3bb032fb5a4d8f0cf84bff230fc1ce222c3c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37521
Adds ELU to graph mode handling.
Test Plan:
CI
Imported from OSS
Differential Revision: D21310361
fbshipit-source-id: 045fc3af796dea67e0153255648fe5911e70bbed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37519
Closes #37446
Currently FutureMessage is used in several places:
1. `rpc_async` returns a `FutureMessage` object and we expose it
as `torch.distributed.rpc.Future`. From applications perspective,
they are expecting a `py::object` instead of a `Message`, and we
do the conversion in the `Future.wait()` pybind method.
2. RPC autograd profiler takes `FutureMessage` and installs
callbacks to it. The profiler actually only need a `Future<T>`
and does not care what `T` is.
3. `OwnerRRef` exposes a `getFuture()` API which returns a
`FutureMessage`. This `FutureMessage` will be marked completed
when the value referenced by the `OwnerRRef` is ready.
`OwnerRRef` does not need it to be a Message type either, it
actually creates an empty `Message` to mark the `Future`.
The above places are using `FutureMessage`, but they don't really
need a `Message`, and `Message` is a communication layer type that
applications or profiler or the RRef shouldn't be aware of.
Another motivation for making this change is that for async RPC
UDF #36071, we are going to allow application to call
`markCompleted` in Python. If we still use `FutureMessage`, then
in the `markCompleted` pybind function, it needs to convert the
provided `py::object` into a specific message type, which is
leaking communication layer code to pybind functions. Even if
this is doable, we will have two entities (RPC agent and pybind
Python frontend) accessing the same request callback logic. This is too messy.
This commit replaces all surface `FutureMessage` with `FutureIValue`,
so that `FutureMessage` is no longer visible from Python land. Note
that this does not cause BC issues, as the Python Future type name
and its API stay intact. Internally, we still have `FutureMessage`
in the communication layer.
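For context, this is the user-facing shape of the Future discussed above; a single-process sketch (the worker name, port, and trivial `add` helper are illustrative):
```python
import os
import torch
import torch.distributed.rpc as rpc

def add(a, b):
    return a + b

os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")
rpc.init_rpc("worker0", rank=0, world_size=1)

# rpc_async returns the Python-facing Future; wait() hands back a plain
# Python value (a Tensor here) -- the Message type never surfaces to users.
fut = rpc.rpc_async("worker0", add, args=(torch.ones(2), torch.ones(2)))
print(fut.wait())

rpc.shutdown()
```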
Test Plan: Imported from OSS
Reviewed By: xush6528
Differential Revision: D21308887
Pulled By: mrshenli
fbshipit-source-id: 4f574f38e83125081f142813cfdde56119522089
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37037
For avgpool and gavgpool change requantization scheme:
Similar to conv and linear, we now convert the accumulated int32
values to float and apply a requantization scale which includes the averaging
multiplier.
Convert the resulting float value back to int32.
Add output_zero_point.
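As a rough numeric sketch of this requantization path (NumPy only, illustrative; the scales, zero points, and window values are made up, and the real QNNPACK kernels handle rounding and fixed-point details differently):
```python
import numpy as np

# quantized inputs in one pooling window
window = np.array([12, 250, 37, 101], dtype=np.uint8)
in_scale, in_zero_point = 0.05, 10
out_scale, out_zero_point = 0.1, 3

# accumulate in int32
acc = window.astype(np.int32).sum() - window.size * in_zero_point

# single float requantization scale that folds in the 1/N averaging multiplier
requant_scale = in_scale / (out_scale * window.size)

# convert back to int32 and add the output zero point
q_out = np.int32(np.rint(acc * requant_scale)) + out_zero_point
q_out = np.uint8(np.clip(q_out, 0, 255))
print(q_out)
```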
Benchmark numbers compared to baseline:
% speedup on pixel XL.
--------------------------------
|          | aarch32 | aarch64 |
| avgpool  |   0.4%  |  13.6%  |
| gavgpool |  -2.6%  |   3.5%  |
--------------------------------
Test Plan:
Tested via q8avgpool-test, q8gavgpool-test, average-pooling-test and
global-average-pooling-test in PT QNNPACK.
Also via integated test_quantized.py.
python test/quantization/test_quantized.py
Imported from OSS
Differential Revision: D21168981
fbshipit-source-id: 9060324304603ca7fd380c788a87b01a6d586c5c
Summary:
Set opset version before model select call - which is used to trigger warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37466
Reviewed By: hl475
Differential Revision: D21308796
Pulled By: houseroad
fbshipit-source-id: 0974b9d5b6562d4451f54053138174f663a17aa3
Summary:
# Overview
This PR changes the backing type of complex tensors in `ScalarType` from `std::complex` to `c10::complex`.
Since `c10::complex` and `std::complex` are reinterpret-castable, we can freely use `std::complex *` to access `c10::complex` data and vice versa. The implementation of `c10::complex` is not complete yet, so we are reinterpret casting all complex data to `std::complex` during dispatch, and do all operations in `std::complex`.
# `std::complex` and `c10::complex` interoperability
To use `std::complex *` to access `c10::complex` data, the following specializations are added:
```C++
template <> inline std::complex<float>* Tensor::data_ptr();
template <> inline std::complex<double>* Tensor::data_ptr();
template <> inline std::complex<float> Tensor::item();
template <> inline std::complex<double> Tensor::item();
```
See [`aten/src/ATen/templates/TensorMethods.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-0e8bf6f5024b32c240a4c1f0b4d8fd71)
And
```C++
template <> inline std::complex<float> Scalar::to();
template <> inline std::complex<double> Scalar::to();
```
is added in [`c10/core/Scalar.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-aabe1c134055c8dcefad830c1c7ae957)
# Dispatch
Macros in [`Dispatch.h`](https://github.com/pytorch/pytorch/pull/37274/files#diff-737cfdab7707be924da409a98d46cb98) still using `std::complex` as its type. We will add macros such as `AT_DISPATCH_ALL_TYPES_AND_C10_COMPLEX_AND3` as needed during the migration and not in this PR.
Note that `AT_DISPATCH_ALL_TYPES_AND_COMPLEX_AND3` is only used in the CUDA copy kernel, and this PR already changes it to use `c10::complex`, because the CUDA copy kernel has to use its original dtype; otherwise dtypes get cast incorrectly, causing a CUDA unspecified launch failure.
When all the migration is done, the c10 version of macros will be removed, and the default version will have `std::complex` replaced by `c10::complex` by default. This design allows us to incrementally migrate from `std::complex` to `c10::complex`.
# Note
Note that the `std::complex` is not completely replaced by `c10::complex` in c10 yet, for example `c10::Scalar` is still using `std::complex`. This will be fixed in later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37421
Differential Revision: D21282161
Pulled By: anjali411
fbshipit-source-id: 635e309e8c8a807c2217723ad250b5ab5a20ce45
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37539
Bug fix
Test Plan:
This passed fbtranslate local integration test when I toggle fp16 to true on GPU.
Also it passed in with D21312488
Reviewed By: zhangguanheng66
Differential Revision: D21311505
fbshipit-source-id: 7ebd7375ef2c1b2ba4ac6fe7be5e7be1a490a319
Summary:
Benchmark with same build settings on same system.
Closes https://github.com/pytorch/pytorch/issues/24545
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cos(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cos(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.2797315450006863
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.283109110998339
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.3648525129974587
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.34239949499897193
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.33680364199972246
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.0512770260102116
```
After:
```
torch.cos(a) a.numel() == 10000 for 20000 times torch.half
0.285825898999974
torch.cos(a) a.numel() == 10000 for 20000 times torch.float
0.2781305120001889
torch.cos(a) a.numel() == 10000 for 20000 times torch.double
0.34188826099989456
torch.cos(a) a.numel() == 100000 for 20000 times torch.half
0.29040409300023384
torch.cos(a) a.numel() == 100000 for 20000 times torch.float
0.28678944200009937
torch.cos(a) a.numel() == 100000 for 20000 times torch.double
1.065477349000048
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36653
Differential Revision: D21164675
Pulled By: VitalyFedyunin
fbshipit-source-id: 5dd5d3af47c2a5527e1f4ab7669c2ed9a2293cee
Summary:
- It's valid to call `sched_setaffinity` with nullptr
- The call is coming from libomp which should be valgrind safe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37532
Test Plan: CI
Differential Revision: D21311252
Pulled By: malfet
fbshipit-source-id: a325f97741b997738c35759d02fcc34c1cb44d95
Summary:
Adds support for generating Vandermonde matrices based off of the Numpy implementation found [here](https://github.com/numpy/numpy/blob/v1.17.0/numpy/lib/twodim_base.py#L475-L563).
Adds tests to ensure the generated matrix matches the expected NumPy output. Note that tests are limited to torch.long and torch.double due to differences in how PyTorch and NumPy handle type promotion.
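A quick usage sketch against NumPy, using one of the dtypes the tests cover:
```python
import numpy as np
import torch

x = torch.tensor([1, 2, 3, 5])                # torch.long
print(torch.vander(x, N=3))                   # decreasing powers by default
print(np.vander(x.numpy(), N=3))              # should match elementwise
print(torch.vander(x, N=3, increasing=True))
```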
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36725
Differential Revision: D21075138
Pulled By: jessebrizzi
fbshipit-source-id: 6bb1559e8247945714469b0e2b07c6f4d5fd1fd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37425
The mobile type resolver that we inject into the unpickler currently
creates a dummy type for everything, even built-in types like
List[int]. This PR restricts that behavior to types that start with
`__torch__`, and uses the mobile type parser for everything else.
I don't like this solution because it relies on a fragile invariant that
all "class-like" types have qualified names that start with `__torch__`.
I think the long term solution is to just re-use the script type parser
here.
Test Plan: Imported from OSS
Differential Revision: D21291331
Pulled By: suo
fbshipit-source-id: c94709bcbd1bac75336e033fd9d3afa6656b0a77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36800
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21089650
Pulled By: ezyang
fbshipit-source-id: 1babdb5524038e3951d3c4303e4ba87e68b4f138
Summary:
- added tests that showcase the problems
- fixed the problems
These changes would allow me to remove many "# type: ignore" comments in my codebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36358
Differential Revision: D21230704
Pulled By: ezyang
fbshipit-source-id: e6d475a0aa1fb40258fa0231ade28c38108355fb
Summary:
Added enough operators to make sure that all unit tests from ATen/basic are passing, except for MM and IntArrayRefExpansion
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37121
Test Plan: `./bin/basic --gtest_filter=BasicTest.BasicTestHalfCPU` + `python -c "import torch; x = torch.tensor([2], dtype=torch.half); print(torch.isfinite(x+x))"`
Differential Revision: D21296863
Pulled By: malfet
fbshipit-source-id: e03d7a6939df11f611a9b317543bac52403cd009
Summary:
This pull request disables the unit tests that were observed to be failing once `test2` was enabled. These tests will be looked at and fixed one by one at the earliest, but until then we disable them to unblock `test2`.
The pull request also disables fftPlanDestroy for rocFFT to avoid double-freeing FFT handles
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37427
Differential Revision: D21302909
Pulled By: ezyang
fbshipit-source-id: ecadda3778e65b7f4f97e24b932b96b9ce928616
Summary:
`THCudaMemGetInfo` has only been used in `aten/src/ATen/native/cudnn/Conv.cpp`. We can extract `c10::cuda::CUDACachingAllocator::cacheInfo` out from it and use it in `aten/src/ATen/native/cudnn/Conv.cpp` directly and drop lines that are not used in `THCudaMemGetInfo`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37447
Differential Revision: D21302770
Pulled By: ezyang
fbshipit-source-id: 41ad68b8fd5ecc7bc666a6861789c6c1f743f420
Summary:
fmt is a formatting library for C++. It has several properties that make it nice
for inclusion in PyTorch:
- Widely used
- Basically copies how Python does it
- Support for all the compilers and platforms we care about
- Standards track (C++20)
- Small code size
- Header only
This PR includes it as a submodule and sets up the build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37356
Differential Revision: D21262619
Pulled By: suo
fbshipit-source-id: 1d9a1a5ed08a634213748e7b02fc718ef8dac4c9
Summary:
To address one of the problems with RNNs that emerged in https://github.com/pytorch/pytorch/issues/33618, I modified the `remove` methods in `torch.nn.utils.prune` and `torch.nn.utils.weight_norm` to make an explicit call to `setattr`, which, in `rnn.py` directly modifies `_flat_weights` (https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/rnn.py#L96) to include the new element.
This is important so that `_flat_weights` can reflect the presence of the `Parameter` after the (pruning or weight norm) reparametrization is removed. Without this, the weight in `_flat_weights` would remain a tensor, as originally set by the reparametrization.
Simple testing is added, which depends on the current naming scheme for the LSTM module.
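A minimal sketch of the scenario this enables, relying (as the tests do) on the current LSTM parameter naming scheme:
```python
import torch
from torch.nn.utils import prune

lstm = torch.nn.LSTM(input_size=4, hidden_size=4)
prune.l1_unstructured(lstm, name="weight_hh_l0", amount=0.5)
prune.remove(lstm, "weight_hh_l0")

# After `remove`, the reparametrized weight is a Parameter again and
# `_flat_weights` reflects it, so the forward pass keeps working.
assert isinstance(lstm.weight_hh_l0, torch.nn.Parameter)
out, _ = lstm(torch.randn(3, 1, 4))
```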
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34170
Differential Revision: D21265965
Pulled By: mickypaganini
fbshipit-source-id: 29de4a6b17052d42ccfe67c8560b7f83c20fd09d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37460
It seems that stateless xnnpack integration for Convolution is breaking iOS
runs.
The issue seems to stem from passing an invalid pointer, or a pointer that is
no longer valid, but beyond this it has not been root caused.
The issue seems to appear only on iOS so far, but we are blanket disabling it for
both iOS and Android, since this improvement is recent and no
production models are running with it yet. Hence no perf
regression is expected.
Test Plan: buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform ios --framework pytorch --remote --devices D221AP-12.0.1
Reviewed By: xta0
Differential Revision: D21284385
fbshipit-source-id: 1fe01e3a476b340697972743dadf64333cc86b3f
Summary:
complex is not supported, so no need to use thrust
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37470
Differential Revision: D21296501
Pulled By: anjali411
fbshipit-source-id: bf2075ac933a793b9cdddcda0918604e7574ee2d
Summary:
It will be convenient to print ops names when converting the model in xplat.
This diff moves export_opnames to export_module.cpp so it can be used in xplat (caffe2:optimize_for_mobile and caffe2:torch_train). This function was in caffe2/torch/csrc/jit/serialization/export.cpp. I tried to create a target to include this file but it involves too many ONNX deps and I cannot get it to work.
Test Plan: local test, verified op names are printed
Reviewed By: iseeyuan
Differential Revision: D20961557
fbshipit-source-id: 293569081b29c263c1c441df7a63838a81560ce9
Summary:
On Windows, when you call those unsupported functions like `std::pow`, `std::isnan` or `std::isinf` in the device function and compile, a warning is thrown:
```
kernel.cu
kernel.cu(39): warning: calling a __host__ function from a __host__ __device__ function is not allowed
kernel.cu(42): warning: calling a __host__ function from a __host__ __device__ function is not allowed
kernel.cu(39): warning: calling a __host__ function("isnan<double> ") from a __host__ __device__ function("test_") is not allowed
kernel.cu(42): warning: calling a __host__ function("isinf<double> ") from a __host__ __device__ function("test_") is not allowed
```
However, those calls will lead to runtime errors, see https://github.com/pytorch/pytorch/pull/36749#issuecomment-619239788 and https://github.com/pytorch/pytorch/issues/31108. So we should treat them as errors.
Previously, the situation was worse because the warnings were turned off by passing in `-w`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37302
Differential Revision: D21297207
Pulled By: ngimel
fbshipit-source-id: 822b8a98c10e54c38319674763b6681db21c1021
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37464
Fixes https://github.com/pytorch/pytorch/issues/23993.
There are two fixes here:
1. Previously our name lookup function for the tracer was looking in
f.globals for names. For example:
```
sample = torch.ones(1)
traced = torch.jit.trace(my_mod, ((sample, sample,),))
# produces a graph with something like
# %sample, %sample = prim::TupleUnpack(%input)
```
This is not great if you are, e.g., trace checking, because a non-local
bit of interpreter state affects the graph produced:
```
traced = torch.jit.trace(my_mod, _clone_inputs((sample, sample,),))
# produces a graph with something like
# %0, %1 = prim::TupleUnpack(%input)
```
I have removed this functionality, as I don't think it provides huge
value. Things that look locally for names will still work, so e.g.
inputs, intermediate variables, and the like will be named correctly.
2. Previously, our input cloning for trace checking didn't do a memoized
deep copy. So:
```
_clone_inputs((sample, sample, sample))
```
produces a tuple with three non-aliased tensors. That's wrong! Use
copy.deepcopy with a memoization argument to fix this.
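Roughly, the difference is standard `copy.deepcopy` memo behavior; a small standalone illustration (not the `_clone_inputs` code itself):
```python
import copy
import torch

sample = torch.ones(1)
inputs = (sample, sample, sample)

# Per-element deepcopy loses aliasing: three independent tensors.
naive = tuple(copy.deepcopy(x) for x in inputs)
assert naive[0] is not naive[1]

# A shared memo dict preserves aliasing across the whole structure,
# which is what trace checking needs.
memo = {}
shared = tuple(copy.deepcopy(x, memo) for x in inputs)
assert shared[0] is shared[1] is shared[2]
```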
Test Plan: Imported from OSS
Differential Revision: D21297549
Pulled By: suo
fbshipit-source-id: 981d5879a4a244520dd68489767129ff357f1497
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37012
Removes an if statement in `torch.nn.functional.affine_grid`
Test Plan: Imported from OSS
Differential Revision: D21160755
Pulled By: eellison
fbshipit-source-id: 8b030936c9fbdb05b44abc9f254805d102f2acc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36953
Add support for generic lists as a constant. generic dicts & tuples are already implemented. This is a pretty common pattern and cuts down on the number of non-tensor nodes executed in interpolate tests.
Test Plan: Imported from OSS
Differential Revision: D21160761
Pulled By: eellison
fbshipit-source-id: 1e6b7b25b7580f09067794772d44e615601c60c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36757
replacing x + 0 with x isn't that much of a speedup, and is an optimization also duplicated at the Tensor Expr level. Constructing an alias db is costly and it's not worth rebuilding an alias db each time we optimize out x + 0.
Test Plan: Imported from OSS
Differential Revision: D21160757
Pulled By: eellison
fbshipit-source-id: 9b3d4fa430b838898fe6c78660ec3c608547bb31
Summary:
dylanbespalko anjali411
Not sure if the test should be added to `test_torch` or `test_complex`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36749
Differential Revision: D21290529
Pulled By: anjali411
fbshipit-source-id: 07bc282e4c9480cd015ec5db104e79728437cd90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37207
The main idea here is to try and give the build system more flexibility on when various AVX instructions are defined; previously it was based solely on compiler-defined preprocessor flags.
Here we re-use `CPU_CAPABILITY` which already needs to be defined for each pass in `["DEFAULT", "AVX", "AVX2"]` over the source files.
To give a slightly more concrete reason why this is needed: we have not found a way to override `/arch` flags previously specified on the command line from Visual Studio (which caused us to duplicate symbols in some cases).
Test Plan: CI green
Differential Revision: D21218512
fbshipit-source-id: f628153f5f3d83cd6bd4a5283fb0dc751a58ebf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37366
- we can put both fake quant module and observer module tests in the test_workflow_module.py
- added test_quantized_functional.py
- moved tests in test_numerics.py to test_quantize.py and removed test_numerics.py
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D21282198
fbshipit-source-id: 60107cee7d1ed2cd14a45650e91ec28b8a262c52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37125
For dynamic quant we need to replicate the choose_qparams and quantize function in addition to replicating dequant.
RemoveRedundantQuantizeOps pass checks for the choose_qparams - quant - dequant pattern in the graph and removes it if the node following it cannot be quantized using dynamic quantization.
Test Plan:
python test_quantize_script.py test_dynamic_quant_multi_uses
Imported from OSS
Differential Revision: D21283697
fbshipit-source-id: 70fa0abdaeb2cc2935149a941d93a7e8b28d61d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37093
Specify which ops should/can be dynamically quantized. Similar to static quantization
Test Plan:
python test_quantize_script.py test_dynamic_multi_op
Imported from OSS
Differential Revision: D21283695
fbshipit-source-id: 7ee238940c5c239f6ef8af994655e0b13db64161
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37045
Fixes to get the correct path for child modules
Test Plan: Imported from OSS
Differential Revision: D21283698
fbshipit-source-id: 48a7f7762df86a5177ea117ab0cd7cb1d6e6209d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37014
User should only pass name as key in dict.
Test Plan: Imported from OSS
Differential Revision: D21283696
fbshipit-source-id: e6babbe9302c812d6ae03ed7f843d2816b752e78
Summary:
All templates which are included from `ATen/native/cpu` must be in an anonymous namespace, especially if they use instruction set extensions but do not support dynamic dispatching.
Otherwise, the linker is free to pick the AVX2, AVX or DEFAULT version of instantiated templates during the final linking stage.
Test Plan: Apply on top of https://github.com/pytorch/pytorch/pull/37121 and make sure that the `basic` test successfully finishes on the CircleCI MacPro (which does not support AVX2), but `ATEN_CPU_CAPABILITY=avx2 ./basic --gtest_filter=*HalfCPU` crashes with an illegal instruction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37429
Differential Revision: D21294818
Pulled By: malfet
fbshipit-source-id: ab32b8553de225d2f672fac2f48591682bd7dec4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32684
Previously we have `clone` and `clone_instance`, where `clone` will clone both type
and value, and `clone_instance` only clone the value, both of them are shallow copies.
We need to re-evaluate whether we should expose them as a user facing API.
I think we should hide `clone`, but `clone_instance` might be useful as well, especially
when copying a model with very large weights, where people might just want a shallow copy.
This PR adds a `deepcopy` that might be useful as a user API, which deep copies the values, including
Tensor, but we didn't deepcopy `Blob`, `Capsule`, `Future` or `PyObject`.
For more discussions please see the following issue.
fixes: https://github.com/pytorch/pytorch/issues/32519
Test Plan: Imported from OSS
Differential Revision: D21220756
fbshipit-source-id: 476bf11fe82c08fac36e7457879a09f545ffdc5e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37084
There are 3 alternatives for this design; this PR implements the first one.
When a tensor is a scalar (`ndim == 0`), accessing `view_offsets_[0]` when doing reductions yields an invalid offset for the index that is the output of `argmax` and `argmin`.
fba9b9a023/aten/src/ATen/native/cpu/Reduce.h (L217)
This also happens in cuda code:
fba9b9a023/aten/src/ATen/native/cuda/Reduce.cuh (L797)
The second alternative is to check the size of `view_offsets` before accessing it. But this introduces some burden.
The third alternative is related to the way that inputs are treated in `argmax` and `argmin`
depending on the `dim` argument value.
fba9b9a023/aten/src/ATen/native/ReduceOps.cpp (L775-L780)
If `dim` is not specified, then the scalar gets reshaped into a 1-dim tensor and everything works properly, since now `view_offsets` has an actual entry.
If dim is specified, then the input remains as a scalar causing the issue we see here.
This PR tries to solve it in a generic way for every case so I went with option 1. I am willing to discuss it and change if you think that the other alternatives are better.
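A minimal repro sketch of the failure mode (illustrative; before the fix the `dim=` variant could read an invalid offset and return a garbage index, while the expected answer is 0 in both cases):
```python
import torch

x = torch.tensor(3.14)   # 0-dim (scalar) tensor
print(x.argmax())        # dim unspecified: input is reshaped to 1-d, returns tensor(0)
print(x.argmax(dim=0))   # dim specified: input stays 0-dim; this is the path being fixed
```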
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37214
Differential Revision: D21258320
Pulled By: ngimel
fbshipit-source-id: 46223412187bbba4bfa7337e3f1d2518db72dea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37404
Many aten operators are really like util functions, e.g.:
aten::is_nonzero, aten::is_floating_point, etc. These ops can be called
via overloaded C++ operators, so seemingly trivial and innocent code changes can
affect how these ops are used by other ops (thus changing the output of the
static analyzer).
Most of these util ops are rather small in terms of build size cost, so
for the purpose of optimizing binary size with custom build, whether to
include these ops or not does not make significant difference. In fact
for non-trivial models a set of these ops are almost always used.
This PR introduced the (optional) '__BASE__' ops section to the dependency graph.
We can maintain the list of frequently used small util ops for internal BUCK
build. This way, the output dependency graph will only contain meaningful
edges with significant binary size impact, and it will be more stable from
trivial code changes (which is checked in FB codebase).
Having a stable and sparse deps graph by factoring out frequently used base ops
is also a nice property to allow us to explore alternative custom build
solutions in case we find it hard to maintain the static code analyzer.
Test Plan: Imported from OSS
Differential Revision: D21280835
Pulled By: ljk53
fbshipit-source-id: c4d0d1f07ca868c60f23118d877fc1eeead4c875
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37393
Simplify the code analyzer by removing some unused flags and moving the
different format printer logic to python script. It's easier to add other
post processing logic to adapt to different BUCK build configs.
Test Plan: Imported from OSS
Differential Revision: D21280836
Pulled By: ljk53
fbshipit-source-id: 0d66d5891d850f012c4ab4f39eabbd9aecc1caa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37454
Fix a bug introduced in D21224497.
In the case of a single unpacked tensor as input, we still need to copy the underlying memory because only inputs are guaranteed to be read-only. The output could be overwritten later during inference. If we shared the tensor, we could potentially overwrite the input, which in principle should be read-only.
Test Plan:
```
buck test caffe2/caffe2/python/operator_test:dataset_ops_test
```
AdIndexer canary:
https://our.intern.facebook.com/intern/ads/canary/426290361213982683
Reviewed By: yinghai
Differential Revision: D21274309
fbshipit-source-id: 71931d4b1afbdc700ba070ea618d1679f1bbe5a7
Summary:
These two ops are needed for torchvision model export. Since we're scripting a part of the code for dynamic export of models (in https://github.com/pytorch/vision/pull/2052), these two changes are required.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36501
Reviewed By: hl475
Differential Revision: D21260721
Pulled By: houseroad
fbshipit-source-id: 86d9d38665a4a36d22cec741012d976e5bd8d36b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/21821
This follows ngimel's [suggestion](https://github.com/pytorch/pytorch/issues/21821#issuecomment-502968982) to manually synchronize MAGMA calls with the current stream. This is handled automatically with `MagmaStreamSyncGuard`.
I think for the functions with `_batched` variants we could possibly avoid synchronisation by using a batch of size 1 since these have a `magma_queue_t` argument. However, I presume there's a reason it wasn't written like that in the first place.
I also figured out why porting to aten ["magically fixed"](https://github.com/pytorch/pytorch/issues/21821#issuecomment-527647971) `torch.svd`. The magma functions for svd all take host arrays as input and output. The ATen port uses blocking `copy_`s which fully synchronize the operation. On the other hand, the THC functions use `cudaMemcpy` which doesn't synchronize with streams created with `cudaStreamNonBlocking` (which `aten` does). The fix is to use `cudaMemcpyAsync` and `cudaStreamSynchronize`, the same as `copy_` does internally:
835ee34e38/aten/src/ATen/native/cuda/Copy.cu (L192-L193)
I'm not sure how to test these changes as I wasn't able to reproduce any of the stream sync issues. Possibly a mixture of non-determinism and because some of these functions are implicitly synchronous anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36605
Differential Revision: D21258265
Pulled By: ngimel
fbshipit-source-id: 76d8f687c605e5e9cd68b97dc1d70a39a13376ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37382
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic outside of the profiler into the record function.
Reviewed By: jamesr66a
Differential Revision: D21268320
fbshipit-source-id: 93207e3b55325d20dcc5b1e8f448ab86933321da
Summary:
We should not rely on async exceptions. Catching C++-only exceptions is more sensible and gives a boost in both space (1163 MB -> 1073 MB, 0.92x) and performance (51m -> 49m, 0.96x).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37235
Differential Revision: D21256918
Pulled By: ezyang
fbshipit-source-id: 572ee96f2e4c48ad13f83409e4e113483b3a457a
Summary:
This enables type checking for named tensors, and fixes the underlying problems.
The bulk of the fix is modifying `gen_pyi.py` to generate reasonable types in `torch/__init__.pyi`. I took two approaches: First, I tried to take a generic approach and added `DimnameList` to the magic list of variable argument lists. Unfortunately that was insufficient for many of the method signatures, so I also added manual definitions for `rename`, `refine_names`, and `unflatten` in `__init__.pyi.in`.
Finally there were a few problems in the doctests that had to be cleaned up so that `test/test_type_hints.py` will run successfully.
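For reference, a small usage sketch of the kind of calls the new stubs are meant to type-check:
```python
import torch

x = torch.randn(2, 3)
named = x.refine_names('N', 'C')       # attach names to unnamed dims
renamed = named.rename(C='channels')   # rename a dim by keyword
print(renamed.names)                   # ('N', 'channels')
```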
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36890
Differential Revision: D21259192
Pulled By: zou3519
fbshipit-source-id: 2a9e7d7bec9be5ae3ae2995078c6abfa3eca103c
Summary:
Make sleef dependency public so that `ATen_CPU_{capability}` libs can depend on it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37381
Test Plan: CI
Differential Revision: D21273443
Pulled By: malfet
fbshipit-source-id: 7f756c7f3c605e51cf0c27ea37f687913cd48708
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37052
These only need to be in the cpp as they are not referenced anywhere
else. These functions should only be used from the python operators
torch.ops.profiler.record_function_{enter, exit}.
ghstack-source-id: 102979051
Test Plan: CI
Differential Revision: D21171987
fbshipit-source-id: dfe8130d2b64de6179222327069ce1ab877829e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37361
Add a fast path for the case of batch_size = 1 and single ad embedding in UnPackRecordsOp. In this case, there is no need to pack the single tensor into a shared_ptr<vector<vector<Tensor>>> and then unpack it in UnPackRecordsOp. Instead, we can just pass the tensor as it is into UnPackRecordsOp and share the data with the output tensor.
Reviewed By: yinghai
Differential Revision: D21224497
fbshipit-source-id: 70685e5cc20ffdc5e0044a4b97a7fc5133786db4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37027
The RPC timeout passed into rpc_sync and rpc_async after the below
change is now float, so we should make these APIs consistent.
ghstack-source-id: 102971906
Test Plan:
Existing unittests, also added unittest testing specific timeout set
in ProcessGroupRpcBackendOptions and the dispatch rpc backend options handling.
Differential Revision: D21125171
fbshipit-source-id: a5894b8ce31d2926f2c3d323d1cda4d54b30cef1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37367
Before this change we printed all the args in the same list, for example:
```
BEFORE RFACTOR:
{
for (int m = 0; m < m_1; m++) {
for (int n = 0; n < n_1; n++) {
sum[0] = ReduceOp(sum, float(0), (sum[0]) + (b[m, n]), {m, n});
}
}
}
AFTER RFACTOR:
{
for (int m = 0; m < m_1; m++) {
for (int n = 0; n < n_1; n++) {
tmp_buf[n] = ReduceOp(tmp_buf, float(0), (tmp_buf[n]) + (b[m, n]), {nm}); # <<< n is out, m is reduce here
}
}
for (int n = 0; n < n_1; n++) {
sum[0] = ReduceOp(sum, float(0), (sum[0]) + (tmp_buf[n]), {n});
}
}
```
With this change we explicitly show which args are reduce args:
```
BEFORE RFACTOR:
{
for (int m = 0; m < m_1; m++) {
for (int n = 0; n < n_1; n++) {
sum[0] = ReduceOp(sum, float(0), (sum[0]) + (b[m, n]), out_args={}, reduce_args={m, n});
}
}
}
AFTER RFACTOR:
{
for (int m = 0; m < m_1; m++) {
for (int n = 0; n < n_1; n++) {
tmp_buf[n] = ReduceOp(tmp_buf, float(0), (tmp_buf[n]) + (b[m, n]), out_args={n}, reduce_args={m});
}
}
for (int n = 0; n < n_1; n++) {
sum[0] = ReduceOp(sum, float(0), (sum[0]) + (tmp_buf[n]), out_args={}, reduce_args={n});
}
}
```
Test Plan: Imported from OSS
Differential Revision: D21265807
Pulled By: ZolotukhinM
fbshipit-source-id: 384396cd55562570f8e33657b856a4404d451080
Summary:
In build_variables.bzl, split the filelist into `libtorch_python_core_sources` and `libtorch_python_distributed_sources`
Move jit passes from `glob_libtorch_python_sources()` to the `libtorch_core_jit_sources` filelist
Validated that the original `TORCH_PYTHON_SRCS` filelist matches the one in `build_variables.bzl` by running the following script:
```
import os
def read_file(path):
    with open(path) as f:
        return f.read()

def get_cmake_torch_python_srcs():
    caffe2_cmake = read_file("torch/CMakeLists.txt")
    start = caffe2_cmake.find("set(TORCH_PYTHON_SRCS")
    end = caffe2_cmake.find(")", start)
    return caffe2_cmake[start:end+1]

def get_cmake_torch_python_srcs_list():
    _srcs = get_cmake_torch_python_srcs()
    unfiltered_list = [x.strip() for x in _srcs.split("\n") if len(x.strip()) > 0]
    return [x.replace("${TORCH_SRC_DIR}/", "torch/") for x in unfiltered_list if 'TORCH_SRC_DIR' in x]

import imp
build_variables = imp.load_source('build_variables', 'tools/build_variables.bzl')
libtorch_python_sources = set(build_variables.libtorch_python_core_sources)
torch_python_srcs = set(get_cmake_torch_python_srcs_list())
print(set.difference(libtorch_python_sources, torch_python_srcs))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37267
Test Plan: CI
Differential Revision: D21258292
Pulled By: malfet
fbshipit-source-id: bb6d7ee73c97cbe149a9021756b9a4c9fb3ce50e
Summary:
See https://discuss.pytorch.org/t/training-with-gradient-checkpoints-torch-utils-checkpoint-appears-to-reduce-performance-of-model/78102/3?u=jwl for details.
Updated the docs to warn users about issues with checkpointing models that use `detach()` or `torch.no_grad()` to freeze their model layers/weights during training. When they do this, training with `checkpoint` will fail as it forces the outputs to require gradients when the model itself does not. Hence, during the backward pass it will output the error:
```
[4]<stderr>:RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
```
Maybe it is possible to fix this directly in the code, but I am not sure how in the current codebase.
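A minimal sketch of the failure mode being documented (hypothetical two-layer model; the `detach()` inside the checkpointed segment stands in for any frozen-layer pattern):
```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

frozen = nn.Linear(4, 4)
head = nn.Linear(4, 1)

def frozen_segment(t):
    # Freezing via detach() (or torch.no_grad()) inside the checkpointed
    # segment leaves the recomputed output without a grad_fn.
    return frozen(t).detach()

x = torch.randn(2, 4, requires_grad=True)
y = checkpoint(frozen_segment, x)
loss = head(y).sum()
loss.backward()   # RuntimeError: element 0 of tensors does not require grad ...
```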
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37266
Differential Revision: D21262558
Pulled By: mrshenli
fbshipit-source-id: 529cf370534504baf8937ef17dac5d6916fbf5ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37292
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key;
this PR moves the logic outside of the profiler into the record function.
Reviewed By: jamesr66a
Differential Revision: D21245094
fbshipit-source-id: 595e41b18206d2ba4cf639cb320f630907868b3f
Summary:
Fix https://github.com/pytorch/pytorch/issues/33928. Basically just move the dependency into a new imported target.
I'm not sure whether this modification will affect other parts, so please test it thoroughly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37310
Differential Revision: D21263066
Pulled By: ezyang
fbshipit-source-id: 7dc38f578d7e9bcb491ef5e122106fb66a33156f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37273
The issues why those couldn't be `use_c10_dispatcher: full` have either been fixed or those ops have been newly introduced without the tag but could have used it.
Let's enable the tag for them.
ghstack-source-id: 102896116
Test Plan: waitforsandcastle
Differential Revision: D21242516
fbshipit-source-id: 5158ecc1ff6b34896f36904ea7bd7fcb4811a0bf
Summary:
As described in the issue (https://github.com/pytorch/pytorch/issues/33701) the compiler check
for building cpp extensions does not work with ccache.
In this case we run `compiler -v` to determine which
compiler is actually being used and check that instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37293
Differential Revision: D21256913
Pulled By: ezyang
fbshipit-source-id: 5483a10cc2dbcff98a7f069ea9dbc0c12b6502dc
Summary:
Issue: https://github.com/pytorch/pytorch/issues/35284
~This depends on and contains https://github.com/pytorch/pytorch/pull/35524. Please review after the dependency gets merged and I will rebase to get a clean diff.~
The implementation of most functions follow the pattern
```C++
template<typename T>
C10_HOST_DEVICE c10::complex<T> some_function(c10::complex<T> x) {
#if defined(__CUDACC__) || defined(__HIPCC__)
return static_cast<c10::complex<T>>(thrust::some_function(static_cast<thrust::complex<T>>(x)));
#else
return static_cast<c10::complex<T>>(std::some_function(static_cast<std::complex<T>>(x)));
#endif
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35725
Differential Revision: D21256854
Pulled By: ezyang
fbshipit-source-id: 2112ba6b79923450feafd7ebdc7184a3eaecadb6
Summary:
Hi everyone,
This is a super small PR to enable `uint8` support for `nearest` up-sampling on `cpu` and `cuda`.
This work enables us to move forward with the support of `uint8` images in `torchvision`.
See impacted issues:
https://github.com/pytorch/vision/issues/1375
https://github.com/pytorch/vision/issues/1179#issuecomment-558197607
Note: I wanted to add a unit test to ensure we have the expected behavior. I could not locate the `upsampling` unit tests for `nearest`. I can add the test if you point me to the right location.
Thanks
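A small usage sketch of what this enables (illustrative shapes):
```python
import torch
import torch.nn.functional as F

img = torch.randint(0, 256, (1, 3, 8, 8), dtype=torch.uint8)
up = F.interpolate(img, scale_factor=2, mode="nearest")
print(up.dtype, up.shape)   # torch.uint8 torch.Size([1, 3, 16, 16])
```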
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35029
Reviewed By: cpuhrsch
Differential Revision: D21227144
Pulled By: fmassa
fbshipit-source-id: 33c4b5188dedd8f7f872e9d797e2a9b58ee7315c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37046
ghstack-source-id: 102669259
Creating a Python API entry point to generate mobile model lints, which takes a scripted module as an argument and returns a map of module lints.
The initial version creates a placeholder which includes module bundled inputs as the first lint instance. More lints will be added in the future.
Test Plan: python test/test_optimizer.py
Reviewed By: dreiss
Differential Revision: D21164648
fbshipit-source-id: 9e8f4e19d74b5464a55cc73b9dc18f358c5947d6
Summary:
These options are disabled by default, and are supposed to be used by
linux distro developers. With the existing shortcut option
USE_SYSTEM_LIBS toggled, these new options will be enabled as well.
Additionally, when USE_SYSTEM_LIBS is toggled, setup.py should
no longer check the existence of git submodules.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37277
Differential Revision: D21256999
Pulled By: ezyang
fbshipit-source-id: 84f97d008db5a5e41a289cb7bce94906de3c52cf
Summary:
The "Generic" BLAS refers to the Netlib BLAS. This option is meaningful
to the Debian family due to the "update-alternatives" mechanism, which
enables the user to switch the libblas.so providers between different
implementations at runtime, such as ATLAS, OpenBLAS, and Intel MKL.
As such, building against the generic BLAS provides much flexibility.
This new option is not documented in setup.py because it's only supposed
to be used by Linux distro (especially Debian family) developers.
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37276
Differential Revision: D21256877
Pulled By: ezyang
fbshipit-source-id: 55a5356653a1cfc763a5699b04afe5938f2007ec
Summary:
This PR is based on the issue https://github.com/pytorch/pytorch/issues/29994#issue-524418771 and the discussion in the previous version of the PR https://github.com/pytorch/pytorch/pull/30559. Specifically, I followed the interface outlined in this [comment](https://github.com/pytorch/pytorch/pull/30559#issuecomment-574864768).
## Structure
- `torch/optim/swa_utils.py` contains the implementation of `AveragedModel` class, `SWALR` learning rate scheduler and `update_bn` utility
- `test/test_optim.py` contains unit tests for the three components of SWA
- `torch/optim/swa_utils.pyi` describes the interface of `torch/optim/swa_utils.py`
The new implementation consists of
- `AveragedModel` class; this class creates a copy of a given model and allows to compute running averages of the parameters.
- `SWALR` learning rate scheduler; after a certain number of epochs switches to a constant learning rate; this scheduler is supposed to be chained with other schedulers.
- `update_bn` utility; updates the Batch Normalization activation statistics for a given model and dataloader; this utility is meant to be applied to `AveragedModel` instances.
For `update_bn` I simplified the implementation compared to the [original PR](https://github.com/pytorch/pytorch/pull/30559) according to the sugestions by vadimkantorov.
## Example
```python
loader, optimizer, model = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
# You can use custom averaging functions with `avg_fun` parameter
ema_avg = lambda p_avg, p, n_avg: 0.1 * p_avg + 0.9 * p
ema_model = torch.optim.swa_utils.AveragedModel(model,
                                                avg_function=ema_avg)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer,
                                                       T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, start_epoch=swa_start, swa_lr=0.05)
for i in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    scheduler.step()
    swa_scheduler.step()
    if i > swa_start:
        swa_model.update_parameters(model)
# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```
UPDATED:
```python3
loader, optimizer, model, loss_fn = ...
swa_model = torch.optim.swa_utils.AveragedModel(model)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)
swa_start = 160
swa_scheduler = SWALR(optimizer, swa_lr=0.05)
for i in range(300):
    for input, target in loader:
        optimizer.zero_grad()
        loss_fn(model(input), target).backward()
        optimizer.step()
    if i > swa_start:
        swa_model.update_parameters(model)
        swa_scheduler.step()
    else:
        scheduler.step()
# Update bn statistics for the swa_model at the end
torch.optim.swa_utils.update_bn(loader, swa_model)
```
Fixes https://github.com/pytorch/pytorch/issues/29994
cc soumith vincentqb andrewgordonwilson vadimkantorov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35032
Differential Revision: D21079606
Pulled By: vincentqb
fbshipit-source-id: e07f5e821f72ada63789814c2dcbdc31f0160c37
Summary:
CC ezyang .
ROCm 3.3 packages went live on 2020-04-01. Tag 376 was pushed on 2020-04-15, so it should be based on ROCm 3.3.
The upgrade to ROCm 3.3 is required as part of the effort to stabilize ROCm CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37247
Differential Revision: D21256198
Pulled By: ezyang
fbshipit-source-id: 92ac21c0122eda360ec279d2c3d462c3e6bf4646
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36606
This PR refactors the continuation logic of the async mode in the autograd
engine, to avoid launching spinning work. To achieve that:
1. remove the continuation logic in
execute_graph_task_with_continuiation
2. separate the usage of execute_graph_task between dist_engine and
local engine, now dist_engine universally use
`execute_graph_task_until_ready_queue_empty` (a better name appreciated
here).
3. remove enqueue_blocked_task_on_cpu
4. remove the async mode in `execute_with_graph_task` as we don't need
to use it in dist_engine
Test Plan: Imported from OSS
Differential Revision: D21032731
Pulled By: wanchaol
fbshipit-source-id: 708ea3bc14815bdc151b56afa15eb85b4ac0f4b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37061
This PR refactors:
1. `set_device`, moving it out of Engine
2. put `graph_task_completed` into GraphTask
3. put `mark_graph_task_completed` into GraphTask
This also makes it easy for the distributed engine to call those functions.
Test Plan: Imported from OSS
Differential Revision: D21188688
Pulled By: wanchaol
fbshipit-source-id: f56106e6ed7d966cfa4d962781c7865cc3c5321d
Summary:
Today in PyTorch, warnings triggered in C++ are printed to Python users like this:
`../aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.`
This may be unhelpful to Python users, who have complained it's difficult to relate these messages back to their programs. After this PR, warnings that go through the PyWarningHandler and allow it to add context print like this:
```
test/test_torch.py:16463: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead. (Triggered internally at ../aten/src/ATen/native/BinaryOps.cpp:81.)
cpu_result = getattr(cpu_tensor, op_str)(*cpu_args)
```
This relates the warning back to the user's program. The information about the cpp file and line number is preserved in the body of the warning message.
Some warnings, like those generated in the JIT, already account for a user's Python context, and so they specify that they should be printed verbatim and are unaffected by this change. Warnings originating in Python and warnings that go through c10's warning handler, which prints to cerr, are also unaffected.
A test is added to test_torch.py for this behavior. The test relies on uint8 indexing being deprecated and its warning originating from its current header file, which is an unfortunate dependency. We could implement a `torch.warn` function, instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36052
Differential Revision: D20887740
Pulled By: mruberry
fbshipit-source-id: d3515c6658a387acb7fccaf83f23dbb452f02847
Summary:
Fixes https://github.com/pytorch/pytorch/issues/37182
The `zero_grad` wrapper from `_replicate_for_data_parallel` can't be pickled. So instead, I set an attribute `_is_replica = True` and check for this in `Module.zero_grad`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37307
Differential Revision: D21246119
Pulled By: mrshenli
fbshipit-source-id: 4755786d48a20bc247570ba672de9dd526914ce1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37265
In PGA, `listenLoopInternal` should not be virtual - PGA doesn't have any child classes that override this. Re-arranged some comments for `listenLoop` as well.
ghstack-source-id: 102880792
Test Plan: Sandcastle/CI
Differential Revision: D21238761
fbshipit-source-id: 5ec5058bc462182cf970faca9a734c11c7be2a32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37255
Improved error message logged when Distributed Autograd Context cleanup fails - added node information and underlying error. The previous error message also assumed that the cause of the error was due to too many RPC's failing, but this is not necessarily the case.
ghstack-source-id: 102867620
Test Plan: Ensuring Sandcastle/CI tests pass. Verified the correct message is logged when this code path is executed in `test_backward_node_failure` and `test_backward_node_failure_python_udf` .
Differential Revision: D20950664
fbshipit-source-id: 267318187b7ef386930753c9679a5dfab6d87018
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37262
It's convenient to have weights info in the debug_ssa_net so that we can tell which inputs are weights and which are primary inputs. We can then easily get their shape and size info with a post-processing script.
Reviewed By: ChunliF
Differential Revision: D21237537
fbshipit-source-id: 1fadc605283ef2eed78c44494e062a16ccf135ab
Summary:
Add ONNX export support for torch.nn.CrossEntropyLoss.
This PR makes the following changes:
1. Updates nll_loss export
2. Makes a post pass for SoftmaxCrossEntropy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34830
Reviewed By: hl475
Differential Revision: D21230712
Pulled By: houseroad
fbshipit-source-id: c81911a41968e23813ba10274340ce4d8ba1ed78
Summary:
According to the Darwin man page:
`CLOCK_REALTIME` the system's real time (i.e. wall time) clock, expressed as the amount of time since the Epoch. This is the same as the value returned by `gettimeofday`(2).
I.e. it returns a timestamp with microsecond resolution, as can be observed by running the following small program:
```
#include <sys/time.h>
#include <time.h>  /* clock_gettime, clockid_t */
#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

bool conseq_time(clockid_t c) {
  struct timespec t1, t2;
  clock_gettime(c, &t1);
  clock_gettime(c, &t2);
  printf("t1={.tv_sec=%ld, .tv_nsec=%ld}\n", t1.tv_sec, t1.tv_nsec);
  printf("t2={.tv_sec=%ld, .tv_nsec=%ld}\n", t2.tv_sec, t2.tv_nsec);
  bool rc = t1.tv_sec == t2.tv_sec && t1.tv_nsec == t2.tv_nsec;
  printf("Two timestamps are %sequal\n", rc ? "" : "not ");
  return rc;
}

int main(void) {
  printf("using CLOCK_REALTIME\n");
  conseq_time(CLOCK_REALTIME);
  printf("using CLOCK_MONOTONIC_RAW\n");
  conseq_time(CLOCK_MONOTONIC_RAW);
  return 0;
}
```
which, if compiled and run, outputs something like:
```
using CLOCK_REALTIME
t1={.tv_sec=107519, .tv_nsec=860315000}
t2={.tv_sec=107519, .tv_nsec=860315000}
Two timestamps are equal
using CLOCK_MONOTONIC_RAW
t1={.tv_sec=107520, .tv_nsec=954297363}
t2={.tv_sec=107520, .tv_nsec=954297426}
Two timestamps are not equal
```
But why do it, if all this platform-specific logic is already nicely abstracted in `std::chrono`:
https://github.com/llvm/llvm-project/blob/master/libcxx/src/chrono.cpp#L117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37280
Differential Revision: D21246608
Pulled By: malfet
fbshipit-source-id: 6beada30657a2720000e34214b1348112e55be50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36009
When scale is very small (less than float eps, but greater than minimum double precision value), computation of reciprocal of scale in floating point precision within FBGEMM returns inf, while QuantUtils does not. Changed computation in QuantUtils to occur with floating point precision to re-enable tests.
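A hedged numeric illustration of the mismatch (NumPy scalars stand in for the C++ float/double computations; the scale value is illustrative):
```python
import numpy as np

scale = 1e-40  # below ~2.9e-39, so 1/scale overflows float32 but not float64
print(np.float64(1.0) / np.float64(scale))  # ~1e+40, finite in double precision
print(np.float32(1.0) / np.float32(scale))  # inf in single precision (may emit an overflow warning)
```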
ghstack-source-id: 102896302
Test Plan:
buck test caffe2/test:quantization -- 'test_quantized_rnn \(quantization\.test_quantization\.PostTrainingDynamicQuantTest\)' --print-passing-details --run-disabled
Summary (total time 59.91s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D20853000
fbshipit-source-id: 948a888f5516b3ba9c6efb7de31ef2cc9d431991
Summary:
This would run the same test suite (or an individual test) multiple times.
Useful for detecting flaky tests.
Example usage: `python test_autograd.py TestAutograd.test_profiler -v --repeat=100`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37281
Differential Revision: D21244442
Pulled By: malfet
fbshipit-source-id: 3ecafec7ae87bc1e418aa28151bbc472ef37a713
Summary:
Because macOS is not iOS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37283
Test Plan: CI
Differential Revision: D21244398
Pulled By: malfet
fbshipit-source-id: b822e216e83887e2f2961b5c5384eaf749629f61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31091
This implements a C++17 "if constexpr" like feature for C++14.
This can be used, for example, to replace SFINAE or to force the compiler to remove some parts of a function in the assembly based on a condition.
PRs stacked on top will use this to simplify some of our template metaprogramming.
ghstack-source-id: 102867141
Test Plan: unit tests
Differential Revision: D18927220
fbshipit-source-id: 19a135e00af6ebb0139ce3730353762d4512158f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36785
Currently, RRef unpickle (both Python and TorchScript) will block
until the OwnerRRef has been created by the original `rpc.remote`
call, if it is an OwnerRRef. This is not ideal, as correctness
would then depend on the number-of-threads configuration. This
commit changed that behavior. Both `rpc.remote` and the unpickle
can create OwnerRRefs. More specifically, whichever one arrives
first will create the OwnerRRef and the subsequent ones will
retrieve the same OwnerRRef, so that no one is blocking.
Test Plan: Imported from OSS
Differential Revision: D21083089
Pulled By: mrshenli
fbshipit-source-id: 34ef063d50549b01c968b47815c4fe9fac179d3d
Summary:
Valgrind detects some uninitialized variables if torch_cpu is compiled with clang; these are not reproducible if the same code is compiled with gcc, nor with the address sanitizer tool.
See https://github.com/pytorch/pytorch/issues/37117
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37152
Differential Revision: D21241577
Pulled By: malfet
fbshipit-source-id: 4a5dddf2a4fc4238dc9117cb92ee4e34af9e6064
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37195
After adding c10::DispatchKey::Profiler, the behavior of RecordFunction
observers is also controlled by the dispatch key.
This PR moves that logic out of the profiler and into RecordFunction.
Reviewed By: ngimel
Differential Revision: D21213786
fbshipit-source-id: e618254da74a4f1ce16c51a3869bbd75a4f561ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36850
Since now all unboxing happens after dispatch, which means that all c10 ops support unboxing, we can now use op.callBoxed() for all ops and don't need callBoxedWorkaround (which was going through the JIT registry) anymore.
ghstack-source-id: 102879558
Test Plan: waitforsandcastle
Differential Revision: D21102375
fbshipit-source-id: d1e041116563a9650d5a86b07eb96d217d8756f3
Summary:
This is generating a considerable amount of warning messages since TensorIterator.h is included from a lot of files:
/home/hong/xusrc/pytorch/aten/src/ATen/native/TensorIterator.h:372:47:
warning: comparison of integers of different signs: 'const int64_t' (aka 'const long') and 'c10::SmallVectorTemplateCommon::size_type' (aka 'unsigned long') [-Wsign-compare]
TORCH_CHECK(squash_dim >= 0 && squash_dim < shape_.size(),
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37109
Differential Revision: D21242163
Pulled By: ngimel
fbshipit-source-id: aec2978ee76750676a449eb6671142a782658de3
Summary:
Per https://cmake.org/cmake/help/latest/command/list.html, the argument order for list insertion is
`list(INSERT <list> <index> [<element>...])`
That is, the first argument is the list name, not the index at which the elements get inserted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37275
Differential Revision: D21243539
Pulled By: malfet
fbshipit-source-id: b947ad64f1a3549df68083383537899b19abd9ca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37107
Currently histogram observers relax both the min and max values of the activations for performance speedup reasons. This causes an issue for glow, where there is a slowdown if the zero-point is not zero for post-ReLU activations.
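For context, a hedged worked example of the affine qparams computation (assuming the usual quint8 range [0, 255]); it shows how relaxing the minimum below zero produces a non-zero zero-point:
```python
qmin, qmax = 0, 255

def affine_qparams(min_val, max_val):
    scale = (max_val - min_val) / (qmax - qmin)
    zero_point = int(round(qmin - min_val / scale))
    return scale, min(qmax, max(qmin, zero_point))

print(affine_qparams(0.0, 1.0))   # zero_point == 0: what glow expects after ReLU
print(affine_qparams(-0.1, 1.0))  # relaxed minimum -> zero_point ~= 23
```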
ghstack-source-id: 102768017
Test Plan: buck test caffe2/test:quantization -- 'test_histogram_observer_one_sided \(quantization\.test_quantization\.RecordHistogramObserverTest\)' --print-passing-details
Differential Revision: D21187636
fbshipit-source-id: 8d616b9e9caf2979a26a215e99434f71025e3d8b
Summary:
Reland of https://github.com/pytorch/pytorch/issues/36845 due to Windows CI failure.
binary_windows_wheel_3_7_cu102_build passed, so the Windows guard should be fine this time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37202
Differential Revision: D21233358
Pulled By: xw285cornell
fbshipit-source-id: 707de0ff21d178686354ffaea7625f1d68b3e8d3
Summary:
Add windows build and test for cpu
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37135
Differential Revision: D21243189
Pulled By: ezyang
fbshipit-source-id: dd804ac258940e608facaf375d80ff5a0c59a7ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37254
This code is leftover from the KernelFactory deletion.
ghstack-source-id: 102866045
Test Plan: waitforsandcastle
Differential Revision: D21235480
fbshipit-source-id: 739ba677d2139ba9934d103f75a609638f1a3856
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37251
This was broken by recent changes to how we serialize with type tags. We
save a name (like `Dict[str, MyNamedTuple]`) and then relied on the
mobile type parser to resolve that name back into a set of types.
This doesn't work for any NamedTypes as the mobile type parser doesn't
know how to resolve those. The unpickler allows the caller to inject a
type resolver in for this purpose, use that so that when importing in a
non-mobile environment you get the right results.
A second problem also had to be fixed: the SourceImporter type loader
would only load named types directly (e.g. `MyNamedTuple`) and choked if
it was a general type that contained a named tuple (e.g.
`List[MyNamedTuple]`). Fixed that and renamed `loadNamedType` to
`loadType` for clarity.
Test Plan: Imported from OSS
Differential Revision: D21235213
Pulled By: suo
fbshipit-source-id: 16db0f4c5e91a890d67a8687cc8ababa6b94b0f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37257
Previously, we were relying on fragile invariants to avoid collecting
and feeding high precedence, non-backend dispatch keys to backend
initialization machinery, which would assert on them. (These same
keys are then used for redispatch, so a second latent problem lurks
behind the first.) Here we mask off the BackendDispatch key and all
keys to its left.
Followup: move backend init code to backend-specific wrappers
(`CPUType` etc.). This will let us remove the backend init code from
both BackendSelect and STATIC_DISPATCH wrappers. (Though BackendSelect
will still need to compute a dispatch key, so the logic introduced
here will still be necessary.)
Test Plan: Imported from OSS
Differential Revision: D21235856
Pulled By: bhosmer
fbshipit-source-id: 1b8bd7897ed4b41a95718f3cfceddf4ee094744a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37088
For an inlined expression tree like `(e_0, (e_1, e_long))` the previous
algorithm only scanned the same statement as `e_long`, splitting the
inlined expressions across lines. Because it did not scan `e_0`, `e_0`
would still get emitted inline, causing it to reverse order with `e_1` and
`e_long`. The new algorithm scans starting at `e_long` and going all
the way back up the expression until it reaches the end of the inlined
statement. Caching of what has already been scanned has been added so that
if there was a second long expression `e_long2` after `e_long`, it would not
rescan and re-inline the statements that were already split.
Test Plan: Imported from OSS
Differential Revision: D21180394
Pulled By: zdevito
fbshipit-source-id: 4d142c83a04c89a47d04282f67a513f82cf153c0
Summary:
typing is available since Python 3.5, no need to try-import.
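A minimal before/after illustration of the pattern being removed:
```python
# Before: guarded import, only needed on Python < 3.5
try:
    from typing import List, Optional
except ImportError:
    List = Optional = None

# After: typing ships with every supported Python version, so import it directly
from typing import List, Optional
```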
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37245
Differential Revision: D21236650
Pulled By: albanD
fbshipit-source-id: daf150103835d0c6cd3c39300044e548bb6d311d
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 and https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly
eg.
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR
- Adding CPU acc types for complex
- Added cumsum, cumprod for complex dtypes
- Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
Old PR - https://github.com/pytorch/pytorch/pull/36747
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37193
Differential Revision: D21229373
Pulled By: anjali411
fbshipit-source-id: 8a086136d8c10dabe62358d276331e3f22bb2342
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37190
If module calls return different types, we need to record them correctly.
Test Plan: Imported from OSS
Differential Revision: D21214871
Pulled By: wanchaol
fbshipit-source-id: 46ba98f08ed4ade22f9740cb3fca84b29557e125
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37189
This fixes a bug in tracing module calls so that values are lifted with their
corresponding value type, rather than the default tensor type.
Test Plan: Imported from OSS
Differential Revision: D21214872
Pulled By: wanchaol
fbshipit-source-id: f635154851365e2d7b88186d6e47634123eac42f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37128
In certain build modes (in fbcode, building a .par) the mechanism to get test output "expect" files doesn't work.
All other tests in test_torch.py already had assertExpectedInline instead of assertExpected, with the expected result inline in the file.
There was no equivalent for assertExpectedRaises, so I added one and changed the tests for test_is_nonzero (the only test using this).
Test Plan: CI, specifically the test test_is_nonzero should pass
Reviewed By: malfet
Differential Revision: D21197651
fbshipit-source-id: 2a07079efdcf1f0b0abe60e92cadcf55d81d4b13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34399
Custom ops can now take std::array as arguments and return it.
This PR also moves the ops in native_functions.yaml that were blocked by this to now `use_c10_dispatcher: full`.
ghstack-source-id: 102643208
Test Plan: unit tests
Differential Revision: D20315072
fbshipit-source-id: 93232448663df962f65e0f25bfb35826dd3374f8
Summary:
- add a couple of checks for USE_XNNPACK to disable additional code paths if XNNPACK is not supported
When passing through the code paths where the platform checks are made (cmake/Dependencies.cmake:89), if XNNPACK is not supported, then the variable FXDIV_SOURCE_DIR will not be set. CMake emits errors when add_subdirectory is called with an empty FXDIV_SOURCE_DIR.
see: https://github.com/pytorch/pytorch/issues/34606
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35607
Differential Revision: D20895645
Pulled By: seemethere
fbshipit-source-id: 3bd10cf89f0fb6825fdd6e1d52c71ee37c67b953
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36631
Summary of changes
1. Moved random transformation functions to DistributionHelper.h (`uniform_int_from_to_distribution`, `uniform_int_full_range_distribution`, `uniform_int_distribution`) to avoid code duplication between default CPU, CUDA rngs and custom rng extensions
2. Made GeneratorImpl fields protected instead of private
3. Introduced `TORCH_CHECK_IF_NOT_ON_CUDA` that does the same as `TORCH_CHECK` if it is not CUDA/ROCm device
4. To test multiple RNG extensions, I had to move op registration to the method `registerOps()`, expose it to Python, and call it in `def setUp(self)`
Test Plan: Imported from OSS
Differential Revision: D21229202
Pulled By: pbelevich
fbshipit-source-id: 6aa3280f2fc3324cf3e748388b5087e3a1e49f23
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37127
Wrap what we're running in CI in a small script so we can exactly reproduce it locally if necessary.
Test Plan: Imported from OSS
Reviewed By: malfet
Differential Revision: D21196804
Pulled By: suo
fbshipit-source-id: 45497daae4bafd236a0d1bb1480841f0d9f39262
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36767
Add a simpler implementation of the MulGradient cuda kernel for when inner_size==1, inner loop is eliminated.
Reviewed By: xw285cornell
Differential Revision: D21013269
fbshipit-source-id: bb62470d91a7fef6eecc3d4766a2c994ca6bb2c8
Summary:
Some links in the TOC of CONTRIBUTING.md are broken since GitHub removes the invalid characters (e.g., `+` in C++) in the anchor link, while the existing TOC uses `-` for replacement.
This PR uses `-` instead of `*` and `+` for the bullet lists to make it consistent with README.md.
b889e0da8a/README.md (L11-L18)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37131
Differential Revision: D21231299
Pulled By: zou3519
fbshipit-source-id: 8e7bb61550827ce97378d3428542e43612bac8e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37081
Closes https://github.com/pytorch/pytorch/issues/30813
Relanding of https://github.com/pytorch/pytorch/pull/35463
1. Tensor quantization logic (quantize_*) is moved to aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file, and it started to become complicated and hard to read. This problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to aten/native/quantized.
2. Requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register QuantizedCUDA backend.
5. test_quantized_tensor is extended to process QuantizedCUDA backend where possible.
Test Plan: Imported from OSS
Differential Revision: D21206694
Pulled By: jerryzh168
fbshipit-source-id: c7433aad9c095a34c57e6dddd128b5c5d9292373
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30485
Use vectorization to speed up `calculate_qparams` for per-channel observers. The new implementation is about 1000 times faster.
Task:
https://github.com/pytorch/pytorch/issues/30348#event-2824868602
ghstack-source-id: 102808561
Test Plan:
```
import torch
import time
import numpy as np
from torch.quantization.observer import PerChannelMinMaxObserver

obs = PerChannelMinMaxObserver()
acc_time = 0
X = torch.randn(1000, 10)
obs(X)
for i in range(100):
    start = time.time()
    obs.calculate_qparams()
    acc_time = acc_time + time.time() - start
print(acc_time)
```
Before change:
20.3
After change:
0.017
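A hedged sketch of the vectorization idea (not the actual observer code): compute the per-channel statistics with whole-tensor reductions instead of a Python loop over channels.
```python
import torch

x = torch.randn(1000, 10)       # 10 channels along the last dimension
min_vals = x.min(dim=0).values  # one vectorized reduction per statistic
max_vals = x.max(dim=0).values

qmin, qmax = 0, 255
scales = (max_vals - min_vals) / float(qmax - qmin)
zero_points = (qmin - torch.round(min_vals / scales)).clamp(qmin, qmax).to(torch.int64)
```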
Differential Revision: D18711905
fbshipit-source-id: 3ed20a6734c9950773350957aaf0fc5d14827640
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32731
As we now support send to self, we no longer require world_size > 1.
Removing the assert from ProcessGroupAgent.
Test Plan: Imported from OSS
Differential Revision: D19609558
Pulled By: mrshenli
fbshipit-source-id: ecec18d756f97d8d78d4526a63b7cb8ab6f858a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37025
This allows us to reuse this framework in other places.
Test Plan:
buck test mode/dev-nosan
caffe2/torch/fb/distributed/model_parallel/tests:test_dist_optim --
test_optimizer_hook
Differential Revision: D20958327
fbshipit-source-id: 2a37dae3687fea8820427e174900111b58673194
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35791
The optimal solution to use XNNPACK is to separate operator creation
from execution - also called prepacking the weights. If we have done
our job properly, JIT must have caught and replaced nn.Linear on mobile
with the prepacked versions. Still, if we somehow end up in
at::native::linear for whatever reason, it is still more efficient to go
through XNNPACK than the alternatives of at::addmm or at::matmul.
Differential Revision: D20821863
Test Plan: Imported from OSS
Pulled By: AshkanAliabadi
fbshipit-source-id: 5a75bfd900435c89c1b8536dc09248e788292e0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35790
The optimal solution to use XNNPACK is to separate operator creation
from execution - also called prepacking the weights. If we have done
our job properly, JIT must have caught and replaced nn.Conv2ds on mobile
with the prepacked versions. Still, if we somehow end up in
_convolution for whatever reason, it is still more efficient to go
through XNNPACK for NHWC tensors, compared to the alternative of
converting NHWC to NCHW and going through NNPACK.
Differential Revision: D20821864
Test Plan: Imported from OSS
Pulled By: AshkanAliabadi
fbshipit-source-id: 2732280c2fd31edcb39658f6530d03331a1a4a75
Summary:
Closes https://github.com/pytorch/pytorch/issues/24642
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit

for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.tanh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.tanh(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.2816318240002147
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.2728829070001666
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.39797203200214426
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.3228214350019698
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.31780802399953245
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.3745740449994628
```
After:
```
torch.tanh(a) a.numel() == 10000 for 20000 times torch.half
0.27825374500025646
torch.tanh(a) a.numel() == 10000 for 20000 times torch.float
0.27764024499992956
torch.tanh(a) a.numel() == 10000 for 20000 times torch.double
0.3771585260001302
torch.tanh(a) a.numel() == 100000 for 20000 times torch.half
0.2995866400015075
torch.tanh(a) a.numel() == 100000 for 20000 times torch.float
0.28355561699936516
torch.tanh(a) a.numel() == 100000 for 20000 times torch.double
1.393811182002537
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36995
Differential Revision: D21163353
Pulled By: ngimel
fbshipit-source-id: e2216ff62cdfdd13b6a56daa63d4ef1440d991d4
Summary:
Fixes a safety issue (Nonsense values and segfaults) introduced by https://github.com/pytorch/pytorch/pull/36875 when in-place gather tries to use incorrect shapes.
Consider the following block of code:
```
k0 = 8
k1 = 8
m = 100
x = torch.rand((k0, k1))
ind = torch.randint(0, k0, (m, k1))
output = torch.empty((m, k1))
print(torch.gather(x, 0, ind, out=output))
print(torch.gather(x, 1, ind, out=output))
```
The first gather is legal, the second is not (`ind` and `output` would need to be transposed). Previously this was caught when the kernel tried to restride inputs for TensorIterator, but we can no longer rely on those checks and must test explicitly. If `m` is small the second gather returns gibberish; if it is large enough to push the read out of the memory block, the program segfaults.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37102
Differential Revision: D21190580
Pulled By: robieta
fbshipit-source-id: 80175620d24ad3380d78995f7ec7dbf2627d2998
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36032
QNNPACK and XNNPACK may access the input and/or output tensors out of bounds.
This is by-design, and chosen to make the implementation of micro-kernels
both simpler and faster as a result of not having to individually handle the
corner cases where the number of processed elements is not a multiple of SIMD
register width. This behavior will trigger ASAN though, and may result in a
segfault if the accessed memory location just so happens to fall on a page
the current process has no read access to. Here we define a custom allocator
that allocates the extra storage required to keep this behavior safe. This
allocator could have been restricted to QNNPACK and XNNPACK only, but that
would have negative performance ramifications, as input tensors must now be
reallocated, and copied over, if the tensor is not allocated with this
allocator to begin with. Making this allocator the default on mobile builds
minimizes the probability of unnecessary reallocations and copies, and
also enables acceleration of operations where the output tensor is allocated
outside of the function doing the implementation, wherein the implementation
cannot simply re-allocate the output with the guarding allocator.
Test Plan: Imported from OSS
Differential Revision: D20970217
Pulled By: AshkanAliabadi
fbshipit-source-id: 65cca2d38d7c0cef63c732f393016f50f1fa5199
Summary:
We should have
```C++
for (auto& sub_iter : iter.with_32bit_indexing()) {
  launch_prelu_cuda_backward_share_weights_kernel(sub_iter, weight_data);
}
```
But I mistakenly wrote it as
```C++
for (auto& sub_iter : iter.with_32bit_indexing()) {
  launch_prelu_cuda_backward_share_weights_kernel(iter, weight_data);
}
```
in my previous PR, which leads to infinite recursion.
I found this bug when working on https://github.com/pytorch/pytorch/pull/34004
I also added a `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` to test for this.
Besides, the caller is already guaranteed to be contiguous, so we don't need to handle non-contiguous tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36134
Differential Revision: D21187542
Pulled By: VitalyFedyunin
fbshipit-source-id: 0fafdd7b672bf89fcaa2b42e08b7d41ade7e6bcb
Summary:
As shown in https://github.com/pytorch/pytorch/issues/36452, SyncBatchNorm can block the host thread due to the ``MemcpyDtoH`` and ``MemcpyHtoD`` calls triggered when dealing with the argument ``counts`` for the native function ``batch_norm_gather_stats_with_counts``.
- This fix changes the signature of ``batch_norm_gather_stats_with_counts`` to
```c++
std::tuple<Tensor, Tensor> batch_norm_gather_stats_with_counts_cuda(const Tensor& self, const Tensor& mean, const Tensor& invstd, const Tensor& running_mean, const Tensor& running_var, double momentum, double epsilon, const Tensor& counts)
```
so it can directly receive ``counts`` in a CUDA tensor rather than an ``IntArrayRef`` whose data lives in host memory.
- This fix also improves the implementation of the ``SyncBatchNorm`` function so that constructing the ``counts`` tensor does not cause an additional ``MemcpyHtoD``, which would also block the host thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36659
Differential Revision: D21196991
Pulled By: ngimel
fbshipit-source-id: 84a529e6cf22e03618fecbb8f070ec452f81229e
Summary:
The original implementation of maxpool and im2col kernels will fail if `gridSize` * `blockSize` is smaller than the `nthreads` in maxpool kernel or `n` in im2col kernel. Input parameters `bottom_data`, `data_col`, `data_im`, and loop index `index` are modified inside the loop body and the corrupted data will be carried to the second iteration.
This patch uses temporary variables to replace the input parameters and loop indices.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36288
Differential Revision: D21189020
Pulled By: VitalyFedyunin
fbshipit-source-id: a8075a35e707e6cc99cffd0b2177369e8caea37c
Summary:
This makes its wrappers stackable with `common_utils.TestCase` ones
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36826
Test Plan: CI
Differential Revision: D21178217
Pulled By: mrshenli
fbshipit-source-id: f80dd4aa175e20bd338b38b2c42c3118258f45dc
Summary:
The new function signature is documented at https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-function-spmm.
Please also check https://docs.nvidia.com/cuda/cusparse/index.html#cusparse-generic-api-reference for the limitations. I have added a Windows guard in this PR.
> LIMITATION: The generic APIs are currently available for all platforms except Windows. Using these APIs in any other systems will result in compile-time or run-time failures. Their support will be extended in the next releases.
Edit: also added a CUDA version guard to let ROCm use the old API (to avoid build failures).
Since the new cusparse signatures sometimes give inaccurate results in CUDA 10.1, and this was fixed in CUDA 10.2, the new signatures should only be used with CUDA >= 10.2
cc csarofeen ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36845
Differential Revision: D21196366
Pulled By: ezyang
fbshipit-source-id: 592d6bd6379f7db52cbad827d43864ea65ff18ea
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/24605, https://github.com/pytorch/pytorch/issues/24535, https://github.com/pytorch/pytorch/issues/24739, https://github.com/pytorch/pytorch/issues/24680, https://github.com/pytorch/pytorch/issues/30986
This does not fix https://github.com/pytorch/pytorch/issues/29984; it will be fixed in a later PR.
Most of this PR just follows the same logic inside TH and THC, except for the handling of n-dimensional zero-sized tensors, specifically the case:
```
(m,).addmv((m, 0), (0,), beta, alpha)
```
# Legacy code bugs and how this PR deal with it
The above is a case where BLAS semantics often mismatch PyTorch's: for BLAS and cuBLAS, the above is a noop, but for PyTorch, it is a scalar-vector multiplication `output = beta * input`. The handling of this case is already very poor in legacy code, and it is poorly tested:
For the CPU implementation, there are two code paths:
- Path 1: when dtype is float or double and `USE_BLAS`, then use BLAS
- Path 2: when other dtypes or not `USE_BLAS`, use a fallback kernel in PyTorch
For the CUDA implementation, there are also two code paths:
- Path 1: when float or double, then use `cublasSgemv` or `cublasDgemv` in cuBlas
- Path 2: when half, dispatch to `addmm`
`test_blas_alpha_beta_empty` is supposed to cover all cases, but unfortunately, it only tests the Path 1 of CUDA and Path 1 of CPU, and both uncovered paths (path 2 for CPU and path 2 for CUDA) are buggy in legacy code. In this PR, I expanded the coverage of `test_blas_alpha_beta_empty`, but unfortunately, I have to skip the `half` dtype on CUDA 9. See the description below for detail:
## Bug on CPU implementation
For the CPU implementation, the fallback kernel in path 2 already has the same semantics as PyTorch, not BLAS. But the code that tries to correct BLAS semantics to match PyTorch also runs on this case, leading to double correction, that is, `output = beta * input` now becomes `output = beta * beta * input`.
This leads to the issue https://github.com/pytorch/pytorch/issues/30986 I just opened, and it is fixed in this PR.
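For concreteness, a hedged sketch of the intended semantics for this case (the buggy CPU fallback effectively computed `beta * beta * input`):
```python
import torch

inp = torch.randn(3)
mat = torch.randn(3, 0)
vec = torch.randn(0)

out = torch.addmv(inp, mat, vec, beta=2.0, alpha=1.0)
# mat @ vec reduces over an empty dimension and is all zeros, so the
# expected result is simply beta * inp.
print(torch.allclose(out, 2.0 * inp))
```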
## Bug on CUDA implementation
For the CUDA implementation, path 2 dispatches to
```
(m, 1).addmm((m, 0), (0, 1), beta, alpha)
```
But unfortunately, for some old CUDA versions on old GPUs with the half dtype, the above is also a noop, which is definitely not correct.
But from what I see, on newer CUDA versions or newer GPUs, this is not a problem. This is a bug of PyTorch in `addmm`, so I opened a new issue https://github.com/pytorch/pytorch/issues/31006 to track this problem. But this is highly likely a dependency bug for PyTorch originating from cuBLAS, and it only affects a rarely used edge case on old hardware and software, so this issue would be a `won't_fix` unless some real requirements strongly indicate that this should be fixed.
This issue already exists in legacy code, and this PR does not make it worse. To prevent this issue from bothering us, I disabled the test of the `half` dtype for CUDA 9 when expanding the coverage of `test_blas_alpha_beta_empty`.
I promote a CircleCI CUDA 10.1 test to `XImportant` so that it runs on PRs, because the path 2 of CUDA implementation is only covered by this configuration. Let me know if I should revert this change.
## An additional problem
In legacy code for `addmv`, the `bfloat16` dtype is enabled and dispatched to `addmm`, but `addmm` does not support `bfloat16` from what I tested. I do the same thing in the new code. Let me know if I should do it differently.
# Benchmark
Code:
```python
import torch
print(torch.__version__)
for i in range(1000):
    torch.arange(i, device='cuda')
print('cpu')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,))
    b = torch.randn((i, i))
    c = torch.randn((i,))
    %timeit a.addmv(b, c, alpha=1, beta=2)
print('cuda')
for i in 10, 100, 1000, 10000:
    a = torch.randn((i,)).cuda()
    b = torch.randn((i, i)).cuda()
    c = torch.randn((i,)).cuda()
    torch.cuda.synchronize()
    %timeit a.addmv(b, c, alpha=1, beta=2); torch.cuda.synchronize()
```
Before:
```
1.5.0a0+2b45368
cpu
2.74 µs ± 30.8 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
8.5 µs ± 85.5 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
686 µs ± 2.95 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
74 ms ± 410 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
The slowest run took 4.81 times longer than the fastest. This could mean that an intermediate result is being cached.
27.6 µs ± 23 µs per loop (mean ± std. dev. of 7 runs, 1 loop each)
17.3 µs ± 151 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
20.5 µs ± 369 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
756 µs ± 6.81 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
After:
```
1.5.0a0+66b4034
cpu
3.29 µs ± 20 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
9.09 µs ± 7.41 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
687 µs ± 7.01 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
73.8 ms ± 453 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
cuda
18.2 µs ± 478 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
17.7 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
21.5 µs ± 2.38 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
751 µs ± 35.3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30898
Differential Revision: D20660338
Pulled By: anjali411
fbshipit-source-id: db1f521f124198f63545064026f93fcb16b68f18
Summary:
Lets dtypes take tuples of dtypes instead of just single dtypes. This pattern comes up when tests have distinct in and out types. A test in test_type_promotion is updated to use the new behavior.
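A hedged sketch of the new usage (the import path and test body are illustrative, not taken from the PR):
```python
import torch
from torch.testing._internal.common_device_type import dtypes

# Each decorator argument may now be a tuple of dtypes, e.g. an
# (input dtype, output dtype) pair, instead of a single dtype.
@dtypes((torch.int32, torch.float32), (torch.int64, torch.float64))
def test_float_output(self, device, dtypes):
    in_dtype, out_dtype = dtypes
    ...
```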
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36908
Differential Revision: D21161523
Pulled By: mruberry
fbshipit-source-id: ebac81c1b6c494a2146d595fcdb3e35c22cf859c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36984
Follow LOG(WARNING) format for c++ side warnings in order to play well with larger services, especially when using glog. I need to hook up into GLOG internals a bit in order to override FILE/LINE without having to change the whole thing to be macros, but it seems to be stable between glog versions.
Note, this also changes caffe2_log_level to warning by default - I think it's a much better default when compiling without glog (or maybe it should even be info).
With glog output, stderr capture doesn't work any more in tests. That's why we instead use c10-level warnings capture.
Test Plan:
Run unittest in both glog and non-glog build mode:
glog:
```
W0416 12:06:49.778215 3311666 exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
no-glog:
```
[W exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
Reviewed By: ilia-cher
Differential Revision: D21151351
fbshipit-source-id: fa926d9e480db5ff696990dad3d80f79ef79f24a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36680
If torch is compiled without MKL, this test fails because torch.fft requires MKL support.
Test Plan: CI
Reviewed By: malfet
Differential Revision: D21051362
fbshipit-source-id: dd2e2c7d323622c1c25fc4c817b85d83d2241b3a
Summary:
This pull request extends the fallback implemented in https://github.com/pytorch/pytorch/issues/31383 to not use MIOpen for tensors where the number of elements exceeds INT_MAX. The PR also enables the corresponding test in TestNN.
cc: ezyang jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37110
Differential Revision: D21196336
Pulled By: ezyang
fbshipit-source-id: 25fd80308a0e2f7941c249735674ebc85d3fd39e
Summary:
Updates angle to return a float tensor, by default, when given complex inputs. This behavior is compatible with Python, NumPy, and C++. The implementation follows the former implementation for complex abs, extracting the logic into a common function for both abs and angle.
The test for complex abs's behavior in test_type_promotion.py is updated to also test the behavior of complex angle by comparing its results to NumPy's.
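A hedged illustration of the updated default behavior:
```python
import torch

z = torch.tensor([1 + 1j, -1 + 0j])
print(torch.angle(z))  # float tensor, roughly [0.7854, 3.1416]
print(torch.abs(z))    # abs already returns a float tensor for complex input
```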
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36896
Differential Revision: D21170589
Pulled By: mruberry
fbshipit-source-id: f5a634aea351dd58a8376f1474fc5a6422038cbf
Summary:
This test is never built in OSS CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37080
Differential Revision: D21179296
Pulled By: anjali411
fbshipit-source-id: 22a5b82f17676213c8ec51642bef35dc61f9cace
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36682
For fb internal builds we need to separate whether to use global deps library from loading with RTLD_GLOBAL.
Test Plan: CI -- this should be a no-op for existing builds
Reviewed By: ezyang
Differential Revision: D21051427
fbshipit-source-id: 83bb703d6ceb0265a4c58166749312a44172e78c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37023
Optimize binary size of assert macros through two ideas:
1. Concatenate string literals with __FILE__ and __LINE__ at compile time into one literal, instead of keeping them in separate literals and combining them with c10::str.
2. Optimize binary size of c10::str for some scenarios, especially the scenario where it is called with an empty parameter list, which is actually a common call pattern in assert macros.
In server OSS builds, this PR reduces binary size from 118.05 MB to 117.05 MB.
ghstack-source-id: 102607237
Test Plan: Run oss server build (python setup.py install) and check size of libtorch_cpu.so reducing from 118.05MB to 117.05MB
Differential Revision: D20719400
fbshipit-source-id: 5c61f4195b947f06aafb8f0c8e255de3366e1ff2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36981
Replaces unneeded quantize calls for remaining quantized
activations with empty tensor creation.
Should be a perf win for anyone who uses these.
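A hedged sketch of the pattern, using the private `torch._empty_affine_quantized` factory (shapes and qparams are illustrative; the real change is inside the op kernels):
```python
import torch

x = torch.randn(2, 3)

# Before (conceptually): compute a float result, then quantize it,
# which costs an extra pass over the data.
q_old = torch.quantize_per_tensor(torch.relu(x), scale=0.1, zero_point=0, dtype=torch.quint8)

# After (conceptually): allocate the quantized output directly; the
# activation kernel then writes quantized values straight into it.
q_out = torch._empty_affine_quantized(x.shape, scale=0.1, zero_point=0, dtype=torch.quint8)
```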
Test Plan:
python test/quantization/test_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D21185969
fbshipit-source-id: 473b2b8aa40046ea3f0665bd45b03f09e8a7d572
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36980
Missed this on the original diff, fixing. Create the output tensor directly instead of quantizing it.
Test Plan:
tests still pass
microbenchmarks show a 2x performance improvement for int8:
https://gist.github.com/vkuzo/3b321b428e4c38e805000961c263286b (this
will depend on input size)
Imported from OSS
Differential Revision: D21185970
fbshipit-source-id: 5b9e93d9f9ac05a8120532bd03ad347541a132c2
Summary:
This PR fixes a couple of syntax errors in `torch/` that prevent MyPy from running, fixes simple type annotation errors (e.g. missing `from typing import List, Tuple, Optional`), and adds granular ignores for errors in particular modules as well as for missing typing in third party packages.
As a result, running `mypy` in the root dir of the repo now runs on:
- `torch/`
- `aten/src/ATen/function_wrapper.py` (the only file already covered in CI)
In CI this runs on GitHub Actions, job Lint, sub-job "quick-checks", task "MyPy typecheck". It should give (right now): `Success: no issues found in 329 source files`.
Here are the details of the original 855 errors when running `mypy torch` on current master (after fixing the couple of syntax errors that prevent `mypy` from running through):
<details>
```
torch/utils/tensorboard/_proto_graph.py:1: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.node_def_pb2'
torch/utils/tensorboard/_proto_graph.py:2: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.attr_value_pb2'
torch/utils/tensorboard/_proto_graph.py:3: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.tensor_shape_pb2'
torch/utils/backcompat/__init__.py:1: error: Cannot find implementation or library stub for module named 'torch._C'
torch/for_onnx/__init__.py:1: error: Cannot find implementation or library stub for module named 'torch.for_onnx.onnx'
torch/cuda/nvtx.py:2: error: Cannot find implementation or library stub for module named 'torch._C'
torch/utils/show_pickle.py:59: error: Name 'pickle._Unpickler' is not defined
torch/utils/show_pickle.py:113: error: "Type[PrettyPrinter]" has no attribute "_dispatch"
torch/utils/tensorboard/_onnx_graph.py:1: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.graph_pb2'
torch/utils/tensorboard/_onnx_graph.py:2: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.node_def_pb2'
torch/utils/tensorboard/_onnx_graph.py:3: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.versions_pb2'
torch/utils/tensorboard/_onnx_graph.py:4: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.attr_value_pb2'
torch/utils/tensorboard/_onnx_graph.py:5: error: Cannot find implementation or library stub for module named 'tensorboard.compat.proto.tensor_shape_pb2'
torch/utils/tensorboard/_onnx_graph.py:9: error: Cannot find implementation or library stub for module named 'onnx'
torch/contrib/_tensorboard_vis.py:10: error: Cannot find implementation or library stub for module named 'tensorflow.core.util'
torch/contrib/_tensorboard_vis.py:11: error: Cannot find implementation or library stub for module named 'tensorflow.core.framework'
torch/contrib/_tensorboard_vis.py:12: error: Cannot find implementation or library stub for module named 'tensorflow.python.summary.writer.writer'
torch/utils/hipify/hipify_python.py:43: error: Need type annotation for 'CAFFE2_TEMPLATE_MAP' (hint: "CAFFE2_TEMPLATE_MAP: Dict[<type>, <type>] = ...")
torch/utils/hipify/hipify_python.py:636: error: "object" has no attribute "items"
torch/nn/_reduction.py:27: error: Name 'Optional' is not defined
torch/nn/_reduction.py:27: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/_reduction.py:47: error: Name 'Optional' is not defined
torch/nn/_reduction.py:47: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/utils/tensorboard/_utils.py:17: error: Skipping analyzing 'matplotlib.pyplot': found module but no type hints or library stubs
torch/utils/tensorboard/_utils.py:17: error: Skipping analyzing 'matplotlib': found module but no type hints or library stubs
torch/utils/tensorboard/_utils.py:18: error: Skipping analyzing 'matplotlib.backends.backend_agg': found module but no type hints or library stubs
torch/utils/tensorboard/_utils.py:18: error: Skipping analyzing 'matplotlib.backends': found module but no type hints or library stubs
torch/nn/modules/utils.py:27: error: Name 'List' is not defined
torch/nn/modules/utils.py:27: note: Did you forget to import it from "typing"? (Suggestion: "from typing import List")
caffe2/proto/caffe2_pb2.py:17: error: Unexpected keyword argument "serialized_options" for "FileDescriptor"; did you mean "serialized_pb"?
caffe2/proto/caffe2_pb2.py:25: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/caffe2_pb2.py:31: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:35: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:39: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:43: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:47: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:51: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:55: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:59: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:63: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:67: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:71: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:75: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:102: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/caffe2_pb2.py:108: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:112: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:124: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/caffe2_pb2.py:130: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:134: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:138: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:142: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:146: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:150: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:154: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:158: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:162: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:166: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:170: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:174: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:178: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:182: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:194: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/caffe2_pb2.py:200: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:204: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:208: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:212: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:224: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/caffe2_pb2.py:230: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:234: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:238: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:242: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:246: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:250: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:254: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/caffe2_pb2.py:267: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:274: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:281: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:288: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:295: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:302: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:327: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:334: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:341: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:364: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:371: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:378: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:385: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:392: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:399: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:406: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:413: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:420: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:427: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:434: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:441: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:448: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:455: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:462: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:488: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:495: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:502: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:509: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:516: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:523: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:530: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:537: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:544: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:551: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:558: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:565: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:572: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:596: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:603: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:627: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:634: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:641: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:648: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:655: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:662: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:686: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:693: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:717: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:724: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:731: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:738: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:763: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:770: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:777: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:784: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:808: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:815: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:822: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:829: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:836: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:843: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:850: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:857: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:864: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:871: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:878: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:885: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:892: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:916: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:923: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:930: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:937: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:944: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:951: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:958: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:982: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:989: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:996: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1003: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1010: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1017: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1024: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1031: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1038: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1045: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1052: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1059: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1066: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1090: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1097: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1104: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1128: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1135: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1142: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1166: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1173: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1180: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1187: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1194: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1218: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1225: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1232: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1239: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1246: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1253: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1260: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1267: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1274: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1281: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1305: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1312: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1319: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1326: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1333: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1340: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1347: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1354: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1361: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1368: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1375: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1382: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1389: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1396: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1420: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1427: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1434: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1441: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1465: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1472: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1479: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1486: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1493: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1500: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1507: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1514: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1538: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/caffe2_pb2.py:1545: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1552: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1559: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1566: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/caffe2_pb2.py:1667: error: "GeneratedProtocolMessageType" has no attribute "Segment"
torch/multiprocessing/queue.py:4: error: No library stub file for standard library module 'multiprocessing.reduction'
caffe2/proto/torch_pb2.py:18: error: Unexpected keyword argument "serialized_options" for "FileDescriptor"; did you mean "serialized_pb"?
caffe2/proto/torch_pb2.py:27: error: Unexpected keyword argument "serialized_options" for "EnumDescriptor"
caffe2/proto/torch_pb2.py:33: error: Unexpected keyword argument "serialized_options" for "EnumValueDescriptor"
caffe2/proto/torch_pb2.py:50: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:57: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:81: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:88: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:95: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:102: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:109: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:116: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:123: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:130: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:137: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:144: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:151: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:175: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:182: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:189: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:196: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:220: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:227: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:234: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:241: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:265: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:272: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:279: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:286: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:293: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:300: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:307: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:314: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:321: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:328: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:335: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:342: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:366: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:373: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:397: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/torch_pb2.py:404: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:411: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:418: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:425: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/torch_pb2.py:432: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:17: error: Unexpected keyword argument "serialized_options" for "FileDescriptor"; did you mean "serialized_pb"?
caffe2/proto/metanet_pb2.py:29: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:36: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:43: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:50: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:57: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:64: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:88: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:95: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:102: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:126: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:133: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:140: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:164: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:171: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:178: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:202: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:209: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:216: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:240: error: Unexpected keyword argument "serialized_options" for "Descriptor"
caffe2/proto/metanet_pb2.py:247: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:254: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:261: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:268: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:275: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:282: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:289: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/metanet_pb2.py:296: error: Unexpected keyword argument "serialized_options" for "FieldDescriptor"
caffe2/proto/__init__.py:13: error: Skipping analyzing 'caffe2.caffe2.fb.session.proto': found module but no type hints or library stubs
torch/multiprocessing/pool.py:3: error: No library stub file for standard library module 'multiprocessing.util'
torch/multiprocessing/pool.py:3: note: (Stub files are from https://github.com/python/typeshed)
caffe2/python/scope.py:10: error: Skipping analyzing 'past.builtins': found module but no type hints or library stubs
caffe2/python/__init__.py:7: error: Module has no attribute "CPU"
caffe2/python/__init__.py:8: error: Module has no attribute "CUDA"
caffe2/python/__init__.py:9: error: Module has no attribute "MKLDNN"
caffe2/python/__init__.py:10: error: Module has no attribute "OPENGL"
caffe2/python/__init__.py:11: error: Module has no attribute "OPENCL"
caffe2/python/__init__.py:12: error: Module has no attribute "IDEEP"
caffe2/python/__init__.py:13: error: Module has no attribute "HIP"
caffe2/python/__init__.py:14: error: Module has no attribute "COMPILE_TIME_MAX_DEVICE_TYPES"; maybe "PROTO_COMPILE_TIME_MAX_DEVICE_TYPES"?
caffe2/python/__init__.py:15: error: Module has no attribute "ONLY_FOR_TEST"; maybe "PROTO_ONLY_FOR_TEST"?
caffe2/python/__init__.py:34: error: Item "_Loader" of "Optional[_Loader]" has no attribute "exec_module"
caffe2/python/__init__.py:34: error: Item "None" of "Optional[_Loader]" has no attribute "exec_module"
caffe2/python/__init__.py:35: error: Module has no attribute "cuda"
caffe2/python/__init__.py:37: error: Module has no attribute "cuda"
caffe2/python/__init__.py:49: error: Module has no attribute "add_dll_directory"
torch/random.py:4: error: Cannot find implementation or library stub for module named 'torch._C'
torch/_classes.py:2: error: Cannot find implementation or library stub for module named 'torch._C'
torch/onnx/__init__.py:1: error: Cannot find implementation or library stub for module named 'torch._C'
torch/hub.py:21: error: Skipping analyzing 'tqdm.auto': found module but no type hints or library stubs
torch/hub.py:24: error: Skipping analyzing 'tqdm': found module but no type hints or library stubs
torch/hub.py:27: error: Name 'tqdm' already defined (possibly by an import)
torch/_tensor_str.py:164: error: Not all arguments converted during string formatting
torch/_ops.py:1: error: Cannot find implementation or library stub for module named 'torch._C'
torch/_linalg_utils.py:26: error: Name 'Optional' is not defined
torch/_linalg_utils.py:26: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_linalg_utils.py:26: error: Name 'Tensor' is not defined
torch/_linalg_utils.py:63: error: Name 'Tensor' is not defined
torch/_linalg_utils.py:63: error: Name 'Optional' is not defined
torch/_linalg_utils.py:63: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_linalg_utils.py:70: error: Name 'Optional' is not defined
torch/_linalg_utils.py:70: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_linalg_utils.py:70: error: Name 'Tensor' is not defined
torch/_linalg_utils.py:88: error: Name 'Tensor' is not defined
torch/_linalg_utils.py:88: error: Name 'Optional' is not defined
torch/_linalg_utils.py:88: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_linalg_utils.py:88: error: Name 'Tuple' is not defined
torch/_linalg_utils.py:88: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/_jit_internal.py:17: error: Need type annotation for 'boolean_dispatched'
torch/_jit_internal.py:474: error: Need type annotation for '_overloaded_fns' (hint: "_overloaded_fns: Dict[<type>, <type>] = ...")
torch/_jit_internal.py:512: error: Need type annotation for '_overloaded_methods' (hint: "_overloaded_methods: Dict[<type>, <type>] = ...")
torch/_jit_internal.py:648: error: Incompatible types in assignment (expression has type "FinalCls", variable has type "_SpecialForm")
torch/sparse/__init__.py:11: error: Name 'Tensor' is not defined
torch/sparse/__init__.py:71: error: Name 'Tensor' is not defined
torch/sparse/__init__.py:71: error: Name 'Optional' is not defined
torch/sparse/__init__.py:71: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/sparse/__init__.py:71: error: Name 'Tuple' is not defined
torch/sparse/__init__.py:71: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/nn/init.py:109: error: Name 'Tensor' is not defined
torch/nn/init.py:126: error: Name 'Tensor' is not defined
torch/nn/init.py:142: error: Name 'Tensor' is not defined
torch/nn/init.py:165: error: Name 'Tensor' is not defined
torch/nn/init.py:180: error: Name 'Tensor' is not defined
torch/nn/init.py:194: error: Name 'Tensor' is not defined
torch/nn/init.py:287: error: Name 'Tensor' is not defined
torch/nn/init.py:315: error: Name 'Tensor' is not defined
torch/multiprocessing/reductions.py:8: error: No library stub file for standard library module 'multiprocessing.util'
torch/multiprocessing/reductions.py:9: error: No library stub file for standard library module 'multiprocessing.reduction'
torch/multiprocessing/reductions.py:17: error: No library stub file for standard library module 'multiprocessing.resource_sharer'
torch/jit/_builtins.py:72: error: Module has no attribute "_no_grad_embedding_renorm_"
torch/jit/_builtins.py:80: error: Module has no attribute "stft"
torch/jit/_builtins.py:81: error: Module has no attribute "cdist"
torch/jit/_builtins.py:82: error: Module has no attribute "norm"
torch/jit/_builtins.py:83: error: Module has no attribute "nuclear_norm"
torch/jit/_builtins.py:84: error: Module has no attribute "frobenius_norm"
torch/backends/cudnn/__init__.py:8: error: Cannot find implementation or library stub for module named 'torch._C'
torch/backends/cudnn/__init__.py:86: error: Need type annotation for '_handles' (hint: "_handles: Dict[<type>, <type>] = ...")
torch/autograd/profiler.py:13: error: Name 'ContextDecorator' already defined (possibly by an import)
torch/autograd/function.py:2: error: Cannot find implementation or library stub for module named 'torch._C'
torch/autograd/function.py:2: note: See https://mypy.readthedocs.io/en/latest/running_mypy.html#missing-imports
torch/autograd/function.py:109: error: Unsupported dynamic base class "with_metaclass"
torch/serialization.py:609: error: "Callable[[Any], Any]" has no attribute "cache"
torch/_lowrank.py:11: error: Name 'Tensor' is not defined
torch/_lowrank.py:13: error: Name 'Optional' is not defined
torch/_lowrank.py:13: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_lowrank.py:14: error: Name 'Optional' is not defined
torch/_lowrank.py:14: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_lowrank.py:14: error: Name 'Tensor' is not defined
torch/_lowrank.py:82: error: Name 'Tensor' is not defined
torch/_lowrank.py:82: error: Name 'Optional' is not defined
torch/_lowrank.py:82: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_lowrank.py:82: error: Name 'Tuple' is not defined
torch/_lowrank.py:82: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/_lowrank.py:130: error: Name 'Tensor' is not defined
torch/_lowrank.py:130: error: Name 'Optional' is not defined
torch/_lowrank.py:130: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_lowrank.py:130: error: Name 'Tuple' is not defined
torch/_lowrank.py:130: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/_lowrank.py:167: error: Name 'Tensor' is not defined
torch/_lowrank.py:167: error: Name 'Optional' is not defined
torch/_lowrank.py:167: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/_lowrank.py:167: error: Name 'Tuple' is not defined
torch/_lowrank.py:167: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/quantization/observer.py:45: error: Variable "torch.quantization.observer.ABC" is not valid as a type
torch/quantization/observer.py:45: note: See https://mypy.readthedocs.io/en/latest/common_issues.html#variables-vs-type-aliases
torch/quantization/observer.py:45: error: Invalid base class "ABC"
torch/quantization/observer.py:127: error: Name 'Tensor' is not defined
torch/quantization/observer.py:127: error: Name 'Tuple' is not defined
torch/quantization/observer.py:127: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/quantization/observer.py:172: error: Module has no attribute "per_tensor_symmetric"
torch/quantization/observer.py:172: error: Module has no attribute "per_channel_symmetric"
torch/quantization/observer.py:192: error: Name 'Tensor' is not defined
torch/quantization/observer.py:192: error: Name 'Tuple' is not defined
torch/quantization/observer.py:192: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/quantization/observer.py:233: error: Module has no attribute "per_tensor_symmetric"
torch/quantization/observer.py:233: error: Module has no attribute "per_channel_symmetric"
torch/quantization/observer.py:534: error: Name 'Tensor' is not defined
torch/quantization/observer.py:885: error: Name 'Tensor' is not defined
torch/quantization/observer.py:885: error: Name 'Tuple' is not defined
torch/quantization/observer.py:885: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/quantization/observer.py:894: error: Cannot determine type of 'max_val'
torch/quantization/observer.py:894: error: Cannot determine type of 'min_val'
torch/quantization/observer.py:899: error: Cannot determine type of 'min_val'
torch/quantization/observer.py:902: error: Name 'Tensor' is not defined
torch/quantization/observer.py:925: error: Name 'Tensor' is not defined
torch/quantization/observer.py:928: error: Cannot determine type of 'min_val'
torch/quantization/observer.py:929: error: Cannot determine type of 'max_val'
torch/quantization/observer.py:946: error: Argument "min" to "histc" has incompatible type "Tuple[Tensor, Tensor]"; expected "Union[int, float, bool]"
torch/quantization/observer.py:946: error: Argument "max" to "histc" has incompatible type "Tuple[Tensor, Tensor]"; expected "Union[int, float, bool]"
torch/quantization/observer.py:1056: error: Module has no attribute "per_tensor_symmetric"
torch/quantization/observer.py:1058: error: Module has no attribute "per_channel_symmetric"
torch/nn/quantized/functional.py:76: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:76: error: Name 'BroadcastingList2' is not defined
torch/nn/quantized/functional.py:259: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:259: error: Name 'Optional' is not defined
torch/nn/quantized/functional.py:259: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/quantized/functional.py:289: error: Module has no attribute "ops"
torch/nn/quantized/functional.py:290: error: Module has no attribute "ops"
torch/nn/quantized/functional.py:308: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:326: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:356: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:371: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:400: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:400: error: Name 'Optional' is not defined
torch/nn/quantized/functional.py:400: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/quantized/functional.py:430: error: Name 'Tensor' is not defined
torch/nn/quantized/functional.py:448: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/linear.py:26: error: Module has no attribute "ops"
torch/nn/quantized/modules/linear.py:28: error: Module has no attribute "ops"
torch/nn/quantized/modules/functional_modules.py:40: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:47: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:54: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:61: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:68: error: Name 'List' is not defined
torch/nn/quantized/modules/functional_modules.py:68: note: Did you forget to import it from "typing"? (Suggestion: "from typing import List")
torch/nn/quantized/modules/functional_modules.py:68: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:75: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:140: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:146: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:151: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:157: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:162: error: Name 'List' is not defined
torch/nn/quantized/modules/functional_modules.py:162: note: Did you forget to import it from "typing"? (Suggestion: "from typing import List")
torch/nn/quantized/modules/functional_modules.py:162: error: Name 'Tensor' is not defined
torch/nn/quantized/modules/functional_modules.py:168: error: Name 'Tensor' is not defined
torch/multiprocessing/spawn.py:9: error: Module 'torch.multiprocessing' has no attribute '_prctl_pr_set_pdeathsig'
torch/multiprocessing/__init__.py:28: error: Module has no attribute "__all__"
torch/jit/frontend.py:9: error: Cannot find implementation or library stub for module named 'torch._C._jit_tree_views'
torch/jit/annotations.py:6: error: Module 'torch._jit_internal' has no attribute 'BroadcastingList2'; maybe "BroadcastingList1" or "BroadcastingListCls"?
torch/jit/annotations.py:6: error: Module 'torch._jit_internal' has no attribute 'BroadcastingList3'; maybe "BroadcastingList1" or "BroadcastingListCls"?
torch/jit/annotations.py:9: error: Cannot find implementation or library stub for module named 'torch._C'
torch/distributions/distribution.py:16: error: Need type annotation for 'arg_constraints' (hint: "arg_constraints: Dict[<type>, <type>] = ...")
torch/distributions/distribution.py:74: error: Name 'arg_constraints' already defined on line 16
torch/distributions/distribution.py:84: error: Name 'support' already defined on line 15
torch/functional.py:114: error: Name 'Tuple' is not defined
torch/functional.py:114: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/functional.py:114: error: Name 'Optional' is not defined
torch/functional.py:114: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:189: error: Incompatible types in assignment (expression has type "None", variable has type "Tensor")
torch/functional.py:200: error: Argument 1 to "_indices_product" has incompatible type "Tuple[int, ...]"; expected "List[int]"
torch/functional.py:204: error: No overload variant of "__setitem__" of "list" matches argument types "Tensor", "int"
torch/functional.py:204: note: Possible overload variants:
torch/functional.py:204: note: def __setitem__(self, int, int) -> None
torch/functional.py:204: note: def __setitem__(self, slice, Iterable[int]) -> None
torch/functional.py:204: error: No overload variant of "__getitem__" of "list" matches argument type "Tensor"
torch/functional.py:204: note: def __getitem__(self, int) -> int
torch/functional.py:204: note: def __getitem__(self, slice) -> List[int]
torch/functional.py:207: error: "Tensor" has no attribute "copy_"
torch/functional.py:212: error: No overload variant of "__setitem__" of "list" matches argument types "Tensor", "int"
torch/functional.py:212: note: Possible overload variants:
torch/functional.py:212: note: def __setitem__(self, int, int) -> None
torch/functional.py:212: note: def __setitem__(self, slice, Iterable[int]) -> None
torch/functional.py:212: error: No overload variant of "__getitem__" of "list" matches argument type "Tensor"
torch/functional.py:212: note: def __getitem__(self, int) -> int
torch/functional.py:212: note: def __getitem__(self, slice) -> List[int]
torch/functional.py:215: error: Incompatible types in assignment (expression has type "None", variable has type "Tensor")
torch/functional.py:334: error: Name 'Optional' is not defined
torch/functional.py:334: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:429: error: Argument 2 to "pad" has incompatible type "Tuple[int, int]"; expected "List[int]"
torch/functional.py:431: error: Module has no attribute "stft"
torch/functional.py:766: error: Module has no attribute "cdist"
torch/functional.py:768: error: Module has no attribute "cdist"
torch/functional.py:770: error: Module has no attribute "cdist"
torch/functional.py:775: error: Name 'Optional' is not defined
torch/functional.py:775: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:780: error: Name 'Optional' is not defined
torch/functional.py:780: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:780: error: Name 'number' is not defined
torch/functional.py:780: error: Name 'norm' already defined on line 775
torch/functional.py:785: error: Name 'Optional' is not defined
torch/functional.py:785: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:785: error: Name 'number' is not defined
torch/functional.py:785: error: Name 'norm' already defined on line 775
torch/functional.py:790: error: Name 'Optional' is not defined
torch/functional.py:790: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:790: error: Name 'norm' already defined on line 775
torch/functional.py:795: error: Name 'norm' already defined on line 775
torch/functional.py:960: error: Name 'Any' is not defined
torch/functional.py:960: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Any")
torch/functional.py:960: error: Name 'Tuple' is not defined
torch/functional.py:960: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/functional.py:1036: error: Argument 1 to "len" has incompatible type "int"; expected "Sized"
torch/functional.py:1041: error: Name 'Optional' is not defined
torch/functional.py:1041: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:1041: error: Name 'Tuple' is not defined
torch/functional.py:1041: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/functional.py:1056: error: Name 'Optional' is not defined
torch/functional.py:1056: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/functional.py:1056: error: Name 'Tuple' is not defined
torch/functional.py:1056: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Tuple")
torch/distributions/von_mises.py:87: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/negative_binomial.py:25: error: Incompatible types in assignment (expression has type "_IntegerGreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/multivariate_normal.py:116: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/laplace.py:23: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/independent.py:34: error: Need type annotation for 'arg_constraints' (hint: "arg_constraints: Dict[<type>, <type>] = ...")
torch/distributions/cauchy.py:28: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/poisson.py:28: error: Incompatible types in assignment (expression has type "_IntegerGreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/one_hot_categorical.py:32: error: Incompatible types in assignment (expression has type "_Simplex", base class "Distribution" defined the type as "None")
torch/distributions/normal.py:27: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/lowrank_multivariate_normal.py:79: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/gamma.py:30: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/exponential.py:23: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/fishersnedecor.py:25: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/dirichlet.py:44: error: Incompatible types in assignment (expression has type "_Simplex", base class "Distribution" defined the type as "None")
torch/nn/quantized/dynamic/modules/rnn.py:230: error: Incompatible types in assignment (expression has type "int", variable has type "Tensor")
torch/nn/quantized/dynamic/modules/rnn.py:232: error: Incompatible types in assignment (expression has type "int", variable has type "Tensor")
torch/nn/quantized/dynamic/modules/rnn.py:236: error: Incompatible return value type (got "Tuple[Any, Tensor, Any]", expected "Tuple[int, int, int]")
torch/nn/quantized/dynamic/modules/rnn.py:351: error: Incompatible types in assignment (expression has type "Type[LSTM]", base class "RNNBase" defined the type as "Type[RNNBase]")
torch/nn/quantized/dynamic/modules/rnn.py:381: error: Module has no attribute "quantized_lstm"
torch/nn/quantized/dynamic/modules/rnn.py:385: error: Module has no attribute "quantized_lstm"
torch/nn/quantized/dynamic/modules/rnn.py:414: error: Argument 1 to "forward_impl" of "LSTM" has incompatible type "PackedSequence"; expected "Tensor"
torch/nn/quantized/dynamic/modules/rnn.py:416: error: Incompatible types in assignment (expression has type "PackedSequence", variable has type "Tensor")
torch/nn/quantized/dynamic/modules/rnn.py:418: error: Incompatible return value type (got "Tuple[Tensor, Tuple[Tensor, Tensor]]", expected "Tuple[PackedSequence, Tuple[Tensor, Tensor]]")
torch/nn/quantized/dynamic/modules/rnn.py:420: error: Argument 1 of "permute_hidden" is incompatible with supertype "RNNBase"; supertype defines the argument type as "Tensor"
torch/nn/quantized/dynamic/modules/rnn.py:420: error: Return type "Tuple[Tensor, Tensor]" of "permute_hidden" incompatible with return type "Tensor" in supertype "RNNBase"
torch/nn/quantized/dynamic/modules/rnn.py:426: error: Argument 2 of "check_forward_args" is incompatible with supertype "RNNBase"; supertype defines the argument type as "Tensor"
torch/nn/intrinsic/qat/modules/conv_fused.py:232: error: Incompatible types in assignment (expression has type "Type[ConvBnReLU2d]", base class "ConvBn2d" defined the type as "Type[ConvBn2d]")
torch/distributions/beta.py:27: error: Incompatible types in assignment (expression has type "_Interval", base class "Distribution" defined the type as "None")
torch/distributions/geometric.py:31: error: Incompatible types in assignment (expression has type "_IntegerGreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/continuous_bernoulli.py:38: error: Incompatible types in assignment (expression has type "_Interval", base class "Distribution" defined the type as "None")
torch/distributions/bernoulli.py:30: error: Incompatible types in assignment (expression has type "_Boolean", base class "Distribution" defined the type as "None")
torch/quantization/fake_quantize.py:126: error: Module has no attribute "per_tensor_symmetric"
torch/quantization/fake_quantize.py:132: error: Module has no attribute "per_channel_symmetric"
torch/distributions/transformed_distribution.py:41: error: Need type annotation for 'arg_constraints' (hint: "arg_constraints: Dict[<type>, <type>] = ...")
torch/jit/__init__.py:1: error: Cannot find implementation or library stub for module named 'torch._C'
torch/jit/__init__.py:15: error: Module 'torch.utils' has no attribute 'set_module'
torch/jit/__init__.py:70: error: Name 'Attribute' already defined on line 68
torch/jit/__init__.py:213: error: On Python 3 '{}'.format(b'abc') produces "b'abc'"; use !r if this is a desired behavior
torch/jit/__init__.py:215: error: On Python 3 '{}'.format(b'abc') produces "b'abc'"; use !r if this is a desired behavior
torch/jit/__init__.py:1524: error: Unsupported dynamic base class "with_metaclass"
torch/jit/__init__.py:1869: error: Name 'ScriptModule' already defined on line 1524
torch/jit/__init__.py:1998: error: Need type annotation for '_jit_caching_layer'
torch/jit/__init__.py:1999: error: Need type annotation for '_jit_function_overload_caching'
torch/distributions/relaxed_categorical.py:34: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/relaxed_categorical.py:108: error: Incompatible types in assignment (expression has type "_Simplex", base class "Distribution" defined the type as "None")
torch/distributions/relaxed_bernoulli.py:31: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/relaxed_bernoulli.py:114: error: Incompatible types in assignment (expression has type "_Interval", base class "Distribution" defined the type as "None")
torch/distributions/logistic_normal.py:31: error: Incompatible types in assignment (expression has type "_Simplex", base class "Distribution" defined the type as "None")
torch/distributions/log_normal.py:26: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/half_normal.py:27: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/half_cauchy.py:28: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/gumbel.py:28: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/nn/quantized/modules/conv.py:18: error: Module 'torch.nn.utils' has no attribute 'fuse_conv_bn_weights'
torch/nn/quantized/modules/conv.py:209: error: Name 'Optional' is not defined
torch/nn/quantized/modules/conv.py:209: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/quantized/modules/conv.py:214: error: Module has no attribute "ops"
torch/nn/quantized/modules/conv.py:321: error: Name 'Optional' is not defined
torch/nn/quantized/modules/conv.py:321: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/quantized/modules/conv.py:323: error: Module has no attribute "ops"
torch/nn/quantized/modules/conv.py:447: error: Name 'Optional' is not defined
torch/nn/quantized/modules/conv.py:447: note: Did you forget to import it from "typing"? (Suggestion: "from typing import Optional")
torch/nn/quantized/modules/conv.py:449: error: Module has no attribute "ops"
torch/nn/quantized/modules/conv.py:513: error: Name 'nn.modules.conv._ConvTransposeNd' is not defined
torch/nn/quantized/modules/conv.py:525: error: Name 'List' is not defined
torch/nn/quantized/modules/conv.py:525: note: Did you forget to import it from "typing"? (Suggestion: "from typing import List")
torch/nn/quantized/modules/conv.py:527: error: Name 'List' is not defined
torch/nn/quantized/modules/conv.py:527: note: Did you forget to import it from "typing"? (Suggestion: "from typing import List")
torch/nn/intrinsic/quantized/modules/conv_relu.py:8: error: Module 'torch.nn.utils' has no attribute 'fuse_conv_bn_weights'
torch/nn/intrinsic/quantized/modules/conv_relu.py:21: error: Incompatible types in assignment (expression has type "Type[ConvReLU2d]", base class "Conv2d" defined the type as "Type[Conv2d]")
torch/nn/intrinsic/quantized/modules/conv_relu.py:62: error: Incompatible types in assignment (expression has type "Type[ConvReLU3d]", base class "Conv3d" defined the type as "Type[Conv3d]")
torch/distributions/weibull.py:25: error: Incompatible types in assignment (expression has type "_GreaterThan", base class "Distribution" defined the type as "None")
torch/distributions/kl.py:35: error: Need type annotation for '_KL_MEMOIZE' (hint: "_KL_MEMOIZE: Dict[<type>, <type>] = ...")
torch/distributions/studentT.py:27: error: Incompatible types in assignment (expression has type "_Real", base class "Distribution" defined the type as "None")
torch/distributions/mixture_same_family.py:48: error: Need type annotation for 'arg_constraints' (hint: "arg_constraints: Dict[<type>, <type>] = ...")
torch/distributions/__init__.py:158: error: Name 'transforms' is not defined
torch/onnx/utils.py:21: error: Cannot find implementation or library stub for module named 'torch._C'
torch/distributed/rendezvous.py:4: error: Cannot find implementation or library stub for module named 'urlparse'
torch/distributed/rendezvous.py:4: error: Name 'urlparse' already defined (possibly by an import)
torch/distributed/rendezvous.py:4: error: Name 'urlunparse' already defined (possibly by an import)
torch/distributed/rendezvous.py:9: error: Module 'torch.distributed' has no attribute 'FileStore'
torch/distributed/rendezvous.py:9: error: Module 'torch.distributed' has no attribute 'TCPStore'
torch/distributed/rendezvous.py:65: error: On Python 3 '{}'.format(b'abc') produces "b'abc'"; use !r if this is a desired behavior
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'AllreduceOptions'; maybe "ReduceOptions" or "AllreduceCoalescedOptions"?
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'AllreduceCoalescedOptions'; maybe "AllreduceOptions"?
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'AllToAllOptions'
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'BroadcastOptions'
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'GatherOptions'; maybe "ScatterOptions"?
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'ReduceOptions'; maybe "AllreduceOptions", "ReduceScatterOptions", or "ReduceOp"?
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'ReduceScatterOptions'; maybe "ScatterOptions" or "ReduceOptions"?
torch/distributed/distributed_c10d.py:11: error: Module 'torch.distributed' has no attribute 'ScatterOptions'; maybe "ReduceScatterOptions" or
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36584
Reviewed By: seemethere, ailzhang
Differential Revision: D21155985
Pulled By: ezyang
fbshipit-source-id: f628d4293992576207167e7c417998fad15898d1
Summary:
Files that were named the same within the anaconda repository, e.g.
pytorch_1.5.0-cpu.bz2, were found to be clobbering each other,
especially across different platforms.
This led to similarly named packages for different platforms not getting
promoted.
This also adds "--skip" to our anaconda upload so that we don't end up
overwriting our releases just in case this script gets run twice.
Also, conda search errors out if it doesn't find anything for the
current platform being searched, so we should just continue forward if
we don't find anything. We want to be able to use this script for all of
the packages we support, some of which do not release packages for every
platform (torchtext, for example, only has "noarch").
This should also probably be back-ported to the `release/1.5` branch since this changeset was used to release `v1.5.0`
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37089
Differential Revision: D21184768
Pulled By: seemethere
fbshipit-source-id: dbe12d74df593b57405b178ddb2375691e128a49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34650
Resubmit of https://github.com/pytorch/pytorch/pull/33840, which was overly eager in the sense that it deleted a lot of code that we didn't want to get rid of yet (default timeout handling).
This PR adds an optional argument into `rpc_sync` and `rpc_async` as well as `RpcAgent::send()` that allows the user to specify a timeout for an RPC to override the default set timeout. If the user does not specify this argument, then the currently set default RPC timeout given in the RPC constructor or by `rpc.set_rpc_timeout()` is used. Otherwise, we use the passed in timeout.
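As a rough usage sketch of the per-call timeout (the worker names, tensors, and world size below are illustrative assumptions, not taken from this PR):
```python
import torch
import torch.distributed.rpc as rpc

# Assumes a second process named "worker1" also calls init_rpc with rank=1.
rpc.init_rpc("worker0", rank=0, world_size=2)

# Override the default RPC timeout (in seconds) for just this call.
result = rpc.rpc_sync("worker1", torch.add, args=(torch.ones(2), 3), timeout=1.0)

# The async variant accepts the same per-call timeout and returns a Future.
fut = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), 3), timeout=1.0)
result_async = fut.wait()

rpc.shutdown()
```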
This diff does not address:
1) timeout support when rpc.rpc_async is called as a JIT operator. For this to work, we would need to change the logic in `register_distributed_ops` to pass this timeout to `rpcTorchscript`. One more issue is that TorchScript doesn't support the timedelta object. This will be done in a follow-up PR as it requires a fair amount of changes to the argument parsing logic.
2) Per-RPC timeouts for internal messages or `rpc.remote()`. A follow-up diff will address the latter by raising the timeout error to the user at the earliest possible time, such as the next time the RRef is forked or `to_here` is called.
Added unit tests to confirm the current behavior
ghstack-source-id: 102622601
Test Plan: Added unit tests in rpc_test
Differential Revision: D20376953
fbshipit-source-id: 9fb3f147520588308ab50dd33286255658d76d47
Summary:
Changes the file_diff_from_base function to get the base reference
directly from CircleCI's pipeline variables instead of being hardcoded
to master.
cc gchanan
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36260
Differential Revision: D21144940
Pulled By: seemethere
fbshipit-source-id: ec6d1c2adcf703119bdab2a43f26a39a5fbaf71b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37058
We shouldn't add advisory checks to master, because PRs will get
reverted if they fail. This PR makes the following changes:
1. Factor out the binary fetch logic into `clang_format_utils.py`
2. Copypasta the canonical git integration from llvm and modify it to
use our binary fetcher. No more bikeshedding about how to integrate,
we just use the standard integration.
3. Change the CI job to run on pull-requests only and use
`git-clang-format`.
4. The original `clang_format.py` is now renamed `clang_format_all.py`
to reflect its purpose.
5. The pre-commit hook has been changed to use `git-clang-format`.
For pre-commit hook users: no changes required.
For others: add `tools/git-clang-format` to your PATH and you can do `git clang-format` to format your working tree.
Test Plan: Imported from OSS
Differential Revision: D21180893
Pulled By: suo
fbshipit-source-id: f8358fb7ce26f11585226aaac5ed89d379257bfb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36978
We're seeing quite a few of these when running unittests; they might be a
bit verbose at LOG(INFO).
ghstack-source-id: 102557335
Test Plan: regular unittest coverage, this is logging-only
Differential Revision: D21149262
fbshipit-source-id: 4992342883920f58484afd8b1e432c1455035835
Summary:
Closes https://github.com/pytorch/pytorch/issues/24546
Benchmark with same build settings on same system.
gcc : version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)
CUDA : 10.1
GPU : 1050ti
```python
import timeit
for n, t in [(10_000, 20000),
             (100_000, 20000)]:
    for dtype in ('torch.half', 'torch.float', 'torch.double'):
        print(f'torch.cosh(a) a.numel() == {n} for {t} times {dtype}')
        print(timeit.timeit(f'torch.cosh(a); torch.cuda.synchronize()',
                            setup=f'import torch; a=torch.arange({n}, dtype={dtype}, device="cuda")',
                            number=t))
```
Before:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2813017509997735
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.28355878599904827
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.27810572300040803
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.3239932899996347
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.321233343998756
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5546665399997437
```
After:
```
torch.cosh(a) a.numel() == 10000 for 20000 times torch.half
0.2905335750001541
torch.cosh(a) a.numel() == 10000 for 20000 times torch.float
0.27596429500044906
torch.cosh(a) a.numel() == 10000 for 20000 times torch.double
0.30358699899989006
torch.cosh(a) a.numel() == 100000 for 20000 times torch.half
0.30139567500009434
torch.cosh(a) a.numel() == 100000 for 20000 times torch.float
0.30246640400036995
torch.cosh(a) a.numel() == 100000 for 20000 times torch.double
0.5403946970000106
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36654
Differential Revision: D21164606
Pulled By: VitalyFedyunin
fbshipit-source-id: 55e88f94044957f81599ae3c12cda38a3e2c985c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35615
Python 2 has reached end-of-life and is no longer supported by PyTorch.
Now we can clean up a lot of cruft that we put in place to support it.
These changes were all done manually, and I skipped anything that seemed
like it would take more than a few seconds, so I think it makes sense to
review it manually as well (though using side-by-side view and ignoring
whitespace change might be helpful).
Test Plan: CI
Differential Revision: D20842886
Pulled By: dreiss
fbshipit-source-id: 8cad4e87c45895e7ce3938a88e61157a79504aed
Summary:
Resolves https://github.com/pytorch/pytorch/issues/36730 and https://github.com/pytorch/pytorch/issues/36057
Partially resolves: https://github.com/pytorch/pytorch/issues/36671
```
>>> 2j / torch.tensor([4], dtype = torch.complex64)
tensor([(0.0000+0.5000j)], dtype=torch.complex64)
>>> 1 / torch.tensor(3+4j)
tensor((0.1200-0.1600j), dtype=torch.complex64)
```
rdiv is more generally broken for all dtypes because it doesn't promote the types properly, e.g.:
```
>>> 1 / torch.tensor(2)
tensor(0)
>>> 2j / torch.tensor(4)
tensor(0)
```
so that issue should be fixed in a separate PR.
Added CPU acc types for complex.
Added cumsum, cumprod for complex dtypes
Added complex dtypes to get_all_math_dtypes to expand testing for complex dtypes
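A small sketch of what the new complex support enables (the values are arbitrary and chosen only for illustration):
```python
import torch

t = torch.tensor([1 + 1j, 2 - 1j, 0.5j], dtype=torch.complex64)

# Cumulative reductions now work on complex dtypes.
print(torch.cumsum(t, dim=0))   # running complex sums
print(torch.cumprod(t, dim=0))  # running complex products

# Division with a complex scalar on the left-hand side promotes correctly,
# as in the example above; prints approximately tensor([(0.0000+0.5000j)]).
print(2j / torch.tensor([4], dtype=torch.complex64))
```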
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36747
Differential Revision: D21138687
Pulled By: anjali411
fbshipit-source-id: ad3602ccf86c70294a6e71e564cb0d46c393dfab
Summary:
This PR tries to rebase on top of origin/master before building the xla job.
I also saw a TODO in existing code which does a very similar thing (rebase on master for gcc5 jobs), so I just fixed the TODO by moving the logic into a separate step.
Currently the logic is:
For these gcc5 and xla jobs, we rebase on top of the "target" branch before building.
- This only happens on PRs.
- "Target" branch is "origin/master" by default, but if it's trying to merge into a release branch, target branch will be the release branch.
- I made the "target" branch a param mainly it's allow us to rebase on `viable/strict` if we want. But after a second thought, how quickly `viable/strict` moves forward is not controlled only by xla job, and it's hard to predict how long the breakage will last if it's not moving. But we do have control over how long a xla breakage lasts on `origin/master` (which should be short since we monitor it closely). So I currently want to keep `origin/master` and move to `viable/strict` when it's super stable.
- There're jobs like `pytorch_paralleltbb_linux_xenial_py3_6_gcc5_4_build` which would fall into the rebase logic as well, but since those jobs doesn't run on PRs(so the old logic was essentially no-op), I didn't enabled the new logic on those jobs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36852
Differential Revision: D21171747
Pulled By: ailzhang
fbshipit-source-id: 433ea0e14d030e2e0fa74d2ff4244327e9db7044
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37050
With this change curly braces are printed as a part of Block rather than
a part of the enclosing statement. It allows us, for instance, to more
easily see nested blocks: now they will be printed each in its own
curly-braced scope.
As a side effect, I had to change how we print loop options. Previously
we did it like this:
```
for (...) { // <loop options>
  <loop body (Block)>
}
```
Now, since everything in between { and } is a part of the block, we have
to do it the following way:
```
for (...) /* <loop options> */ {
  <loop body (Block)>
}
```
Note the change from '//' to '/* .. */' for the loop option comments.
Test Plan: Imported from OSS
Differential Revision: D21171851
Pulled By: ZolotukhinM
fbshipit-source-id: 39f51a9e15aec03b6527b0634fd4b9e01a912cda
Summary:
Line 33+ contains instructions on how to disable its use, line 108+ on how to enable it.
The default in CMakeLists.txt is enabled, so drop the latter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36993
Differential Revision: D21161793
Pulled By: ngimel
fbshipit-source-id: 08c5eecaf8768491f90d4a52c338ecea32a0c35e
Summary:
Previously torch.isclose would raise a RuntimeError when called on complex tensors. This PR updates torch.isclose to run on complex tensors and be consistent with [NumPy](https://numpy.org/doc/1.18/reference/generated/numpy.isclose.html). However, NumPy's handling of NaN, -inf, and inf values is odd, so I adopted Python's [cmath.isclose](https://docs.python.org/3/library/cmath.html) behavior when dealing with them. See https://github.com/numpy/numpy/issues/15959 for more on NumPy's behavior.
While implementing complex isclose I also simplified the isclose algorithm to:
- A is close to B if A and B are equal, if equal_nan is true then NaN is equal to NaN
- If A and B are finite, then A is close to B if `abs(a - b) <= (atol + abs(rtol * b))`
This PR also documents torch.isclose, since it was undocumented, and adds multiple tests for its behavior to test_torch.py since it had no dedicated tests.
The PR leaves equal_nan=True with complex inputs an error for now, pending the outcome of https://github.com/numpy/numpy/issues/15959.
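To make the simplified rule concrete, here is a rough Python restatement (illustrative only, not the ATen implementation; the function name, defaults, and complex-NaN handling here are placeholders):
```
import cmath

def isclose(a, b, rtol=1e-5, atol=1e-8, equal_nan=False):
    # equal values (including both infs of the same sign) are close
    if a == b:
        return True
    # optionally treat NaN as equal to NaN
    if equal_nan and cmath.isnan(a) and cmath.isnan(b):
        return True
    # remaining non-finite values are never close
    if not (cmath.isfinite(a) and cmath.isfinite(b)):
        return False
    # finite values: compare against the combined tolerance
    return abs(a - b) <= atol + abs(rtol * b)

print(isclose(1.0, 1.0 + 1e-9))                 # True
print(isclose(complex(1, 1), complex(1, 1.1)))  # False
```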
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36456
Differential Revision: D21159853
Pulled By: mruberry
fbshipit-source-id: fb18fa7048e6104cc24f5ce308fdfb0ba5e4bb30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36619
With this PR, applications no longer need to create dedicated helpers
to run functions on the object referenced by an RRef. Instead,
`rref.rpc_sync().some_func()` will use `rpc_sync` to run `some_func`
on the owner of the RRef using the object referenced by the RRef.
Similar helpers for `rref.rpc_async().some_func()` and
`rref.remote().some_func()` are also added.
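For illustration, a hedged sketch of how these helpers can be used (the worker name and module are made up, and RPC is assumed to be initialized already):
```
import torch
import torch.distributed.rpc as rpc

# create a remote object on "worker1"; rref refers to it
rref = rpc.remote("worker1", torch.nn.Linear, args=(4, 4))
x = torch.randn(2, 4)

out = rref.rpc_sync().forward(x)        # blocking call on the owner
fut = rref.rpc_async().forward(x)       # returns a future
out2 = fut.wait()
other_rref = rref.remote().forward(x)   # returns another RRef
```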
An alternative design is to expose PyRRef as RRefBase and then
implement everything in a new Python RRef class. However, the RRef
class cannot directly inherit from PyRRef/RRefBase, otherwise we
would need to let the pyRemote* C++ functions load RRef from Python
and return an RRef instance. It is possible to let RRef hold an
instance of PyRRef instead of inheriting from it, but this does not
look like an elegant design, as we would have RRef holding PyRRef and
PyRRef holding the C++ RRef. Another alternative is to use dynamic
method loading, by installing member methods on PyRRef instances.
However, this would require different solutions to handle
RRef(data) and rpc.remote(...). Based on the above thinking, we
decided to go with the current implementation for simplicity, and we
can also keep all RRef-related APIs in one place.
Test Plan: Imported from OSS
Differential Revision: D21028333
Pulled By: mrshenli
fbshipit-source-id: fe90f56ef7183d18874e357900093755e1601eb4
Summary:
Some IR optimizations were leaving superfluous Blocks in the IR, this PR adds simplification and merging of enclosing Block statements to the IR Simplifier, e.g.
```
Block {
  Stmt 1
  Block {
    Stmt 2
  }
  Block {}
}
```
becomes
```
Block {
  Stmt 1
  Stmt 2
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37013
Differential Revision: D21166208
Pulled By: nickgg
fbshipit-source-id: 6dcdf863980d94731a8ddf184882c4a5b7259381
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36976
The bounds check and the read were swapped in two places - I noticed
ASAN complaining in an unrelated change on an erroneous buffer.
Adding a couple simple test cases.
ghstack-source-id: 102606986
Test Plan: buck test mode/dev caffe2/test/cpp/rpc:
Differential Revision: D21148936
fbshipit-source-id: 7ec5007535f7310437ac1b9a72852a223b9dd29a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36987
the discrepancy comes from using Eigen's sqrt.
Replaced it with std::sqrt, which worked, so we are using MKL's version.
Removed momentum, made epsilon a float, and enhanced the test with hypothesis
Test Plan: testing the mkl dependencies in prod, if things work, will remove the intrinsics implementation, if no, will use intrinsics
Reviewed By: yinghai
Differential Revision: D21151661
fbshipit-source-id: 56e617b13bc32b0020691f7201d16dee00f651b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36969
`test_backward_node_failure_python_udf` was flaky since it used the
RPC framework to indicate rank 0 was done with processing. Since we kill nodes
in this unit test, it is very likely that listenLoop() has exited on some nodes
and hence using an RPC to inform all nodes about rank 0's completion
might not work, since the RPC might not be processed on certain nodes.
To fix this, we use the c10d store instead for this notification.
ghstack-source-id: 102549873
Test Plan: waitforbuildbot
Differential Revision: D21147099
fbshipit-source-id: 745273a6cae0debbae131bb4cc7debe9c201bf98
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36628
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D21029702
Pulled By: ezyang
fbshipit-source-id: 2322094338ad896653b2db43ff74a8ab1593b3e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36188
* Need to remove the n^2 behavior of scanning whether to split or not,
otherwise long inline chains will take a long time re-scanning.
Test Plan: Imported from OSS
Differential Revision: D20907254
Pulled By: zdevito
fbshipit-source-id: ebfc1a4eefc26d5806381e7afd75b7a9cd4cde97
Summary:
This PR is motivated by two issues it tries to address:
1) relax the constraint on requantization scale (<1).
2) Unify requantization methodology across pytorch integration of QNNPACK and FBGEMM.
Here we are trying to address the first part for Conv and Linear.
The existing requantization scheme performs the scale multiplication entirely in integer arithmetic by extracting the mantissa and exponent of the FP scale and processing them, including the appropriate rounding. The set of instructions corresponding to this is specifically tailored for the condition scale < 1.
Relaxing this constraint requires us to fix that sequence of instructions. In this PR we take a simpler approach: essentially convert Int32 to FP32, apply the scale, then convert FP32 back to Int32 with appropriate rounding, to-nearest-ties-to-even. This is followed by the zero point add and clipping. Since 32-bit ARM has no nearest-ties-to-even rounding instruction, its sequence is a little different. The sequences for both 32-bit and 64-bit are taken from https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/quantized/cpu/qnnpack/src/requantization/fp32-neon.c.
Furthermore relaxing the scale constraint and moving towards FP requantization also helps us move towards unifying requantization producer across QNNPACK and FBGEMM.
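As a rough illustration of that requantization path, here is a minimal numpy sketch under the rounding/clamping assumptions above (not the actual NEON kernels; names and values are made up):
```
import numpy as np

def requantize_fp32(acc, scale, zero_point):
    # int32 accumulator -> fp32, apply the requantization scale
    scaled = acc.astype(np.float32) * np.float32(scale)
    # round to nearest, ties to even (np.rint uses banker's rounding)
    rounded = np.rint(scaled).astype(np.int32)
    # add the output zero point and clamp to the uint8 range
    return np.clip(rounded + zero_point, 0, 255).astype(np.uint8)

acc = np.array([-300, -1, 0, 1, 700], dtype=np.int32)
print(requantize_fp32(acc, scale=1.5, zero_point=128))
```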
Summary of the PR:
- requantization params are modified to lift some computation that would have to be in the kernel otherwise for aarch32 kernels, particularly:
- Computing vfmin, vfmax, vfmagic and vimagic.
- Fixed q8gemm, q8conv and q8dwconv kernels.
- Fixed the corresponding tests.
What is not done:
- XZP kernels are not changed as part of this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35856
Differential Revision: D20996325
Pulled By: kimishpatel
fbshipit-source-id: 7a7a18b09dd2564768142371db06d98bf7479f49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/37007
D20961463 was reverted due to clang-format. Redo it.
Test Plan: verified TTS model can be loaded without problem
Reviewed By: iseeyuan
Differential Revision: D21157626
fbshipit-source-id: 372bf6196da20b3ebafa283c5c3f7c924a37ed60
Summary:
Add support for accepting float, byte, and bool tensors for `attn_mask`. No breakage is expected.
- If a bool tensor is provided, positions with `True` are not allowed to attend while `False` values will be unchanged.
- if a byte tensor is provided, it will be converted to bool tensor. Positions with non-zero are not allowed to attend while zero values will be unchanged.
- If a float tensor is provided, it will be added to the attention weight.
Note: the behavior of the float mask tensor is slightly different from the first two options because it is added to the attention weight, rather than calling `masked_fill_` function. Also, converting a byte tensor to bool tensor within `multi_head_attention_forward` causes extra overhead. Therefore, a bool mask is recommended here.
For `key_padding_mask`:
- if a bool tensor is provided, the positions with the value of `True` will be ignored while the positions with the value of `False` will be unchanged.
- If a byte tensor is provided, the positions with the value of non-zero will be ignored while the position with the value of zero will be unchanged.
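For illustration, a small hedged example of the accepted mask types (shapes and values here are made up, not taken from the PR's tests):
```
import torch

embed_dim, num_heads, seq_len = 8, 2, 3
mha = torch.nn.MultiheadAttention(embed_dim, num_heads)
q = k = v = torch.randn(seq_len, 1, embed_dim)  # (seq, batch, embed)

# bool attn_mask: True means the position is not allowed to attend
bool_mask = torch.tensor([[False, True,  True],
                          [False, False, True],
                          [False, False, False]])
out_bool, _ = mha(q, k, v, attn_mask=bool_mask)

# float attn_mask: added to the attention weights (use -inf to mask out)
float_mask = torch.zeros(seq_len, seq_len)
float_mask[0, 1:] = float('-inf')
out_float, _ = mha(q, k, v, attn_mask=float_mask)

# bool key_padding_mask: True positions in the keys are ignored
pad_mask = torch.tensor([[False, False, True]])  # (batch, seq)
out_pad, _ = mha(q, k, v, key_padding_mask=pad_mask)
```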
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33763
Differential Revision: D20925358
Pulled By: zhangguanheng66
fbshipit-source-id: de174056be183cdad0f3de8024ee0a3c5eb364c9
Summary:
When doing elimination of For loops which execute once, e.g. `for i = 0; i < 1; ++i { thing; } => thing;` we do var substitution while the temporary simplifier ExprNodes still exist, which could put them in an invalid state and leave unsimplified terms in the expression. The fix is to apply substitution before simplifying the body of the for loop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36965
Differential Revision: D21145248
Pulled By: nickgg
fbshipit-source-id: d874600c7a098fc05b8ef3109e516e2eaa2c24e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36838
All ops now do unboxing after dispatch, i.e. c10 handles unboxing and c10 registers a wrapper for the op to JIT
The last op that manually registered its own wrapper to JIT in register_aten_ops.cpp was migrated.
Since there are no ops using use_c10_dispatcher: unboxed_only anymore, we can delete the feature.
Also:
- Rename some files to more accurately describe what they're doing now:
- OpsAlreadyMovedToC10.h/cpp -> ATenOpList.h/cpp
- register_aten_ops.cpp -> generated_unboxing_wrappers.cpp
- gen_jit_dispatch.py -> gen_unboxing_wrappers.py
ghstack-source-id: 102532915
Test Plan: waitforsandcastle
Differential Revision: D21100081
fbshipit-source-id: be824958eef33f6cd42a6a652175bd0b1df4ebf9
Summary:
# Goals
Do the following things during a distributed backward pass.
1. Accumulate the gradient of a variable to RPC context once the gradient is ready instead of at the very end of the backward pass.
2. Run post/pre hooks installed in `AccumulateGrad` nodes once the gradient is ready for the variable. Currently, the hooks in `AccumulateGrad` are not executed because the function `AccumulateGrad` itself is not even evaluated by the local engine.
3. Make it extensible to support post hooks installed by DDP's reducer.
# Introduce GradCapturePreHook
## Why do we need this?
### Root issue:
* dist engine uses the autograd.grad-like API on the vanilla engine and then in the Future callback populates the context with the gradients. This is a bad emulation of the .backward() call on the vanilla engine.
### Practical issue:
* The leaf's hooks are not called (because they are associated with the AccumulateGrad nodes, which are not called in the autograd.grad-like API). Modules like DDP rely on these hooks.
* The Future is marked as completed before the context is actually populated with the grads, leading to unexpected behavior on the user side.
* The Future callback is only called at the very end of the backward pass, which is too late for DDP if it wants to overlap compute and transfer.
### Proposed solution:
* Provide hooks in the autograd.grad-like API that will allow the distributed engine to populate the context and call the hooks to better emulate the .backward call.
## Who can install a grad capture pre-hook?
This will be an internal hook at the C++ level and it won't be exposed to Python code. Only call sites directly interacting with the local engine can install such hooks.
## Signature
The returned `grad` will be captured.
```
virtual const torch::Tensor& operator()(const torch::Tensor& grads) = 0;
```
## Where are hooks installed?
Grad capture pre-hooks are installed in GraphTask::ExecInfo::Capture. ExecInfo is per node. Every backward run will have its own GraphTask instance.
## When/How will hooks be called?
When the local engine captures the grads for a node, all grad capture pre hooks are called one by one in the order they are added. The output grads of the hooks will replace the original grads.
The output of the last hook will be used for grad capturing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34501
Test Plan:
All existing tests should pass.
```
python setup.py develop
python test/distributed/rpc/test_dist_autograd_spawn.py DistAutogradTestWithSpawn.test_post_hooks
```
Differential Revision: D20953673
Pulled By: hczhu
fbshipit-source-id: 543b3844823330ea9f9856bab7c5cb2679290a53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36936
Closes https://github.com/pytorch/pytorch/issues/30813
Relanding of https://github.com/pytorch/pytorch/pull/35463
1. Tensor quantization logic (quantize_*) is moved to the aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file, and started to become complicated and hard to read. This problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to the aten/native/quantized.
2. Requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register QuantizedCUDA backend.
5. test_quantized_tensor is extended to process QuantizedCUDA backend where possible.
Test Plan: Imported from OSS
Differential Revision: D21143025
Pulled By: jerryzh168
fbshipit-source-id: 11405e2e8f87e48fadc0a084c51db15f85ccb500
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36970
We would like to move all distributed testing to use the existing
multiprocessing tooling defined in common_distributed.py. With this change, we
make `TestDistBackend` inherit from `MultiProcessTestCase` and enable fork mode
multiprocessing. In the next step, we can enable spawn mode for these tests
which will give us TSAN coverage.
ghstack-source-id: 102553801
Test Plan: Unittests
Differential Revision: D21146947
fbshipit-source-id: 608fa2cb93e88f8de6a5ac87c523e2c4e4fede1b
Summary:
**Summary**
This commit disables the progress bar for the `clang-format` binary
download if stdout is not attached to a terminal. The cursor
repositioning tricks used to print out the progress bar don't work if
stdout is redirected to something that is not a terminal, and so the file
ends up containing each progress bar update on a separate line. This
happens in the GitHub workflow for checking formatting and is annoying
to scroll through.
**Test Plan**
1. Manual invocation of the script still produces progress bar.
```
(pytorch) me@devgpuXXX:pytorch (disable-cf-progress-bar)$ with-proxy tools/clang_format.py
Downloading clang-format to /home/me/local/repos/pytorch/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /home/me/local/repos/pytorch/.clang-format-bin/clang-format
```
2. GitHub `clang-format` workflow output no longer contains progress bar.
```
Run set -eux
+ echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ echo '| Run tools/clang_format.py to fix formatting errors |'
| Run tools/clang_format.py to fix formatting errors |
+ echo '~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+ tools/clang_format.py --verbose --diff
Created directory /home/runner/work/pytorch/pytorch/.clang-format-bin for clang-format binary
Downloading clang-format to /home/runner/work/pytorch/pytorch/.clang-format-bin
Reference Hash: d1365110da598d148d8143a7f2ccfd8bac7df499
Actual Hash: d1365110da598d148d8143a7f2ccfd8bac7df499
Using clang-format located at /home/runner/work/pytorch/pytorch/.clang-format-bin/clang-format
All files formatted correctly
```
**Fixes**
This PR fixes https://github.com/pytorch/pytorch/issues/36949.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36955
Differential Revision: D21157861
Pulled By: SplitInfinity
fbshipit-source-id: 16c6d4395cee09f3bd2abac13e9be4acdde73406
Summary:
Our test suite used to set double as its default scalar type, and when it was switched to not do so (to be more consistent with how users experience PyTorch), a few tests had to still set the default scalar type to double to function properly. Now that the jit no longer creates double tensors so frequently, it appears that test_jit no longer needs to set double as its default scalar type, too.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36982
Differential Revision: D21152120
Pulled By: mruberry
fbshipit-source-id: ea6d3c1ad55552dc5affa1fe1bd0e5189849e6d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36355
Resolving issue in https://github.com/pytorch/pytorch/issues/36155, by:
- supporting grouped conv3d in ```slow_conv3d```
- adding a fast path in ```__convolution``` to call ```slow_conv3d``` when
running grouped conv3d on CPU
- bypassing unfolding when kernel_size = 1
Test Plan:
Added the following test cases in test_nn.py, testing both forward and
backward:
- test_Conv3d_groups_nobias
- test_Conv3d_groups_wbias
- test_Conv_1x1
Imported from OSS
Differential Revision: D20957073
fbshipit-source-id: 29afd1e6be8c484859eaedd51463954e2fdccc38
Summary:
This resolves an issue observed by stefanwebb where the composition of multiple transforms is cached only if all components are cached.
This PR adds a new method `.with_cache()` so that e.g. you can compose a normalizing flow (that needs to be cached) with a `SigmoidTransform` (that wasn't already cached) by calling `.with_cache()` on the latter. This issue also comes up when composing non-cached constraint transforms as returned by `transform_to()` and `biject_to()`: after this PR you can call `transform_to(constraints.positive).with_cache()` to get a cached `ExpTransform`.
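A minimal sketch of the resulting usage, assuming only the `.with_cache()` call described above (the rest is illustrative):
```
import torch
from torch.distributions import constraints, transform_to

# transform_to(positive) returns an uncached ExpTransform;
# .with_cache() returns a cached copy of it
t = transform_to(constraints.positive).with_cache()

x = torch.randn(3, requires_grad=True)
y = t(x)
print(t.inv(y) is x)  # True: the inverse reuses the cached (x, y) pair
```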
## Tested
- [x] added a unit test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36882
Differential Revision: D21155914
Pulled By: ezyang
fbshipit-source-id: 3c06e63785ca2503e08a5cd7532aff81882835e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36742
Now, you can define a custom class inside a TORCH_LIBRARY block.
It looks very similar to what you did before. Instead of
```
static auto m = torch::class_<Class>("Namespace", "Class").def("foo", foo);
```
you write
```
TORCH_LIBRARY(Namespace, m) {
  m.class_<Class>("Class")
    .def("foo", foo);
}
```
All the old usages still work, but at some point we should start
updating the tutorials when we're ready to go 100% live with the
new pybind11 style API.
custom class API previously lived in torch/ folder and in torch
namespace, so for consistency, the new TORCH_LIBRARY also got
moved to torch/library.h The definition of Library::class_ is in the
bottom of that header because I need all of the class_ constructors
available, but there is a circular dependency between the two headers.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D21089648
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 8d54329c125242605336c22fa1642aae6940b507
Summary:
Fixes https://github.com/pytorch/pytorch/issues/26304
After this patch `build.ninja` entries for `.cu` files will contain a `depfile` variable pointing to a `.NVCC-depend` file containing dependencies (i.e., header files included directly or indirectly) of the `.cu` source file. Until now, those `.NVCC-depend` files were being transposed into `.cu.o.depend` files in CMake format. That did not work as intended because the `.cu.o` target file was declared to be dependent on the `.cu.o.depend` file itself, rather than its contents. In fact, Ninja lacks the functionality to process dependencies in the CMake format of those `.cu.o.depend` files.
This was tested on Linux as described in https://github.com/pytorch/pytorch/issues/26304#issuecomment-614667170
I have also verified that the original problem does not reproduce with Makefiles (i.e., when `ninja` is not present in the system) and that PyTorch still build successfully with Makefiles after this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36938
Differential Revision: D21156042
Pulled By: ezyang
fbshipit-source-id: fda3aaa57207f4d6bf74d2f254fe45fb7fd90eec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36973
handle the case where inputs are used in multiple partitions
Test Plan: unit tests
Reviewed By: yinghai
Differential Revision: D21107672
fbshipit-source-id: 9eca20220b80f27400aefcdaeff5d5503e32654c
Summary:
Use `std::decay_t<decltype(foo)>::size()` instead of `foo.size()` to help the compiler with static array allocations.
Even if `Vec256::size()` is `constexpr`, `foo.size()` (where `foo` is of type `Vec256`) is not an integral constant expression, therefore the compiler has to use VLAs, which are not part of the C++ standard.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36855
Test Plan: CI
Differential Revision: D21151194
Pulled By: malfet
fbshipit-source-id: eaf3e467c7f7ee6798ca82fe9f8fae725011ead0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36971
Add lite interpreter ops for portal TTS model.
Test Plan:
Convert to lite interpreter model:
buck run //xplat/caffe2/fb/pytorch_predictor:converter <FULL_JIT_MODEL> <LITE_MODEL>
Load model using benchmark program (on devserver)
buck run //xplat/caffe2/fb/lite_predictor:lite_predictor_tts -- --model <MODEL>
(Expect benchmark to fail because inputs are invalid)
Reviewed By: iseeyuan
Differential Revision: D20961463
fbshipit-source-id: 5022077caccd8c07666789bbbca68c643129ee0a
Summary:
Compute the number of elements as `constexpr` and use it both as the `buffer` element size and as the upper boundary
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36966
Differential Revision: D21150602
Pulled By: malfet
fbshipit-source-id: 581634565c54c7295f3b77c8dc86659d5cc4ce19
Summary:
This is intended to be the general version of the fmadd implementations in
vec256_double and vec256_float.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36751
Differential Revision: D21148849
Pulled By: pbelevich
fbshipit-source-id: 0805075d81c61d22383a3055aebcb91d09e545de
Summary:
hardsigmoid_backward is implemented on the xla side, so the test will not error out but is really slow due to a lot of recompilation. Enable the test on the pytorch side but skip it on the xla side so xla can control when to enable the test
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36967
Differential Revision: D21149113
Pulled By: ailzhang
fbshipit-source-id: fc337622fafa7be9cff2631de131980ea53adb8d
Summary:
`skipIfRocm` skips the test on ROCm regardless of device type [CPU or GPU]. `skipCUDAIfRocm` skips only on GPU on ROCm and runs the test on CPU.
ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36968
Differential Revision: D21149721
Pulled By: ezyang
fbshipit-source-id: 361811b0b307f17193ad72ee8bcc7f2c65ce6203
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36959
This is a more straightforward solution to the problem than https://github.com/pytorch/pytorch/pull/36957; I don't know about the relative performance.
Fixes: #36956
Test Plan: Imported from OSS
Differential Revision: D21144146
Pulled By: gchanan
fbshipit-source-id: a10ab905219a73157d5d7183492b52d7c8dd6072
Summary:
The JIT pointwise kernel currently does not do vectorized loads/stores, which may lead to suboptimal performance for shorter data types like half and int8.
In this PR, a fixed length of 4 elements per load/store is added for supported tensor shapes, implemented as a runtime check inside the kernel.
Supported tensor shapes:
- all input/output data point are aligned to 4*sizeof(dtype)
- last dimension contiguous(stride 1) and size is multiple of 4
- all other dimension have stride that is multiple of 4
All test_jit* tests passed, and here are the performance results for a simple `ax+by+c` fusion
result before PR:
```
torch.float32 kernel time: 0.748 ms.
torch.float16 kernel time: 0.423 ms.
torch.int8 kernel time: 0.268 ms.
```
result after PR:
```
torch.float32 kernel time: 0.733 ms.
torch.float16 kernel time: 0.363 ms.
torch.int8 kernel time: 0.191 ms.
```
test code:
```
import torch
import time

# disable profiling to test all data types
torch._C._jit_set_profiling_mode(False)
torch._C._jit_set_profiling_executor(False)

@torch.jit.script
def axpby(x, y):
    return x * 2 - y * 3 + 1

for test_dtype in [torch.float32, torch.float16, torch.int8]:
    a = torch.randn(12345, 4096, device="cuda").to(test_dtype)
    b = torch.randn(12345, 4096, device="cuda").to(test_dtype)
    # warm up
    for _ in range(100):
        c = axpby(a, b)
    torch.cuda.synchronize()
    start = time.time()
    for _ in range(1000):
        c = axpby(a, b)
    torch.cuda.synchronize()
    end = time.time()
    print("{} kernel time: {:.3f} ms.".format(test_dtype, end - start))
```
Generated code:
[log_with_generated_code.txt](https://github.com/pytorch/pytorch/files/4472813/log_with_generated_code.txt)
Additional note:
the double type is disabled from the vectorized code path.
We can later improve this with dynamic vectorization length support and fewer in-kernel checks once we can use tensor shape information in codegen. For now, this implementation follows caching through the TensorDesc mechanism, which does not carry enough compile-time information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36555
Differential Revision: D21142762
Pulled By: ngimel
fbshipit-source-id: 1cfdc5807a944c4670b040dc2d2dfa480377e7d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36215
Make it possible to disable observers, e.g. to avoid
infinite recursion if an observer uses an operator
Test Plan:
USE_BLAS=MKL USE_MKLDNN=0 USE_CUDA=0 python setup.py develop install
./build/bin/test_jit
Differential Revision: D20912676
Pulled By: ilia-cher
fbshipit-source-id: 29760cdfe488a02f943f755967b78779d6dbcef3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36948
Compiling with USE_DISTRIBUTED=0 fails as it would still try to
compile python_nccl.cpp which requires NCCL but the NCCL lib is not
linked.
Test Plan: Imported from OSS
Differential Revision: D21142012
Pulled By: mrshenli
fbshipit-source-id: 6ca94056ca859da7f833a31edcb4c5260d8625e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36963
A couple of reasons why these tests were flaky:
1) Sometimes the error message for timeout would include lowercase 'timeout'
which was not included in the regex.
2) The timeout was 0.2 seconds, which was probably too small for ASAN/TSAN.
ghstack-source-id: 102541231
Test Plan: waitforbuildbot
Differential Revision: D21144954
fbshipit-source-id: 57945f53e1627028835cbfd2adb72f21d87f593f
Summary:
Seems like no one is using this image. We could delete it from our docker hub.
I think we don't need to regenerate a new set of images, since we are only deleting. But please correct me if I'm wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36930
Reviewed By: malfet
Differential Revision: D21138079
Pulled By: ailzhang
fbshipit-source-id: 4a563e6310b193cb885411bcd925296b01223368
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36765
We recently added support for bundling inputs with models. Now add
support to the benchmarker to use those inputs. This frees users from
having to look up the proper input format for each model.
Test Plan:
- Ran on a model without bundled inputs. Saw a clear error.
- Ran on a model with too few bundled inputs. Saw a clear error.
- Ran on a proper bundled input. Model executed.
Differential Revision: D21142659
Pulled By: dreiss
fbshipit-source-id: d23c1eb9d1de882345b007bf2bfbbbd6f964f6fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36699
Hooks up the QNNPACK op from the previous PR to work in the
PyTorch layers.
Test Plan:
```
python test/quantization/test_quantized.py TestQuantizedOps.test_qhardsigmoid
python test/quantization/test_quantized.py TestQNNPackOps.test_qhardsigmoid
```
Imported from OSS
Differential Revision: D21057152
fbshipit-source-id: 5f2094d1db80575f7f65497f553ca329f7518175
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36698
Adds the hardsigmoid op to QNNPACK using the LUT kernel
Test Plan:
```
cd aten/src/ATen/native/quantized/cpu/qnnpack
with-proxy ./scripts/build-local.sh
./build/local/hardsigmoid-test
```
Imported from OSS
Differential Revision: D21057153
fbshipit-source-id: 31ce09643959b159a82e7083fc11e1e5e98c49ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36817
For dynamic quant we need to run the observers on the weights to calculate the qparams before calling convert on the model.
The API requires the user to provide dummy inputs that will be fed to the model after the prepare step to run the observers
Test Plan:
test_quanitze_script.py
test_quantization.py
Imported from OSS
Differential Revision: D21134439
fbshipit-source-id: 8acaab4eb57aadb68a2a02cc470bb042c07e1f6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36666
We need to introduce hacked twin overloads for ops taking lists of optional tensors.
I'm not really sure why, actually, but their being a special case in codegen blocks the removal of `use_c10_dispatcher: unboxed_only`.
This PR does not remove the "hacked twin" hack, but it removes it from codegen, instead manually specifying them in register_prim_ops.cpp and unblocking removal of `use_c10_dispatcher: unboxed_only`.
Original commit changeset: c5e2386ad06a
ghstack-source-id: 102507901
Test Plan: waitforsandcastle
Differential Revision: D21044962
fbshipit-source-id: 9d423aac08a1dd2bab54940ccb6219ebdcb7d230
Summary:
gcc 5.3.0 has an issue where it cannot treat a defaulted function as constexpr, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68754. To make the build work for gcc 5.3.0, do not declare the defaulted function as a constexpr function for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36561
Differential Revision: D21024109
Pulled By: ezyang
fbshipit-source-id: 58fce704625b7d0926e40b6b12841ebbe392c59c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35055
This is the first step to improving the way RPCs are profiled as suggested by Ilia. For now, since RPC can return two different types of futures, we have to implement two different code paths, one for the python eager mode future and one for the jit future.
This diff implements the python eager part. We have defined a method `_call_end_callbacks_on_future` that takes in a future and schedules a `RecordFunction` to be completed as a callback on the future.
Once https://github.com/pytorch/pytorch/pull/35039 lands, we can implement the JIT codepath by registering an operator that takes a `Future(t)` as well.
These code paths will be merged once the futures are merged.
ghstack-source-id: 102478180
Test Plan: Added unit tests
Differential Revision: D20452003
fbshipit-source-id: 1acdcb073bd1f63d6fb2e78277ac0be00fd6671d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36857
This was a redundant call, as we immediately took the msg and converted
it back to a string
ghstack-source-id: 102424018
Test Plan: CI
Differential Revision: D21104235
fbshipit-source-id: 4124007d800dbe2718ddebb40281d0a86484685e
Summary:
This will allow xla to use this symbol when lowering hardsigmoid
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36851
Differential Revision: D21102827
Pulled By: ailzhang
fbshipit-source-id: 99429a40a61ba84eb38b872cb3656aa5a172b03b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36674
Slight changes to qlinear benchmark to have it be in the same format
as linear, for fairer comparisons between FP and Q.
Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.linear_test
python -m pt.qlinear_test
```
Imported from OSS
Differential Revision: D21102562
fbshipit-source-id: 4f5c693b5de7e26c4326a9ec276560714290f6c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36673
Slight changes to the qconv benchmark to make it match the floating
point benchmark, so we can compare across the two better.
Test Plan:
```
cd benchmarks/operator_benchmark
python -m pt.qconv_test --tag_filter all
python -m pt.conv_test --tag_filter all
```
Imported from OSS
Differential Revision: D21102563
fbshipit-source-id: d11c1e4c13d4c5fa1f2332c687aee6889c81b659
Summary:
Addresses https://github.com/pytorch/pytorch/issues/36807. Also updates the cast testing to catch issues like this better.
In the future a more constexpr based approach to casting would be nice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36832
Differential Revision: D21120822
Pulled By: mruberry
fbshipit-source-id: 9504ddd36cfe6d9f9f545fc277fef36855c1b221
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36768
Follow LOG(WARNING) format for c++ side warnings in order to play well with larger services, especially when using glog. I need to hook up into GLOG internals a bit in order to override FILE/LINE without having to change the whole thing to be macros, but it seems to be stable between glog versions.
Note, this also changes caffe2_log_level to warning by default - I think it's a much better default when compiling without glog (or maybe even have info)
Test Plan:
Run unittest in both glog and non-glog build mode:
glog:
```
W0416 12:06:49.778215 3311666 exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
no-glog:
```
[W exception_test.cpp:23] Warning: I'm a warning (function TestBody)
```
Reviewed By: ilia-cher
Differential Revision: D21078446
fbshipit-source-id: b5d36aac54d6b6295a72de6754696ccafbcb84ca
Summary: ATenOp should go away, but before it does it's important to understand what's going on inside of it. We already log `arguments`, but it's rather hard to parse in scuba as it's a list, not a dictionary. Let's extract the operator name explicitly so that grouping works well
Test Plan: unittest
Reviewed By: ngimel
Differential Revision: D21057966
fbshipit-source-id: 86be7cca39055620477a28bd5d8ab29e8edd2ff9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36667
Hacky workaround that would allow us to reland https://github.com/pytorch/pytorch/pull/34418
Basically moves the type conversion into ATenOp wrapper that is still used in some models.
Test Plan: Added unittest. Before it was producing warnings about wrong dtype, with this fix it doesn't
Reviewed By: ngimel
Differential Revision: D21037368
fbshipit-source-id: 06b435525d8d182c7607e33fd745060d3d6869e9
Summary:
In the CUDA version of max_pool3d backward, function `max_pool3d_with_indices_backward_out_frame` is defined with args as `..., oheight, owidth, ...` but called with `..., owidth, oheight, ...`. As a result gradients are not fully calculated along the longer dimension due to insufficient grid size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36820
Differential Revision: D21120078
Pulled By: ngimel
fbshipit-source-id: d061726647a4a45d45d5c1a00f2f1cf2745726a8
Summary:
On some machines I found errors like "cannot find `cpuinfo.h`" when building FakeLowp ops. Fixing it. Also updated the README.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36861
Reviewed By: amylittleyang
Differential Revision: D21105755
Pulled By: yinghai
fbshipit-source-id: 4f17bd969d38e1b2945b8753ffe4bdc703de36bf
Summary:
Closes https://github.com/pytorch/pytorch/issues/30813
1. Tensor quantization logic (quantize_*) is moved to the aten/native/quantized. Previously all logic for tensor quantization lived in the aten/quantized/Quantizer.cpp file, and started to become complicated and hard to read. This problem should be addressed in a refactoring PR. Still, I reworked this partially because I had to add tensor quantization logic for CUDA, and it was natural to move everything to the aten/native/quantized.
2. Requirements to run CUDA_tensor_apply* were eased to process any tensor that lives on the CUDA device (QuantizedCUDA included).
3. All quantized data types now have a default constructor. NVCC refuses to compile any gpu_kernel or CUDA_tensor_apply* without them.
4. Minor changes in many files to register QuantizedCUDA backend.
5. test_quantized_tensor is extended to process QuantizedCUDA backend where possible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35463
Differential Revision: D20896697
Pulled By: jerryzh168
fbshipit-source-id: 163554efa23d11a2b10bbc2492439db4798eb26b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34258
This PR allows both atol and rtol to be specified, uses defaults based on the prior analysis (spreadsheet attached to https://github.com/pytorch/pytorch/pull/32538), but retains the absolute tolerance behavior in cases where precision was previously specified explicitly.
Test Plan: Imported from OSS
Differential Revision: D21110255
Pulled By: nairbv
fbshipit-source-id: 57b3a004c7d5ac1be80ee765f03668b1b13f4a7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36756
1. Add missing ops for pytorch models used in AIDemos. This is because we're migrating towards the lite interpreter on mobile; the full JIT version will be deprecated
2. Replace the old mobilenet model with a newer one in bytecode format
3. Regenerate the reddit model to include bytecode
ghstack-source-id: 102422498
Test Plan: `buck build AIDemos:AIDemos`
Reviewed By: iseeyuan, linbinyu
Differential Revision: D21013409
fbshipit-source-id: 7704d32fccfe61a2c9db38846ce3153bb93eee7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36833
Add hypothesis testing sweep for one test in each SLS test suite for different precisions.
Sweep random seed, embedding shape, batch_size, weight with hypothesis testing.
Refactor sls tests into proper file with precision labeled in filename.
Test Plan:
FB intern: buck test mode/dev //caffe2/caffe2/contrib/fakelowp/test:test_sls_8bit_nnpi_fp32nnpi
Will test OSS after exporting PR.
Reviewed By: yinghai
Differential Revision: D21098346
fbshipit-source-id: af167118e5289bb7178ffc27aaec3af101dcd828
Summary:
Allows creation of _NamedAnyModule_ directly from _AnyModule_, e.g.
```
auto a=torch::nn::AnyModule(torch::nn::Linear(1,2));
auto m=torch::nn::NamedAnyModule("fc", a);
```
Without the public constructor, it would be necessary to recast the AnyModule to its underlying type,
then have the constructor cast it back to AnyModule.
With the public AnyModule constructor,
it is possible to do
```
auto q=Sequential({m});
```
or
```
q->push_back(m.name, m.module());
```
(works in conjunction with PR https://github.com/pytorch/pytorch/issues/36720 which allowed adding _AnyModule_ directly)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36869
Differential Revision: D21110074
Pulled By: yf225
fbshipit-source-id: aaea02282b9024824785e54d8732c0a12c850977
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36841
right now, all c2 ops' outputs will be unwrapped blindly. This is not correct if we have a single tensor list returned.
Test Plan: buck test mode/dev-nosan mode/no-gpu //caffe2/caffe2/fb/python/operator_test:torch_integration_test
Reviewed By: alyssawangqq
Differential Revision: D21100463
fbshipit-source-id: 9f22f3ddf029e7da9d98008d68820bf7f8239d4f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36856
Previously, we could early-exit mark_graph_task_completed() without the future
actually being fully complete - we were only guaranteeing that it was at least
in the process of being marked complete.
This seems to be triggering an assert graph_task->future_result_->completed()
This change simply adds a 1-line waitNoThrow() call to ensure that the future
has been marked complete before exiting the mark_graph_task_completed() function.
The cost is relatively reasonable, since this isn't the common path.
ghstack-source-id: 102423589
Test Plan: buck test mode/dev-nosan caffe2/test/,,,
Differential Revision: D21104121
fbshipit-source-id: 51c1554618880fe80d52d5eb96716abc15f6be8a
Summary:
Fix formatting: change "Frequently Asked Questions" into an RST header, which is clickable and gives a URL to the FAQ section
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36438
Differential Revision: D21106180
Pulled By: mruberry
fbshipit-source-id: 370dafd1883bd57285b478cf2faa14ae2f86e3ba
Summary:
re-created the same PR: https://github.com/pytorch/pytorch/pull/36639
because ghimport does not support importing binary files right now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36842
Test Plan: python test/quantization/test_backward_compatibility.py
Differential Revision: D21100689
Pulled By: jerryzh168
fbshipit-source-id: 625a0f9da98138c9c2891b9d99fc45d85fa27cca
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36791
This should enable `test_fc_nnpi_fp16.py` test in fakelowp test.
Test Plan:
```
buck test caffe2/caffe2/fb/fbgemm:
```
Reviewed By: hyuen
Differential Revision: D21085221
fbshipit-source-id: 512bca2eea1a4cc5d11129cfe9e65e7a4a0ba1e0
Summary:
Older versions of MIOpen (<=2.2) don't have the `miopenGetVersion` api, but MIOpen is always a part of the ROCm builds, so do NOT set `lib` to None for ROCm builds. `__cudnn_version` will be `None` for older versions of MIOpen.
Setting `lib` to `None` ends up printing the following erroneous warning when running unit tests:
```
/root/.local/lib/python3.6/site-packages/torch/backends/cudnn/__init__.py:120: UserWarning: cuDNN/MIOpen library not found. Check your LD_LIBRARY_PATH
}.get(sys.platform, 'LD_LIBRARY_PATH')))
```
Eg.: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py3.6-clang7-rocmdeb-ubuntu16.04-test2/18387/consoleFull
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33837
Differential Revision: D20369285
Pulled By: xw285cornell
fbshipit-source-id: e82e6f8f5bccb486213cf868f40aece41ce11f98
Summary:
The `configure_file` command adds its input as a top-level dependency, triggering makefile regeneration if the file timestamp has changed.
Also abort CMake if `exec` of build_variables.bzl fails for some reason
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36809
Test Plan: Add invalid statement to build_variables.bzl and check that build process fails
Differential Revision: D21100721
Pulled By: malfet
fbshipit-source-id: 79a54aa367fb8dedb269c78b9538b4da203d856b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36357
ghstack-source-id: 101907180
Creating a python API entry point to optimize mobile models, which takes a scripted module as an argument and returns an optimized scripted module. The initial optimization features include inserting and folding prepack ops.
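A minimal sketch of the intended usage, assuming the entry point is exposed as `torch.utils.mobile_optimizer.optimize_for_mobile` (the model here is a stand-in):
```
import torch
from torch.utils.mobile_optimizer import optimize_for_mobile

model = torch.nn.Sequential(torch.nn.Conv2d(3, 8, 3), torch.nn.ReLU())
scripted = torch.jit.script(model.eval())

# returns a new optimized ScriptModule (prepack ops inserted and folded)
optimized = optimize_for_mobile(scripted)
optimized.save("model_optimized.pt")
```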
Test Plan: python test/test_optimizer.py
Differential Revision: D20946076
fbshipit-source-id: 93cb4a5bb2371128f802d738eb26d0a4f3b2fe10
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36808
Trying to run a model on mobile and prim::TupleIndex is reported as missing. Moving it out from fulljit.
Reviewed By: linbinyu
Differential Revision: D21065879
fbshipit-source-id: d7a6dc9e5ad306d76825eaef815ab5582d4bf9a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36799
This is a roll up of a bunch of small PRs for ease of landing.
- Update reference to RegisterOperators in error message in Convolution.
- Add explicit schema for quantized conv/conv_prepack (fixes #36511)
- Add a centralized TORCH_LIBRARY declaration for quantized and xnnpack ops (fixes #36510)
- Port to new registration API:
- Resize
- detach/detach_
- All quantization operators
- Update quantized README for registering operators with new API
Test Plan: Imported from OSS
Differential Revision: D21089649
Pulled By: ezyang
fbshipit-source-id: 3dd205c2c075f6a3d67aadb2b96af25512e7acd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35187
When I touch these files, lint will always introduce some unintended changes; to prevent that from happening, we need to format the code first.
change is generated by:
arc f
Test Plan: integration test.
Differential Revision: D20587596
fbshipit-source-id: 512cf6b86bd6632a61c80ed53e3a9e229feecc2a
Summary:
Several people have asked me about proper Amp usage with gradient accumulation. In particular, it's [unclear to people](https://github.com/NVIDIA/apex/issues/439#issuecomment-610351482) that you should only call `scaler.unscale_()` (if desired) and `scaler.update()` in iterations where you actually plan to step. This PR adds a minimal accumulation example.
I built the docs locally and it looks free from sphinx errors, at least.
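A condensed sketch of the accumulation pattern being documented (assumptions: a CUDA device and toy data standing in for a real DataLoader; the snippet added to the docs may differ in detail):
```
import torch

model = torch.nn.Linear(10, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()
accum_steps = 4
batches = [(torch.randn(8, 10, device="cuda"), torch.randn(8, 10, device="cuda"))
           for _ in range(16)]

for i, (data, target) in enumerate(batches):
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.mse_loss(model(data), target) / accum_steps
    scaler.scale(loss).backward()       # accumulate scaled grads every iteration
    if (i + 1) % accum_steps == 0:
        scaler.unscale_(optimizer)      # unscale only when about to step
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        scaler.step(optimizer)
        scaler.update()                 # update the scale only after stepping
        optimizer.zero_grad()
```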
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36601
Differential Revision: D21082295
Pulled By: ngimel
fbshipit-source-id: b2faa6c02b9f7e1972618a0f1d5360a03f0450ac
Summary:
This pull request changes the datatype for `test_RNN_cpu_vs_cudnn_no_dropout` on ROCm testing to float.
Currently MIOpen RNN does not support the double datatype, so using only double would not run this test using MIOpen. To correctly test the PyTorch RNN operator using MIOpen, we would need to test it using float tensors and modules.
The changes in this PR addresses the comments in https://github.com/pytorch/pytorch/issues/34615
ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36772
Differential Revision: D21089533
Pulled By: ezyang
fbshipit-source-id: b5781e4ca270d64c6b949b3f0436e7b4eb870e27
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36736
Fixes: https://github.com/pytorch/pytorch/issues/36499
Changes:
1) Moves some bindings from LegacyNNDefinitions to Activation so all of log_sigmoid lives together
2) Properly handle non-contiguous / incorrectly sized out parameters to log_sigmoid. This is done by copying from a buffer if necessary.
3) Require that the internal buffer (different from 2)) is contiguous. This should always be the case because it's always created internally.
4) Adds a test
Test Plan: Imported from OSS
Differential Revision: D21070934
Pulled By: gchanan
fbshipit-source-id: 94577313c32d1ef04d65c1d6657598304a39fe6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36760
If you look at https://github.com/pytorch/pytorch/pull/34136/, you will notice a commit (80c15c087c) that didn't get merged.
This is to address that, to avoid crashing on remainder when the rhs is 0.
Test Plan: Imported from OSS
Differential Revision: D21078776
Pulled By: gchanan
fbshipit-source-id: 0ac138cbafac28cf8d696a2a413d3c542138cff9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36750
- It seems the JIT schema for aten::backward and the schema in native_functions.yaml diverged on whether the retain_graph/keep_graph parameter takes a `bool` or a `bool?`. Make them identical again.
- Also remove the mutability annotation for the self parameter. This does not make sense together with AliasAnalysisKind::CONSERVATIVE and it triggered an assertion
- After fixing the mutability annotation, we can fix that assertion so that it doesn't exclude aten::backward from its check anymore
- Last but not least, remove the unboxed_only marker from aten::backward. This requires us to add a special case in register_c10_ops.cpp for it, because JIT still has its own implementation.
ghstack-source-id: 102351871
Test Plan: waitforsandcastle
Differential Revision: D21004102
fbshipit-source-id: 19bd1adbd8103c214d32e5126671a809adec581e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36621
Instead of doing an in-place transformation inside the optimizeForMobile method,
we would like to maintain the original structure of the passed scripted module,
so before the optimization starts we clone the module, run the subsequent
optimization process on the clone, and return the optimized cloned module.
Test Plan:
unit test
python test/test_mobile_optimizer.py
Imported from OSS
Differential Revision: D21028406
fbshipit-source-id: 756172ef99b1c1df6bb7d00e5deca85a4c239a87
Summary:
This should fix https://github.com/pytorch/pytorch/issues/36434
We create new nodes to insert explicit uses of loop counters while doing liveness analysis. This was totally fine when we had a two-pass liveness (since we don't really care about liveness sets for those nodes), but with the fixed-point algorithm we can *never* achieve the fixed point, because the initial liveness sets for these new nodes start empty and we always add some live values to those sets, thus `changed_` is always set to `true`.
Now it's amazing that this didn't get exposed and worked for such a long time! Apparently, when we destroyed and recreated those new nodes they were allocated at the exact same addresses in memory!!!!!! And we use those addresses as keys to get liveness sets, so these new nodes **inherited** the old liveness sets!
I was still a bit sceptical of this explanation, so I added more tracing to liveness analysis and AFAICT this is exactly how we were able to get away with this bug for such a long time!!!
Here's a few excerpts from the trace.
Before we enter a loop we create a node to use loop's upper bound.
```
[DEBUG liveness.cpp:121] @#$Creating a store for mtc : 0x555777c19eb0
```
When processing the loop, we also process this node. Its liveness sets are empty!
```
[DEBUG liveness.cpp:099] Processing node = prim::Store(%3) addr = 0x555777c19eb0
[DEBUG liveness.cpp:148] @#$liveness_sets_[it] : {}
```
We are done with this loop. We remove the node we added
```
[DEBUG liveness.cpp:127] @#$Destroying a store for ctc : 0x555777c19eb0
```
We are about to process the loop for the second time, so we create the use node again.
Note, it's allocated at the exact same address!!!
```
[DEBUG liveness.cpp:118] @#$Creating a store for ctc : 0x555777c19eb0
```
Now we process it again. But now it has non-empty sets even though it's a brand new node!!!!
```
[DEBUG liveness.cpp:099] Processing node = prim::Store(%i) addr = 0x555777c19eb0
[DEBUG liveness.cpp:148] @#$liveness_sets_[it] : {2, 3, 10}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36697
Differential Revision: D21059313
Pulled By: Krovatkin
fbshipit-source-id: b0fdeb4418e0e73f34187826877179260f21cf7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36745
As we hold a mutex for our custom C++ Node, when calling reentrant
backward from a custom C++ function we will concurrently hold many
mutexes, up to MAX_DEPTH. TSAN only allows 65 mutexes at once, otherwise
it will complain. This PR lowers the limit according to TSAN.
TSAN Reference: https://github.com/google/sanitizers/issues/950
Test Plan: Imported from OSS
Differential Revision: D21072604
Pulled By: wanchaol
fbshipit-source-id: 99cd1acab41a203d834fa4947f4e6f0ffd2e70f2
Summary:
Mimic `.bzl` parsing logic from https://github.com/pytorch/FBGEMM/pull/344
Generate `libtorch_cmake_sources` by running the following script:
```
def read_file(path):
    with open(path) as f:
        return f.read()

def get_cmake_torch_srcs():
    caffe2_cmake = read_file("caffe2/CMakeLists.txt")
    start = caffe2_cmake.find("set(TORCH_SRCS")
    end = caffe2_cmake.find(")", start)
    return caffe2_cmake[start:end + 1]

def get_cmake_torch_srcs_list():
    caffe2_torch_srcs = get_cmake_torch_srcs()
    unfiltered_list = [x.strip() for x in get_cmake_torch_srcs().split("\n") if len(x.strip()) > 0]
    return [x.replace("${TORCH_SRC_DIR}/", "torch/") for x in unfiltered_list if 'TORCH_SRC_DIR' in x]

import imp
build_variables = imp.load_source('build_variables', 'tools/build_variables.bzl')

libtorch_core_sources = set(build_variables.libtorch_core_sources)
caffe2_torch_srcs = set(get_cmake_torch_srcs_list())

if not libtorch_core_sources.issubset(caffe2_torch_srcs):
    print("libtorch_core_sources must be a subset of caffe2_torch_srcs")
    print(sorted(caffe2_torch_srcs.difference(libtorch_core_sources)))
```
Move common files between `libtorch_cmake_sources` and `libtorch_extra_sources` to `libtorch_jit_core_sources`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36737
Test Plan: CI
Differential Revision: D21078753
Pulled By: malfet
fbshipit-source-id: f46ca48d48aa122188f028136c14687ff52629ed
Summary:
re-created the same PR: https://github.com/pytorch/pytorch/pull/36639
because ghimport does not support importing binary files right now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36771
Test Plan: python test/quantization/test_backward_compatibility.py
Differential Revision: D21080503
Pulled By: jerryzh168
fbshipit-source-id: 1dca08208bccead60bba03e5fb5d39e1a1d7c20d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36696
This PR adds dictionaries as a supported output of the tracer under the strict
flag.
Test Plan: Imported from OSS
Reviewed By: houseroad
Differential Revision: D21056962
Pulled By: wanchaol
fbshipit-source-id: ace498182d636de853cf8a1efb3dc77f5d53db29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36609
This PR removes iterationOrder, which was backed by CompareKeys; we universally
use Dict insertion order by default, backed by c10::Dict, to match the
Python behavior.
Test Plan: Imported from OSS
Reviewed By: houseroad
Differential Revision: D21056963
Pulled By: wanchaol
fbshipit-source-id: 487961c2db2cdc27461b2fbd6df91faafc6920b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36727
Looks like this was renamed by accident in 0cbd7fa46f2
Test Plan:
Unit test.
Lint.
Differential Revision: D21076697
Pulled By: dreiss
fbshipit-source-id: dbd18cb41c7b26479984a7a7b12ad41a4c5b7658
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36552
Do the fusion for inplace and non-inplace relu
Tested for functional relu as well.
Functional batch_norm is not a usual use-case (since it expects the weight, bias, mean, var) so that is not tested.
Test Plan:
test_quantize_script.py test_batch_norm2d_relu
Imported from OSS
Differential Revision: D21075253
fbshipit-source-id: 0a07ea477cab19abf1d1b0856e623b1436240da1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36389
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20964193
Pulled By: ezyang
fbshipit-source-id: 27aeea01ccf5dfcebb8f043cde009a14dde3958e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36741
Create child workspace that shares parent workspace's blobs. Register child workspace in registrar to enable switching into child workspace and feeding to child workspace alone.
Test Plan: numeric suite unit tests in stacked diff
Reviewed By: hx89
Differential Revision: D21055567
fbshipit-source-id: 374b12aef75a4c58452c271f8961ee156ce6c559
Summary:
Adds a capability for reordering axes in the LoopNest. This was fairly straightforward except when handling Reduction initializers, which required more changes. UPDATE: actually the complicated bit was preserving the ordering of statements in the loopnest which should not be reordered.
Usage looks something like this:
```
Tensor* tensor = Compute(
"f", {{2, "x"}, {3, "y"}}, [](const VarHandle& x, const VarHandle& y) {
return ExprHandle(1.0f) + cast<float>(x) * x + cast<float>(y) * y;
});
LoopNest l({tensor});
/* LoopNest looks like:
for x in ...
for y in ...
f[x,y] = 1 + x * x + y * y;
*/
auto loops = l.getLoopStmtsFor(tensor);
l.reorderAxis(tensor, loops[0], loops[1]);
/* LoopNest looks like:
for y in ...
for x in ...
f[x,y] = 1 + x * x + y * y;
*/
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36540
Differential Revision: D21068143
Pulled By: nickgg
fbshipit-source-id: f02c29004376df4f5a9bedff366c075772726618
Summary:
This test was failing because caching resulted in a single function with multiple execution plans rather than multiple functions with a single execution plan each, as the test writer intended.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35847
Differential Revision: D20839674
Pulled By: Krovatkin
fbshipit-source-id: 68f41610a823d94c1e744c85ac72652c741d73ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36729
setenv not available on windows
Test Plan: CI green in ovrsource
Reviewed By: stepancheg
Differential Revision: D21067835
fbshipit-source-id: ddbc3285ef88f123dc6a200b661c48cfafc6bf00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36691
Enables selectively inserting observers at the inputs of aten/call functions.
Test Plan:
test_quantize_script.py
Imported from OSS
Differential Revision: D21055597
fbshipit-source-id: b47733b94b127d7a47b3224da7af98f0da38d30d
Summary:
Previously we were always creating a double tensor from `torch.tensor(1.)`, whereas Python eager mode uses the current default dtype. Fix for https://github.com/pytorch/pytorch/issues/36369
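For reference, the eager-mode behavior being matched (a small illustrative sketch):
```python
import torch

# Eager mode follows the current default dtype for Python float literals;
# after this change, TorchScript's torch.tensor(1.) does the same.
torch.set_default_dtype(torch.float32)
print(torch.tensor(1.).dtype)  # torch.float32

torch.set_default_dtype(torch.float64)
print(torch.tensor(1.).dtype)  # torch.float64
```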
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36587
Differential Revision: D21043617
Pulled By: eellison
fbshipit-source-id: 38da303594f52e06941d86b6e57c4a06e7d36938
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36258
Previous we had a && chaining style API. There are some downsides to
this API:
- It's easy to forget the 'static' qualifier in front, leading to
subtle ODR bugs.
- It is not compatible with torchbind class_ definitions, as these
need multiple levels of chaining. So in practice people end
up having to define multiple static initializers, one per class.
- It's not like pybind11.
- There's no way to conveniently get the file and line number of
the registration, as there is no macro point in the API.
- The old API doesn't really encourage people to put all of their
definitions for a library in one place, and to give a custom
namespace for it. Similarly, the old API wasn't very DRY, because
you had to keep repeating the namespace/dispatch key you
were writing implementations for.
The new API is modeled exactly off of the PYBIND11_MODULE macro:
you write:
```
TORCH_LIBRARY(aten, m) {
m.def("aten::add(Tensor self, Tensor other) -> Tensor");
...
}
```
in a non-chaining fashion, and under the hood the macro expands to
define a function, and define a static initializer that allocates
c10::Library (previously called c10::Module, but we renamed it
to avoid confusion with the existing NN module concept), passes
it to your function, and then retains it for the rest of the lifetime
of the program. Specification of the namespace is mandatory,
and in a later commit I plan to make it a hard error to TORCH_LIBRARY
the same library name twice.
If you are specifying an implementation for an existing operator
(e.g., you're the XLA backend, or even if you're just putting
registrations for implementations at the implementation site),
you should use TORCH_LIBRARY_IMPL, which instead takes a backend
argument (instead of namespace) and can be used to specify an
implementation for a backend. Unlike TORCH_LIBRARY, you can do
as many of these as you want for a backend.
This needs updates to the mobile code analyzer.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929257
Pulled By: ezyang
fbshipit-source-id: ba04d78492e8c93ae7190165fb936f6872896ada
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35613
Python 2 has reached end-of-life and is no longer supported by PyTorch.
To spare users from a long, doomed setup when trying to use PyTorch with
Python 2, detect this case early and fail with a clear message. This
commit covers setup.py.
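A minimal sketch of the kind of early check this adds (the exact message and placement in setup.py may differ):
```python
import sys

# Fail fast instead of letting users sit through a long, doomed setup.
if sys.version_info < (3,):
    raise RuntimeError(
        "Python 2 has reached end-of-life and is no longer supported by PyTorch."
    )
```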
Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error *quickly*.
Differential Revision: D20842881
Pulled By: dreiss
fbshipit-source-id: caaaa0dbff83145ff668bd25df6d7d4b3ce12e47
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35612
Python 2 has reached end-of-life and is no longer supported by PyTorch.
To spare users from a long, doomed build when trying to use PyTorch with
Python 2, detect this case early and fail with a clear message. This
commit covers CMake setup.
Test Plan: Attempted to build PyTorch with Python 2 and saw a clear error *quickly*.
Differential Revision: D20842873
Pulled By: dreiss
fbshipit-source-id: b35e38c12f9381ff4ca10cf801b7a03da87b1d19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36677
Move the `futures` vector to be a local function var like `errorFutures`. Holding the lock to clear the vector is now unnecessary.
ghstack-source-id: 102265569
Differential Revision: D20884589
fbshipit-source-id: c9a13258bee737d86f9b0d11cdd28263bb923697
Summary:
Unrolling support has been added in a way that produces well-performing code on GPUs. Not sure how long this link will last, but an example of a generated unrolled kernel is:
https://godbolt.org/z/i0uAv3
What can be seen there is multiple calls of "ld.global.f32" without stores ("st.global.f32") in between them (and vice versa). This means we are launching multiple loads that can run in parallel, as well as multiple stores that can run in parallel. This can be a crucial optimization for memory-bound kernels. This was generally a point of concern in TVM, as an attempt at a similar kernel from TVM produces: https://godbolt.org/z/Vu97vG, which surrounds load/store pairs in conditional branches, preventing the benefits of unrolling.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36435
Reviewed By: ZolotukhinM
Differential Revision: D21024011
Pulled By: soumith
fbshipit-source-id: e852e282fa7a304aba962e1926f756098c011fe0
Summary:
Implements complex isfinite and isinf, consistent with NumPy.
A complex value is finite if and only if both its real and imaginary parts are finite.
A complex value is infinite if and only if its real or imaginary part is infinite.
The old isfinite, isinf, and isnan tests are modernized and, instead of using fixtures, the torch results are compared with NumPy's. A new test is added for complex isfinite, isinf, and isnan. The docs for each function are updated to clarify what finite, infinite, and NaN values are.
The new tests rely on a new helper, _np_compare, that we'll likely want to generalize in the near future and use in more tests.
Addresses part of the complex support tasks. See https://github.com/pytorch/pytorch/issues/33152.
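The semantics above translate to behavior like the following (illustrative; printed formatting may differ):
```python
import torch

t = torch.tensor([1 + 2j, complex(float("inf"), 0), complex(0, float("nan"))])
print(torch.isfinite(t))  # tensor([ True, False, False])
print(torch.isinf(t))     # tensor([False,  True, False])
print(torch.isnan(t))     # tensor([False, False,  True])
```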
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36648
Differential Revision: D21054766
Pulled By: mruberry
fbshipit-source-id: d947707c5437385775c82f4e6c722349ca5a2174
Summary:
Per title. A test is added to test_type_promotion for the behavior. This behavior is consistent with NumPy's.
For complex inputs to `abs` the result is cast to float after the computation since the computation of abs must be performed on the original complex tensor. While `std::abs` returns a float value when called on complex inputs, returning a FloatTensor directly would require additional loop instantiations in TensorIterator. This may be worthwhile to pursue in the future.
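A small example of the resulting behavior (illustrative sketch):
```python
import torch

z = torch.tensor([3 + 4j], dtype=torch.complex64)
r = torch.abs(z)
# abs is computed on the complex values, then returned as a float tensor.
print(r, r.dtype)  # tensor([5.]) torch.float32
```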
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35871
Differential Revision: D20984456
Pulled By: mruberry
fbshipit-source-id: 226445178f92f2b0292e92578656d98674a6aa20
Summary:
This code looks like a mistake
```C++
AT_ASSERT(size_t(kind) < sizeof(names) / sizeof(AttributeKind));
```
It does not check whether the `kind` variable fits within the array of pointers called `names`.
Even if we write something like the following, that assert won't fail:
```C++
AttributeKind kind = AttributeKind::ival;
*((unsigned int*)&kind) += 1;
```
So I fixed it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36476
Differential Revision: D21018748
Pulled By: colesbury
fbshipit-source-id: f4d3b8faf64cf07232d595075f831805084f5d00
Summary:
PyTorch users write programs and save them as serialized Torchscript. When this Torchscript is loaded it contains symbols like "aten::div" describing some of the program's behavior. If the behavior of these symbols has changed since the program was serialized, however, then the original program's semantics may not be preserved.
For example, when we make aten::div always perform "true" division, like NumPy, Python3, and JAX, then serialized Torchscript programs relying on aten::div performing floor division on integral inputs will break.
This PR demonstrates the "Versioned Symbol" pattern that lets symbols be remapped into Torchscript builtins that preserve their historic behavior. Using this pattern, after we update aten::div to always perform true division, we could remap it in older Torchscript programs to a builtin that implements its historic behavior.
The pattern is described in the [Versioned Symbols] note in the code and is implemented like this:
- If BuiltinModule is given a version, before it returns a symbol it queries to see if another symbol should be substituted for it.
- versioned_symbol.cpp has a map for symbols and the version range for which another symbol should be substituted for them.
- The substitutions are implemented as builtin functions.
An example using the new, test-only _subcmul function is implemented to test this behavior. A test in jit/test_save_load.py follows the pattern described in the [Versioned Symbols] note and uses a fixture serialized with file version 2 to verify that the historic behavior is preserved.
In the future we will likely need a slightly more complex mechanism with multiple substitutions with distinct version ranges, and this just requires changing the map to be Symbol->Iterable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36300
Differential Revision: D21058990
Pulled By: mruberry
fbshipit-source-id: 2b7c732878c0ecfcd9f0a6205fb6d6421feeaf61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36562
The BackendSelect dispatch key gives us a way to extract backend-
specific dispatch keys from non-Tensor arguments without teaching
the DispatchKeyExtractor about them. Here we finish switching over
to the BackendSelect approach for factory functions and remove
TensorOptions from the set of types DispatchKeyExtractor needs to
consider.
Test Plan: Imported from OSS
Differential Revision: D21013652
Pulled By: bhosmer
fbshipit-source-id: e30512d1c3202149e72b7d7ce15084bbfed63ac7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36708
WEIGHTS is the second input operand of SparseLengthsWeightedSum operators but in the documentation the order was wrong.
Test Plan: CI
Reviewed By: yinghai
Differential Revision: D21058240
fbshipit-source-id: e160e983603e606e63fbbfdee34d98d3587870d8
Summary:
Simplifies loops which can be collapsed down into a single block or removed entirely. E.g.
```
For 0..1 {
Statements...
}
```
Is now just `Block({Statements...})`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36348
Differential Revision: D21057959
Pulled By: nickgg
fbshipit-source-id: 2f95a19a965c4a6e023680e2cea9ea846e82d62e
Summary:
`std::mismatch(InputIt1 first1, InputIt1 last1, InputIt2 first2)` assumes that the container behind the `first2` iterator contains at least `last1 - first1` elements, which is not the case if `prefix` is longer than `str`.
Found while running unit tests on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36672
Differential Revision: D21049407
Pulled By: malfet
fbshipit-source-id: ad45779d47a0c6898900e0247c920829a2179f62
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36640
We had the following race when two threads entered
'mark_graph_task_completed'.
1) Thread 1 grabs the graph_task mutex first and moves captured_vars_ to its
local 'vars'.
2) Thread 1 releases the lock.
3) Thread 2 grabs the mutex and moves an empty captured_vars_ to its local
'vars'.
4) Thread 2 now proceeds to call 'markCompleted' with empty grads.
5) Thread 1 which actually has the right grads never gets to set the grads on
the future since future_completed_ is set to True by Thread 2.
Discovered this while running our RNN example:
https://github.com/pytorch/examples/tree/master/distributed/rpc/rnn and
verified this PR fixes the race.
ghstack-source-id: 102237850
Test Plan: waitforbuildbot
Differential Revision: D21035196
fbshipit-source-id: 1963826194d466b93f19e8016b38e4f9cad47720
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36604
Adds the logic to wrap the HardSwish module in FakeQuant
to support QAT.
Test Plan:
Added test to cover that this happens properly.
Imported from OSS
Differential Revision: D21045322
fbshipit-source-id: 8c46559ade58a5d5c56442285842627a3143eb0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36545
* adds a quantized nn.module for Hardswish so we can observe activation values
* modifies the hardswish op to allow specifying scale + zero_point
* makes the Hardswish module get properly swapped during static quantization
Test Plan:
added tests and they pass for:
* the new _out flavor of hardswish
* QNNPACK changes
* static quant e2e
Imported from OSS
Differential Revision: D21045320
fbshipit-source-id: ab7e52f0f54a7d5923ab6f58197022cc28c12354
Summary:
With https://github.com/pytorch/pytorch/pull/35562, we are running peephole optimization on inlining to reduce the number of nodes that are copied.
The tracer encodes the sizes in the graph like:
```
graph(%0 : Double(7)):
%1 : Function = prim::Constant[name="tensor_size"]()
%2 : Tensor = prim::CallFunction(%1, %0)
return (%2)
```
However, people would like to reuse the graph with different shapes, so running the size-related optimizations would invalidate that. Long term it might be better for the tracer to not include shape information, but there are downstream users of it.
Separates out FuseAddMM from peephole so that now there is a single `disable_size_optimizations` parameter, and onnx explicitly invokes fuseaddmm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36404
Differential Revision: D20968974
Pulled By: eellison
fbshipit-source-id: 56f8f1699e3b0adeeccdfd5a67bb975fd41a2913
Summary:
LLVM Codegen assumes that the kernel contains real statements, but that is not guaranteed, especially after IR Simplification. This PR adds a catch for the case where no value is generated after recursing the LLVMCodegen visitor through the kernel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36660
Differential Revision: D21044066
Pulled By: nickgg
fbshipit-source-id: e521c766286b1ff4e26befcec7ff4959db8181a4
Summary:
Previously we were copying the bound method of the original class to the
new script module class, which causes `self` to be wrong. This PR
changes it so we fetch the unbound function, then bind it to the new
script module, then attach it to the module.
Fixes #28280
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36546
Pulled By: driazati
Differential Revision: D21023329
fbshipit-source-id: 6b3f8404700860151792f669a9c02fbd13365272
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36678
Updated the docs to explicitly indicate that RRef control messages are
idempotent and retried upon failure.
ghstack-source-id: 102225791
Test Plan: build bot
Differential Revision: D20828041
fbshipit-source-id: ca4d71c65a453664c16c32134c47637a966b1a19
Summary:
The test case exercised in `test_upsamplingNearest2d_launch_fail` will fail on ROCm. The max grid size per dimension for ROCm is 4294967295 (0xffffffff), which is why the tensor dims in `test_upsamplingNearest2d_launch_fail` must give correct results.
This PR adds the test case `test_upsamplingNearest2d_launch_rocm` for the ROCm-only scenario, which is essentially the same as `test_upsamplingNearest2d_launch_fail` but without the expected-failure decorator.
ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36624
Differential Revision: D21050330
Pulled By: ezyang
fbshipit-source-id: d7370c97eaab98f382f97052ed39cc168a3bfa71
Summary:
XLA needs to switch to clang9 to build with the latest TF dependency.
We keep the pytorch/pytorch build on gcc for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36618
Differential Revision: D21045723
Pulled By: ailzhang
fbshipit-source-id: 015b65dad2aeef31fd66b753d519b2c9b9ed8b7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36635
Those ops were manually written in register_aten_ops.cpp, which had a few issues; for example, it caused them to be duplicated across all register_aten_ops_X.cpp files and to exist multiple times.
Instead, these should just be regular prim ops.
ghstack-source-id: 102204991
Test Plan: waitforsandcastle
Differential Revision: D21032778
fbshipit-source-id: 18f5eef1cad842d89c97610fc77b957608d2b15e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36662
This was a mistake from an earlier change, though the expected impact is
relatively minimal - mostly keeping callbacks around longer than necessary
in the case of callbacks on already-completed futures.
ghstack-source-id: 102203224
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D21044145
fbshipit-source-id: f3bd58bd6bde83caaa7b9bd0385d0ce3647dbc05
Summary:
Also print docker container stats at the end of the run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36643
Differential Revision: D21044161
Pulled By: malfet
fbshipit-source-id: 6877d8ce4789116ef270124307844f6cef7dcef5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36607
PR #36258 and subsequent PRs in the stack switch c10 registrations to
the new pybind11-style registration API. One notable difference from the old
c10 registration API is that the operator's namespace is no longer in the op
schema string, e.g. "aten::" will be factored out from "aten::conv",
"aten::empty", etc. The namespace string will be declared at the
beginning of registrations with the TORCH_LIBRARY / TORCH_LIBRARY_IMPL
macro.
A rather simple fix is to extract the namespace string from the name of the
enclosing function of the registrations, as the TORCH_LIBRARY macro will
always create an init function (per namespace) by appending the namespace
string to a common prefix.
Another side effect of the API change is that it adds some debug string
constants to the registration API, and because the namespace part is factored
out of the op name, there is no longer an effective way to
differentiate between the real op name and debug strings. A simple
workaround is to only keep the first string constant encountered
while BFSing the LLVM IR - the real op name is directly passed into the
registration call while the debug string is indirectly passed via
CppFunction.
These new assumptions might be broken by future changes, but they are simple
to implement and unblock the API work.
Test Plan: Imported from OSS
Differential Revision: D21026008
Pulled By: ljk53
fbshipit-source-id: c8c171d23aaba6d6b7985d342e8797525126a713
Summary: It was always skipped for the last 1.5 years (since D10372230 landed)
Test Plan: CI
Reviewed By: ailzhang
Differential Revision: D21036194
fbshipit-source-id: 9ace60b45a123a9372a88310b91f33a69ae8880c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36620
Sending to a node that has been shutdown in ProcessGroupAgent could throw several possible exceptions. This PR updates the tests to check for the right exceptions while waiting for other nodes in the gang to fail in `test_backward_node_failure` and `test_backward_node_failure_python_udf`.
ghstack-source-id: 102153944
Test Plan: Stress-tested `test_backward_node_failure` and `test_backward_node_failure_python_udf`. They were previously completely broken, but this change makes `test_backward_node_failure` functional and `test_backward_node_failure_python_udf` is flaky but fails infrequently. A change to make the last test work reliably is planned.
Differential Revision: D21027280
fbshipit-source-id: e85c2d219ee408483442bd9925fff7206c8efe4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35263
Process Group Agent throws an exception if a send attempt is made after the agent is shutdown. With retries, we should catch this exception and mark the original future with an error.
ghstack-source-id: 102153897
Test Plan: Running all rpc/dist_autograd tests.
Differential Revision: D20611412
fbshipit-source-id: a6009f0b0aa8be662364158962a054c5c29090bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36420
Adds a unit test for hardswish backward pass
Test Plan:
Unit test passes on cpu and cuda
Imported from OSS
Differential Revision: D20994100
fbshipit-source-id: 579df709cc2d92fce3b9a0eeb6faeb9fe8d2f641
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36351
Adds CUDA kernels for hardsigmoid, to enable its use in training.
Note: the update to the cpu backward pass is to keep the cpu vs cuda
logic consistent, no change in functionality.
Test Plan:
add CI for the forward pass
run this for the backward pass:
https://gist.github.com/vkuzo/95957d365600f9ad10d25bd20f58cc1a
Imported from OSS
Differential Revision: D20955589
fbshipit-source-id: dc198aa6a58e1a7996e1831f1e479c398ffcbc90
Summary:
soumith ezyang albanD After lots of experiments, I didn't manage to directly print the gradients of Fold/Unfold_backward (let me know if I am wrong).
Thus, in my test code, I compare the gradients of Fold/Unfold_backward implicitly by comparing the gradients of the operation that follows it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36379
Differential Revision: D21040646
Pulled By: ezyang
fbshipit-source-id: dafdbfe2c7b20efa535402c7f81fce5c681fce2f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35941
The key step of mobile custom build is to find out the ops used by a specific
model, with which it can produce a tailored build of optimal size.
However, ops can be called not only from a TorchScript model but also
from C++ code directly, e.g. via torch::jit:: APIs. With
static dispatch, ops called this way will be statically linked into client
code. With dynamic dispatch, we need to obtain & keep these ops explicitly.
This PR improves the static code analyzer to dump ops that are called from
visible C++ symbols matching a specific regex. This provides a mechanism
to solve the custom build problem with dynamic dispatch.
It starts by dumping ops that are callable from functions in the torch::jit
namespace and including them in the custom build with dynamic dispatch. We can
extend it to analyze custom code / refine the set of relevant JIT APIs, etc.
This is just a preliminary version; we need to
improve its usability for more general purposes.
Test Plan: Imported from OSS
Differential Revision: D20835166
Pulled By: ljk53
fbshipit-source-id: a87cfb22b34f89545edd0674a5dfca6b7cff2b0c
Summary:
Since aten::__interpolate was removed in https://github.com/pytorch/pytorch/pull/34514, we need a pass that replaces the interpolate function with aten::__interpolate for ONNX export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35744
Reviewed By: hl475
Differential Revision: D20907041
Pulled By: houseroad
fbshipit-source-id: f2d2cdfec47389245c50f538267124eedf682adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36399
Added caffe2 python wrapper and unit test for the STORM C++ operator.
Test Plan:
All newly added unit tests passed using "buck test //caffe2/caffe2/python:optimizer_test -- TestStorm"
{F233644598}
Reviewed By: chocjy
Differential Revision: D18841013
fbshipit-source-id: f692bc18412839db140202ec9a971e556db0e54f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36225
Implemented the [STORM](https://arxiv.org/abs/1905.10018) optimizer operator for dense and sparse cases.
Test Plan:
All newly added unit tests passed using "buck test //caffe2/caffe2/python/operator_test:storm_test".
{F233643713}
Reviewed By: chocjy
Differential Revision: D18702897
fbshipit-source-id: d25eeb492aa2a03c69754d3f076a8239230b3bf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36583
To make them more reusable across different build systems,
move the `load()` directives at the head of `build_variables.bzl` inside the functions that use them, making `build_variables.bzl` a valid standalone Python source file.
Test Plan: CI + `python -c 'exec(open("tools/build_variables.bzl").read());print(libtorch_sources)'`
Reviewed By: EscapeZero
Differential Revision: D21018974
fbshipit-source-id: 3dbf2551620f164b8910270ad2c5c91125a9f5f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36602
`build_variables.bzl` should contain only filelists to make it interpretable across the BUCK, CMake, and Bazel build systems.
Test Plan: CI
Reviewed By: dzhulgakov
Differential Revision: D21022886
fbshipit-source-id: 9dd1e289ac502bc325e1223197b6156a316498ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36502
We're sometimes deleting futures without completing them (discovered by logging),
and we've recently noticed a slow memory leak.
This change migrates the future lambda cases where there was self-capture.
- In some cases, we use weak_ptr<>, plus .lock()/assert in the lambda callback.
This avoids the reference cycle. We use this primarily in the case where the
value ends up being moved in the callback (something we want to be careful about)
- We also add a convenience api to Future where the completed Future is returned as an arg.
This allows us to avoid self-capture, though it assumes that the markCompleted()
caller is persisting the future for the markCompleted() duration (this has been the case)
ghstack-source-id: 102130672
Test Plan: ctr_mobile_feed, buck test mode/dev-nosan caffe2/test/...
Differential Revision: D20998905
fbshipit-source-id: 7dd52fe4e567a5dea20e8d43862fc2335fd3ce16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36250
The general concept is that I want a centralized location where you
can find all of the registrations for a library. I cannot do this
if I don't codegen all of the schemas in one spot--right now,
most schemas get generated, but not manually registered ones. Let us
assume that manual registration has to do with the actual
implementations; nothing strange is going on with the schema
definition itself. Make it so.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929258
Pulled By: ezyang
fbshipit-source-id: 0a9fdc8eccd7b688b3e7bd8ed64b6e2af21978f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34787
This is a follow-up patch to freezing of TorchScript modules. This patch
enables removal of constant attributes and unused methods in submodules.
The clean-up logic is generalized to handle attributes that share their class
type.
Test Plan: Imported from OSS
Differential Revision: D21004990
Pulled By: bzinodev
fbshipit-source-id: 84778aa9ae1a96d23db29c051031f9995ed3ac90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36542
Python 3.8 sets the default multiprocessing start method to spawn, but we
need fork in these tests; otherwise there are some pickling issues.
Test: Ensure that these tests succeed when run with python 3.8
ghstack-source-id: 102093824
Test Plan: Ensure success with python 3.8
Differential Revision: D21007753
fbshipit-source-id: 4b39844c6ba76a53293c0dfde7c98ec5a78fe113
Summary:
Step 0 of https://github.com/pytorch/pytorch/issues/35284
Reference: https://en.cppreference.com/w/cpp/numeric/complex
We are targeting C++20. The differences across C++ versions are mostly `constexpr` qualifiers; newer versions have more functions declared as `constexpr`.
This PR adds the core of `c10::complex`, it includes
- standard constructors as in `std::complex`
- explicit conversion constructors converting from `std/thrust::complex` to `c10::complex`
- standard assignment operators as in `std::complex`
- conversion assignment operators converting from `std/thrust::complex` to `c10::complex`
- other standard operators as in `std::complex`
- standard methods as in `std::complex`
- explicit casting operators to std/thrust
- basic non-member functions as in `std::complex`:
- arithmetic operators
- `==`, `!=`
- `<<`, `>>`
- `std::real`, `std::imag`, `std::abs`, `std::arg`, `std::norm`, `std::conj`, `std::proj`, `std::polar`
- Some of them are intentionally not completely implemented, these are marked as `TODO` and will be implemented in the future.
This PR does not include:
- overload of math functions
which will come in the next PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35524
Differential Revision: D21021677
Pulled By: anjali411
fbshipit-source-id: 9e144e581fa4b2bee62d33adaf756ce5aadc0c71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36494
Make the name consistent with the op, since we have batch_norm2d and batch_norm3d ops.
Test Plan:
python test/quantization/test_quantized.py test_batch_norm2d
Imported from OSS
Differential Revision: D21008831
fbshipit-source-id: f81ca71a331d5620fd6a3f6175020a30f2e2566b
Summary:
Per title. test_abs used to be marked as a slow_test and run on CPU only. Conceptually similar tests are done in TestTorchMathOps, so it's a matter of adding an `abs` test there. Two remaining checks (correct abs for large-valued long tensors, and correct abs for signed zeros) are factored into separate tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36465
Differential Revision: D21000248
Pulled By: ngimel
fbshipit-source-id: 8bc8b0da936b1c10fe016ff2f0dbb5ea428e7e61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36558
In the log we frequently see a large chunk of "Using engine xx for rowWise Adagrad" messages, but without information on which parameter they are applied to.
Test Plan: Should be covered by existing testing that use optimizer
Reviewed By: chocjy
Differential Revision: D20985176
fbshipit-source-id: 6eb4e19e5307db53fc89b38594a3f303f1492a1c
Summary:
An update on the note that the subgradients for min/max are not deterministic.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36481
Differential Revision: D20993887
Pulled By: albanD
fbshipit-source-id: 4e1a7519d94a9dcf9d359ad679360874d32c1fe2
Summary:
Fixes https://github.com/pytorch/pytorch/issues/973
Common failure scenario:
* DataLoader creates workers and communicates with them through SHMs
* Workers send back through an AF_UNIX socket file descriptors to SHMs containing data
* The limit of open files gets fully used
* A FD gets stripped from a socket message coming back from a worker, without the worker knowing this.
* This causes a `RuntimeError: received 0 items of ancdata` in the standard `multiprocessing` package
* The exception is not handled by PyTorch and so is presented to the users.
After this change the user will see
```
Traceback (most recent call last):
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 761, in _try_get_data
data = self._data_queue.get(timeout=timeout)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/queues.py", line 113, in get
return _ForkingPickler.loads(res)
File "/home/wbaranowski/git/Quansight/pytorch/torch/multiprocessing/reductions.py", line 294, in rebuild_storage_fd
fd = df.detach()
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/resource_sharer.py", line 58, in detach
return reduction.recv_handle(conn)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 184, in recv_handle
return recvfds(s, 1)[0]
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/multiprocessing/reduction.py", line 162, in recvfds
len(ancdata))
RuntimeError: received 0 items of ancdata
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in _try_get_data
fs = [tempfile.NamedTemporaryFile() for i in range(10)]
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 787, in <listcomp>
fs = [tempfile.NamedTemporaryFile() for i in range(10)]
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 551, in NamedTemporaryFile
(fd, name) = _mkstemp_inner(dir, prefix, suffix, flags, output_type)
File "/home/wbaranowski/miniconda3/envs/pytorch-cuda-dev/lib/python3.6/tempfile.py", line 262, in _mkstemp_inner
fd = _os.open(file, flags, 0o600)
OSError: [Errno 24] Too many open files: '/tmp/tmpnx_f6v_f'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "test_shm_leak.py", line 56, in <module>
worker_init_fn=worker_init_fn
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 345, in __next__
data = self._next_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 861, in _next_data
idx, data = self._get_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 828, in _get_data
success, data = self._try_get_data()
File "/home/wbaranowski/git/Quansight/pytorch/torch/utils/data/dataloader.py", line 791, in _try_get_data
"Too many open files. Communication with the"
RuntimeError: Too many open files. Communication with the workers is no longer possible. Please increase the limit using `ulimit -n` in the shell or change the sharing strategy by calling `torch.multiprocessing.set_sharing_strategy('file_system')` at the beginning of your code
```
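For reference, the workaround suggested by the new error message is a one-liner placed at the top of the script:
```python
import torch.multiprocessing

# Use the file_system sharing strategy so worker results are not passed
# as file descriptors over sockets.
torch.multiprocessing.set_sharing_strategy("file_system")
```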
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34768
Differential Revision: D20538053
Pulled By: ezyang
fbshipit-source-id: be4425cf2fa02aff61619b2b829c153cb1a867cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36551
Before, those ops were special cased in the jit codegen but that blocks our unboxing refactoring.
Instead, make those regular prim ops.
ghstack-source-id: 102081858
Test Plan: waitforsandcastle
Differential Revision: D21009196
fbshipit-source-id: b90320fce589fc0553f17582b66a5a05d0fd32d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36388
(This makes me want to barf.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20964195
Pulled By: ezyang
fbshipit-source-id: 3699a02b16060d79dae9890bafeaafad9ad9ae60
Summary:
This kernel debug flag should help locate the issues we are observing on
some of the CI nodes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36521
Differential Revision: D21010612
Pulled By: ezyang
fbshipit-source-id: d746e4eb0af832e770d2231bfee4154b6e703c19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36529
DistAutogradContainer is a singleton for the entire process and has a
single lock that protects access to a map keyed by context_id. Performance
profiling showed that this lock is a potential bottleneck for training. As a
result, this PR has the following optimizations:
1) Shard the map into 256 buckets, with each bucket having its own lock. This
ensures we hold much finer-grained locks.
2) sendReleaseContextRpc was being called under a lock; this is moved to be
outside the lock.
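A minimal Python sketch of the bucketed-lock idea behind 1) (the real container is C++; the names here are illustrative only):
```python
import threading

NUM_SHARDS = 256

class ShardedContextMap:
    """One lock per bucket instead of a single global lock."""

    def __init__(self):
        self._shards = [({}, threading.Lock()) for _ in range(NUM_SHARDS)]

    def _shard(self, context_id):
        # Pick the bucket (and its lock) that owns this context_id.
        return self._shards[context_id % NUM_SHARDS]

    def put(self, context_id, ctx):
        data, lock = self._shard(context_id)
        with lock:
            data[context_id] = ctx

    def get(self, context_id):
        data, lock = self._shard(context_id)
        with lock:
            return data.get(context_id)
```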
ghstack-source-id: 102085139
Test Plan: waitforbuildbot
Differential Revision: D21003934
fbshipit-source-id: 55f80dd317311bce0efd3ca8ca617d071297b5dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36277
This PR introduces a flag to the tracer that guards risky behaviors
like adding a list/dict as the output of the tracer. Currently, to avoid
breaking backward compatibility for users, we throw a warning if the tracer
output is a list, and will throw an error when the tracer output is a dict,
to enforce using this flag (next PR).
Test Plan: Imported from OSS
Differential Revision: D20998157
Pulled By: wanchaol
fbshipit-source-id: 0d2c55f1a263a48b1b92dd6ad54407815e0a6f72
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36290
The BackendSelect dispatch key gives us a way to extract backend-
specific dispatch keys from non-Tensor arguments without teaching
the DispatchKeyExtractor about them. Here we finish switching over
to the BackendSelect approach for factory functions and remove
TensorOptions from the set of types DispatchKeyExtractor needs to
consider.
Test Plan: Imported from OSS
Differential Revision: D20936595
Pulled By: bhosmer
fbshipit-source-id: c2f3cc56776197a792cae2a83aeaca995effaad2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33905
jit::Operator is semantically either a c10 op or a jit-only op but that is represented in a set of member variables with intricate invariants about their values.
Making this explicitly represented in a c10::either reduces the number of possible states, removing many of the invalid ones.
Similarly, if it is a jit-only op, there were schema_string_ and schema_ of which only one could be set at any time. Using a c10::either there too.
ghstack-source-id: 102084054
Test Plan: unit tests
Differential Revision: D20147487
fbshipit-source-id: 50ce10b56f2b1f51c8279cef03077c861db3eaac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33904
This was misnamed and should actually be either::fold.
ghstack-source-id: 102050883
Test Plan: it's just a rename
Differential Revision: D20148263
fbshipit-source-id: 5d2ed92230e20e8bb7dec26ac3f26de7f03a6e39
Summary:
Make the e2e FakeLowP python tests work with Glow lowering in OSS environment. Added a README.md as a guideline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36525
Reviewed By: hyuen
Differential Revision: D21004706
Pulled By: yinghai
fbshipit-source-id: d182152e4a1a3368640bd7872cb9ea4d4bff4b02
Summary:
This PR fixes a bug related to object destruction order across threads. The bug can cause segfaults during shutdown of processes that use libtorch.
See https://github.com/pytorch/pytorch/issues/36408 for more detail
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36416
Differential Revision: D21006321
Pulled By: ezyang
fbshipit-source-id: da97936d9f2ed3f3e3aba8a3a29b38314f04b57f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36320
Hooks up the aten quantized hardswish op to the QNNPACK
path added in the previous PR.
Test Plan:
tests pass
will run benchmarking on mobile to confirm
Imported from OSS
Differential Revision: D20965043
fbshipit-source-id: e3f147268142103b5ea3f48610aa3b9837b7b61a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36252
Adds a baseline hardswish kernel using LUTs in QNNPACK.
Performance is 1.9 GB/s on a Nexus 6 and 2.2 GB/s on Pixel 3 - same as other LUT based ops.
Enforcing scale and zp to be equal to the input, to match the server implementation.
There are some potential improvements in rewriting this as NEON
kernels for a further speedup - saving that until later, if we need it.
Test Plan:
```
with-proxy ./scripts/build-local.sh
./build/local/hardswish-test
with-proxy scripts/build-android-armv7.sh
adb push ./build/android/armeabi-v7a/hardswish-* /data/qnnpack
adb shell
/data/qnnpack/hardswish-test
/data/qnnpack/hardswish-bench
with-proxy scripts/build-android-arm64.sh
adb push ./build/android/arm64-v8a/hardswish-* /data/qnnpack
/data/qnnpack/hardswish-test
/data/qnnpack/hardswish-bench
```
Imported from OSS
Differential Revision: D20965044
fbshipit-source-id: 982938361971513cb15873438e12c23a38e819e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36186
Start PyTorch Numeric Suite under PyTorch quantization and add weight compare API to it.
ghstack-source-id: 102062165
Test Plan: buck test mode/dev caffe2/test:quantization -- 'test_compare_weights'
Differential Revision: D20903395
fbshipit-source-id: 125d84569837142626a0e2119b3b7657a32dbf4e
Summary:
To make them compatible with python3.7 and python3.8
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36422
Test Plan: CI
Differential Revision: D21006399
Pulled By: malfet
fbshipit-source-id: 725df277ff3e4479fc2c39d16a30fbf301fde9e5
Summary:
cc orionr sanekmelnikov
Confirm that the function was removed already.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36495
Differential Revision: D21003122
Pulled By: natalialunova
fbshipit-source-id: 364b0790953980e02eb7ff8fa0b6218d7e34a0c3
Summary:
Finding out how to ssh into a CircleCI job to debug a failure is a challenge because, as far as I know, there isn't any concise documentation about it. I figured it might be nice to include this in CONTRIBUTING.md.
Maybe there are some other tips about non-CircleCI jobs that could be added in the future as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36507
Differential Revision: D21006526
Pulled By: ezyang
fbshipit-source-id: 0a544ecf37bf9550e9b2f07595332dc5f394bb9e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36232
The purpose of this PR is to replace `at::Generator generator = nullptr` with `c10::optional<at::Generator> = c10::nullopt` all over the code
* #36230 Replace std::shared_ptr with c10::intrusive_ptr in at::Generator
Test Plan: Imported from OSS
Differential Revision: D20943603
Pulled By: pbelevich
fbshipit-source-id: 65d335990f01fcc706867d5344e73793fad68ae6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36411
This PR removes the PyTorch-specific assertWarns and uses the unit
test one; it also formats some tests.
Test Plan: Imported from OSS
Differential Revision: D20998159
Pulled By: wanchaol
fbshipit-source-id: 1280ecff2dd293b95a639d13cc7417fc819c2201
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36418
Those tests were only run in oss before but should also run in fbcode.
ghstack-source-id: 101973722
Test Plan: waitforsandcastle
Reviewed By: ezyang
Differential Revision: D20976954
fbshipit-source-id: 7ced56dcbdbfe0e07993871a7811a086894b6b32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35101
TSAN is noting a lock-order inversion in the context of dist autograd because
we're holding a lock when GraphTask calls markCompleted() on the relevant futureResult_.
Add an atomic bool to make it possible to protect this without holding the mutex,
and also fix the alignment of a few struct vars.
ghstack-source-id: 101805283
Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift
Differential Revision: D20553517
fbshipit-source-id: 446e3718dd68876bd312166ecceed1d92868ce4e
Summary:
This PR makes the expected torch device string error message include `xla` as an acceptable torch device prefix string.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36446
Test Plan:
No Logic changed, and made sure `xla` is acceptable in `torch.device`.
```
import torch
device = torch.device("xla")
```
```
device = torch.device("unrecognized")
RuntimeError: Expected one of cpu, cuda, mkldnn, opengl, opencl, ideep, hip, msnpu, xla device type at start of device string: unrecognized
```
Differential Revision: D20993449
Pulled By: dahsh
fbshipit-source-id: 83afe4f913a650a655bfda9c2a64bf9e5aa27e16
Summary:
This enables cpp_extensions.load/load_inline. This works by hipify-ing the CUDA sources.
Also enables tests.
CuDNN/MIOpen extensions aren't yet supported; I propose not to do that in this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35897
Differential Revision: D20983279
Pulled By: ezyang
fbshipit-source-id: a5d0f5ac592d04488a6a46522c58e2ee0a6fd57c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36172
Original commit changeset: 3d7801613f86
D20449887 broke some OSS tests as the OSS export sync wasn't working correctly.
Test Plan:
Manually export latest version to OSS to trigger the tests
+ test plan in D20449887
verified onnx tests are passing in https://github.com/pytorch/pytorch/pull/36172
Reviewed By: andrewwdye
Differential Revision: D20902279
fbshipit-source-id: bc30fcc9f5cc8076f69a5d92675fd27455948372
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36179
This ensures normal optimization passes run for forked functions.
Test Plan: Imported from OSS
Differential Revision: D20907253
Pulled By: zdevito
fbshipit-source-id: 72cfa9f82643214b1ef3de24697d163a9a24b29c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36168
This makes it match what eager mode uses as the keyword name.
Currently _requires_grad will not appear in serialization because
it is not listed as kwarg only. There is a small chance there is a model
that has never been run in eager mode that uses the _requires_grad name,
but this is rare enough that I don't think we need to worry about it unless
something breaks in testing.
Test Plan: Imported from OSS
Differential Revision: D20902557
Pulled By: zdevito
fbshipit-source-id: 605cf5371b4fc15ec1b4e8a12f9660d723530de4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36419
Since we call waitForThreadLocalPendingRRefs per-RPC, construct it
already-satisfied in the common empty case, to avoid extra mutex/cv work.
Also, naming consistency for recording_.
ghstack-source-id: 101975739
Test Plan: ctr_mobile_feed, buck test mode/dev-nosan caffe2/test/...
Differential Revision: D20977879
fbshipit-source-id: e321a33127e4b5797e44e039839c579057e778e5
Summary:
It was called twice, but the result of the first invocation was not used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36453
Differential Revision: D20993535
Pulled By: yf225
fbshipit-source-id: 4d85207a936b846866424903d7622905f3fddd36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36282
The reason to do this explicitly in the tool is that we don't want to capture warmup (as well as input cloning) in profiling. So instead we make the benchmarking code explicitly aware of the profiler.
Example output:
```
I0408 16:06:40.300040 85516 throughput_benchmark-inl.h:106] Using Autograd profiler. Trace will be saved to /tmp/tmpt0gsz85y
I0408 16:06:40.302232 85516 throughput_benchmark-inl.h:111] Starting threads
I0408 16:06:40.302258 85524 throughput_benchmark-inl.h:78] Starting forward thread 1
I0408 16:06:40.302259 85525 throughput_benchmark-inl.h:78] Starting forward thread 2
I0408 16:06:40.302261 85523 throughput_benchmark-inl.h:78] Starting forward thread 0
I0408 16:06:40.302259 85526 throughput_benchmark-inl.h:78] Starting forward thread 3
I0408 16:06:40.412879 85525 throughput_benchmark-inl.h:88] Shutting down forward thread 2. Total number of finished threads: 1
I0408 16:06:40.412971 85523 throughput_benchmark-inl.h:88] Shutting down forward thread 0. Total number of finished threads: 2
I0408 16:06:40.412989 85526 throughput_benchmark-inl.h:88] Shutting down forward thread 3. Total number of finished threads: 3
I0408 16:06:40.413033 85524 throughput_benchmark-inl.h:88] Shutting down forward thread 1. Total number of finished threads: 4
I0408 16:06:40.413056 85516 throughput_benchmark-inl.h:123] Finished benchmark
Average latency per example: 443.256us
Total number of iterations: 1000
Total number of iterations per second (across all threads): 9024.12
Total time: 110.814ms
```
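A usage sketch from the Python side (assuming the `profiler_output_path` argument on `ThroughputBenchmark.benchmark` added here):
```python
import torch
from torch.utils import ThroughputBenchmark

module = torch.jit.script(torch.nn.Linear(16, 16))
bench = ThroughputBenchmark(module)
bench.add_input(torch.randn(1, 16))
stats = bench.benchmark(
    num_calling_threads=4,
    num_warmup_iters=100,
    num_iters=1000,
    profiler_output_path="/tmp/trace",  # a non-empty path enables the Autograd profiler
)
print(stats)
```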
Test Plan: Imported from OSS
Differential Revision: D20987125
Pulled By: ezyang
fbshipit-source-id: 1f8980c3a5a0abdc268c7a16c99aa9ea868689eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35313
The intention of D16955662 was to print a warning when a single-layer LSTM has an (ignored) dropout specified. I ran into this warning with one of our models, but instead of a warning I got "name 'warnings' is not defined". The linter could have called out that problem on the original diff, not sure why it didn't.
Test Plan: Before this diff JITing a particular model in f176977725 yielded "name 'warnings' is not defined". After this diff f176980937 gets past that point (failing in an unrelated downstream workflow).
Reviewed By: jianyuh
Differential Revision: D20611822
fbshipit-source-id: 99d90f4830f3b15ddbf1e2146e2cc014ef26c2ab
Summary:
Bazel puts generated files in its private hermetic builds, but for some reason also searches for files in `torch/csrcs/*/generated/` folders.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36417
Test Plan: Use the same folder to compile pytorch using cmake and bazel
Differential Revision: D20987580
Pulled By: malfet
fbshipit-source-id: 36d15ba3ce0d0c7ea923ddef902bd500f2578430
Summary:
This partially addresses https://github.com/pytorch/pytorch/issues/33568 by disabling clamp for complex inputs until an appropriate solution can be implemented. test_complex_unsupported in test_torch.py is extended to validate this behavior.
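The disabled behavior looks roughly like this (illustrative; the exact error message may differ):
```python
import torch

z = torch.tensor([1 + 1j, 2 - 1j])
try:
    torch.clamp(z, min=0)  # clamp is disabled for complex inputs
except RuntimeError as e:
    print("clamp is not supported for complex inputs:", e)
```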
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36373
Differential Revision: D20984435
Pulled By: mruberry
fbshipit-source-id: 49fd2e1e3a309f6a948585023953bae7ce3734c8
Summary:
We open sourced the FakeLowp ops as a reference implementation of fp16 ops. This PR makes it buildable.
```
USE_CUDA=0 USE_ROCM=0 USE_FAKELOWP=ON python setup.py install
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36170
Test Plan:
Build Onnxifi library in Glow.
```
cp ${GLOW}/build/lib/Onnxifi/libonnxifi-glow.so ${MY_PATH}/ibonnxifi.so
LD_LIBRARY_PATH=${MY_PATH}/ibonnxifi.so python pytorch/caffe2/python/fakelowp/test_sls_nnpi_fp16.py
```
It doesn't run successfully right now because we need to open source the glow gflags and some other ops like `FbgemmPack`.
Reviewed By: houseroad
Differential Revision: D20980681
Pulled By: yinghai
fbshipit-source-id: 6dd31883a985850a77261bcc527029479bbc303f
Summary:
Partially addresses https://github.com/pytorch/pytorch/issues/36374 by disabling min and max for complex inputs. test_complex_unsupported in test_torch.py is extended to validate this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36377
Differential Revision: D20964661
Pulled By: mruberry
fbshipit-source-id: 79606c2e88c17c702543f4af75847d2460586c2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33313
Instead of just remembering the number of arguments and iterating over the stack,
the DispatchKeyExtractor now remembers the exact locations of the dispatch relevant arguments
(i.e. Tensor arguments) and only looks at those.
ghstack-source-id: 101908386
Test Plan: unit tests, benchmarks
Differential Revision: D19748549
fbshipit-source-id: b5b9ff2233b3507e0b600460f422912cfa9e3f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35488
The original problem why those existed was a SIOF (static initialization order fiasco; see the multi-line comment that is deleted in this PR).
However, I think this SIOF situation only happened for caffe2 kernels exposed to PyTorch, and those now use a different mechanism that shouldn't cause the SIOF anymore (they now create the caffe2 kernel instance on each call instead of storing it in the functor). If this PR passes CI, I'm assuming that the SIOF doesn't exist anymore and we can simplify this code.
ghstack-source-id: 101933838
Test Plan: waitforsandcastle
Differential Revision: D20676093
fbshipit-source-id: 462e11f75f45d9012095d87f447be88416f5dcdc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36406
As title. Move the related operators so that they are available from lite interpreter.
ghstack-source-id: 101944177
(Note: this ignores all push blocking failures!)
Test Plan: CI
Reviewed By: ayush29feb
Differential Revision: D20958833
fbshipit-source-id: a755d4d662b9757d8d425b7a25f519aaad1fd330
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36118
Callbacks registered with the autograd engine Future in the
distributed engine have a non-trivial amount of business logic. Its entirely
possible that we throw exceptions in these callbacks resulting in those not
being propagated back to the client (since the appropriate future was not
marked as completed).
In this PR, I've added appropriate try-catch blocks to ensure we always mark
the appropriate Future with an error.
ghstack-source-id: 101904294
Test Plan: Tested by simulating an exception.
Differential Revision: D20885521
fbshipit-source-id: b6b6f5994a5fb439e40ec7c585435b6dfe7ddb8e
Summary:
So that XLA can run all tests by setting env `PYTORCH_TEST_PATH` instead of patching a diff. :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36321
Differential Revision: D20946635
Pulled By: ailzhang
fbshipit-source-id: 55ab7db7fd93063ad495a0c23a903218a29625a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36394
Those are remnants from the time when c10 was being constructed. They've fulfilled their goal of making sure that the c10 library supports all needed corner cases, and those corner cases are now covered by actual ops. We don't need these experimental ops anymore.
ghstack-source-id: 101933837
Test Plan: CI
Differential Revision: D20965279
fbshipit-source-id: ff46f2482ff58ca3fa955288083b12ec2066938e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36409
Pull Request resolved: https://github.com/pytorch/glow/pull/4409
Since glow OSS doesn't really ship with Python, it's much easier to do it in PyTorch. All the glow dependencies can be handled through LD_LIBRARY_PATH in OSS.
Test Plan:
```
buck test caffe2/caffe2/python/fakelowp:
```
Reviewed By: amylittleyang
Differential Revision: D20969308
fbshipit-source-id: 06a02d23f4972a92beb18e1d052e27d8724539d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36350
Adds CUDA kernels for hardswish in order to unblock use in training.
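A small usage sketch of what this unblocks:
```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(8, device=device, requires_grad=True)
y = F.hardswish(x)   # forward pass, now backed by a CUDA kernel
y.sum().backward()   # backward pass, enabling use in training
print(x.grad)
```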
Test Plan:
added test coverage for forward pass
ran this script for various input sizes to test backward pass against a manual Hardswish module: https://gist.github.com/vkuzo/30e196b059427725817f2ee934ed0384
Imported from OSS
Differential Revision: D20955590
fbshipit-source-id: 635706fbf18af9a4205f2309f3314f2996df904d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36371
It allows dropping a circular dependency and removing unknown_symbols in the Buck build.
It'd be good to get rid of GetCpuId altogether in favor of cpuinfo, but it's not really blocking anything.
Reviewed By: malfet
Differential Revision: D20958000
fbshipit-source-id: ed17a2a90a51dc1adf9e634af56c85f0689f8f29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36296
When there's no overload name, the operator name string should be "name", instead of "name.".
Test Plan: Imported from OSS
Differential Revision: D20966759
Pulled By: iseeyuan
fbshipit-source-id: b4b31923c7ec5cdca8ac919bd6a84ba51afb6cd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35002
I was running into some memory issues once I enabled QAT, and I found some opportunities to use in-place operations. In particular, it looks like we can do the ReLUs in-place and the bias addition seems to also work inline. The multiplication operation right above the bias addition is *not* eligible because there's a bifurcation to produce conv_orig.
Reviewed By: jerryzh168
Differential Revision: D20523080
fbshipit-source-id: 4a94047dee0136f4014a328374896b28f561e41f
Summary:
Second attempt at the reduction frontend for the TensorExpr compiler. It has two APIs: a simple version for common reduction types, and a customizable Reducer frontend which allows specifying the initializer, the reduction interaction via a lambda, and the body via a lambda.
Simple API looks like so:
```
Buffer b(BufHandle("b", {10}), kInt);
Tensor* c = Reduce("sum", {}, Sum(b), {{10, "m"}});
```
An example of specializing a Sum to do Matmul:
```
Buffer tA(BufHandle("tA", {M, K}), kFloat);
Buffer tB(BufHandle("tB", {K, N}), kFloat);
Sum matmul([&](ParameterList& v) {
ExprHandle m = v[0];
ExprHandle n = v[1];
ExprHandle k = v[2];
return tA(m, k) * tB(k, n);
});
Tensor* mm = Reduce("mm", {{M, "m"}, {N, "n"}}, matmul, {{K, "k"}});
```
A fully specialized Reduction:
```
VarHandle searchValue("searchValue", kInt);
Buffer b(BufHandle("b", {4, 10}), kInt);
Reducer anyEqSV(
ExprHandle(0),
[](ExprHandle a, ExprHandle b) {
return CompareSelect::make(a, 1, 1, b, kEQ);
},
[&](ParameterList& v) {
return CompareSelect::make(b.call(v), searchValue, kEQ);
});
Tensor* any = Reduce("anyEqual", {{4, "i"}}, anyEqSV, {{10, "j"}});
```
---
Until lowering, Reductions are held in a compound form for easier optimization:
```
VarHandle m("m", kInt);
Buffer b(BufHandle("b", {2, 3, m}), kFloat);
Tensor* c = Reduce("sum", {{2, "l"}, {3, "n"}}, Sum(b), {{m, "m"}});
LoopNest loop({c});
std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
for (int n = 0; n < 3; n++) {
for (int m = 0; m < m_1; m++) {
sum[l, n] = ReduceOp(sum[l, n] = float(0);, (sum[l, n]) + (b[l, n, m]), {m});
}
}
}
```
```
loop.prepareForCodegen();
std::cout << *loop.root_stmt() << "\n";
```
```
for (int l = 0; l < 2; l++) {
for (int n = 0; n < 3; n++) {
sum[(0 + l * (1 * 3)) + n * 1] = float(0);
for (int m = 0; m < m_1; m++) {
sum[(0 + l * (1 * 3)) + n * 1] = (sum[(0 + l * (1 * 3)) + n * 1]) + (b[((0 + l * ((1 * m_1) * 3)) + n * (1 * m_1)) + m * 1]);
}
}
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35866
Differential Revision: D20965577
Pulled By: nickgg
fbshipit-source-id: afe506c90db794447180056417013bcaf0e2c049
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36329
The same header guard was used in two different header files (not sure if this was intentional.)
Test Plan: CI Tests
Reviewed By: jspark1105
Differential Revision: D20946512
fbshipit-source-id: dd0190943a8c90059d480f15c05f3bfcce956acd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36391
Without it I get
```
ImportError: /data/users/ezyang/pytorch-tmp/torch/lib/libtorch_python.so: undefined symbol: _ZN5torch3jit18checkDoubleInRangeEd
```
when I build with DEBUG=1
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20964292
Pulled By: ezyang
fbshipit-source-id: b2569f5813c6490de51372e70029648a36891e7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36259
Re-enable a test disabled in relation to #36129, which should be fixed by the
earlier PR #36103.
Test Plan: Imported from OSS
Differential Revision: D20933100
fbshipit-source-id: aca4e3b0b83a581fe58760b6730255b3176f41fc
Summary:
Per title. A test is added to test_type_promotion for the behavior. This behavior is consistent with NumPy's.
For complex inputs to `abs` the result is cast to float after the computation since the computation of abs must be performed on the original complex tensor. While `std::abs` returns a float value when called on complex inputs, returning a FloatTensor directly would require additional loop instantiations in TensorIterator. This may be worthwhile to pursue in the future.
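A quick illustration of the special case mentioned above (assuming a recent build with complex support): `abs` on a complex tensor returns a float tensor, unlike other unary ops whose output dtype must match the input.
```python
import torch

z = torch.tensor([3 + 4j], dtype=torch.complex64)
out = torch.abs(z)
print(out, out.dtype)  # tensor([5.]) torch.float32
```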
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35871
Differential Revision: D20961711
Pulled By: mruberry
fbshipit-source-id: 232f62cf64caa4154eb2194969efa51d2082d842
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35426
Use selective build with the full set of operators (vs. manually registering each used op with a "_" prefix).
The lite interpreter relies on JIT operator dispatch. In the future we still need JIT operator dispatch to dispatch ops that are not registered in c10.
Currently the selective build is for c10/aten dispatch in BUCK. There is JIT selective code-gen in OSS, but it has not been ported to BUCK yet.
This diff also ports the selective code-gen to BUCK.
* The selected op list is passed to gen_jit_dispatch.py.
* The list passed to gen_jit_dispatch is the top-level ops (USED_PT_OPS) only, because the selective c10/aten dispatch already registered other ops that are called from the top-level ops.
ghstack-source-id: 101885215
(Note: this ignores all push blocking failures!)
Test Plan:
1. In Python, run torch.jit.export_opnames(scripted_M_mod)
2. Append the operator names into fbcode/caffe2/pt_ops.bzl and the BUCK target.
3. Run
```
buck run xplat/caffe2/fb/lite_predictor:lite_predictor_bi -- --model=/home/myuan/temp/bi_pytext_0315.bc --input_dims "1,4" --input_type int64 --pytext_len=4
```
Should provide expected results.
In addition, the size of the generated code for JIT registration, for example ```register_aten_ops_0.cpp```, should be significantly reduced (from ~250 KB to ~80 KB). The non-selected op registration schemas are still kept, but the registration functor is replaced by ```DUMMY_OPERATION```.
Reviewed By: ljk53
Differential Revision: D20408831
fbshipit-source-id: ec75dd762c4613aeda3b2094f5dad11804dc9492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36297
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/343
We moved FakeFP16 back to closed source and kept the `RoundToFloat16` function in "fbgemm/FbgemmConvert.h".
This is because FakeFP16 introduced a dependency on MKL in the FBGEMM core. It also doesn't seem to be needed for open source, as it is not used anywhere.
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D20937962
fbshipit-source-id: 9487a9fd2282b6df2f754c22bea36f2255a5c791
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36093
Unwrap any tuples (including NamedTuples) in the module forward
function input list to be arglist.
1. Supports multiple tuple inputs, and traces their use through CallMethods and
TupleIndex
2. Does not unwrap inner use of other tuples that did not show up in the
original toplevel graph inputs
We work from the ScriptModule level instead of the Graph level because:
1. If the ScriptModule was previously called with the original set of inputs, the GraphExecutor caches the ExecutionPlan (specifically, ArgumentSpecCreator is derived from the Graph and type check the inputs passed in)
2. Since we are changing this graph's inputs, we clone the module and clear the GraphExecutor.
Since we work from ScriptModule level, we cannot take advantage of jit level syntactic sugar like run_pass(), so I jit exposed this as a cpp extension. Let me know if there are other ideas about this.
Test Plan:
buck test caffe2/torch/fb/model_transform:signature_translation_test
Todo: Verify use in bento
Untranslated graph:
```
> graph(%self : __torch__.test_jit.SparseNNWrapper,
> %inputs.1 : NamedTuple(dense : Tensor, sparse : Dict(int, Tensor))):
> %2 : __torch__.test_jit.SparseNN = prim::GetAttr[name="main_module"](%self)
> %4 : Tensor = prim::CallMethod[name="forward"](%2, %inputs.1) # /data/users/ansha/fbsource/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py:12141:23
> return (%4)
```
Translated graph:
```
> graph(%self : __torch__.test_jit.___torch_mangle_1.SparseNNWrapper,
> %inputs.1_0 : Tensor,
> %inputs.1_1 : Dict(int, Tensor)):
> %2 : __torch__.test_jit.___torch_mangle_2.SparseNN = prim::GetAttr[name="main_module"](%self)
> %3 : Tensor = prim::CallMethod[name="forward"](%2, %inputs.1_0, %inputs.1_1) # /data/users/ansha/fbsource/fbcode/buck-out/dev/gen/caffe2/test/jit#binary,link-tree/test_jit.py:12141:23
> return (%3)
```
Reviewed By: houseroad
Differential Revision: D20313673
fbshipit-source-id: fddd07c9537dc8b6f480a14d697bea10ecc74470
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35997
When the number of blocks is large enough, we are already achieving
balanced SM allocation. But we should still keep the number of inputs
per thread large, because thread reduction is cheap.
Benchmark for Half on V100:
https://github.com/zasdfgbnm/things/blob/master/2020Q2/reduction-benchmark.ipynb
On large tensor, it is: 1.37ms vs 1.25ms
Test Plan: Imported from OSS
Differential Revision: D20927533
Pulled By: ngimel
fbshipit-source-id: 40df52e439cc1c01cda66c6195b600f301c5e984
Summary:
This PR addresses Issue https://github.com/pytorch/pytorch/issues/36279.
Previously, printing of complex tensors would sometimes yield extra spaces before the elements as shown below:
```
print(torch.tensor([[1 + 1.340j, 3 + 4j], [1.2 + 1.340j, 6.5 + 7j]], dtype=torch.complex64))
```
would yield
```
tensor([[(1.0000 + 1.3400j),
(3.0000 + 4.0000j)],
[(1.2000 + 1.3400j),
(6.5000 + 7.0000j)]], dtype=torch.complex64)
```
This occurs primarily because when the max width for the element is being assigned, the formatter's max_width is calculated prior to truncating the float values. As a result, ```self.max_width``` would end up being much longer than the final length of the element string to be printed.
I address this by adding a boolean variable that checks if a complex tensor contains only ints and change the control flow for calculating ```self.max_width``` accordingly.
Here are some sample outputs of both float and complex tensors:
```
tensor([[0., 0.],
[0., 0.]], dtype=torch.float64)
tensor([[(0.+0.j), (0.+0.j)],
[(0.+0.j), (0.+0.j)]], dtype=torch.complex64)
tensor([1.2000, 1.3400], dtype=torch.float64)
tensor([(1.2000+1.3400j)], dtype=torch.complex64)
tensor([[(1.0000+1.3400j), (3.0000+4.0000j)],
[(1.2000+1.3400j), (6.5000+7.0000j)]], dtype=torch.complex64)
tensor([1.0000, 2.0000, 3.0000, 4.5000])
tensor([(1.+2.j)], dtype=torch.complex64)
```
cc ezyang anjali411 dylanbespalko
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36331
Differential Revision: D20955663
Pulled By: anjali411
fbshipit-source-id: c26a651eb5c9db6fcc315ad8d5c1bd9f4b4708f7
Summary:
AnyType wasn't listed as a mutable type, so the assertion triggered (yay!). Also update the `isMutableTypeInternal(from) != isMutableTypeInternal` logic to be more encompassing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36178
Differential Revision: D20922356
Pulled By: eellison
fbshipit-source-id: 7060a62b18e98dc24b6004a66225c196aadb566e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35279
Added support for benchmarking self-contained PyTorch JIT models.
* By specifying flag `--no_inputs=True`, the binary supports benchmarking self-contained torchscript model (model runs without inputs, `model.forward()`)
* This allows moving data preparation part outside of this binary.
Reviewed By: kimishpatel
Differential Revision: D20585639
fbshipit-source-id: c28e50503534c90023c1430479d26f1c1ce740b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36019
Once the autograd engine is finished with a GraphTask it would call
`markCompleted` on the Future. This could trigger callbacks on the Future that
could throw exceptions.
If one of the callbacks did throw an exception, we would call setErrorIfNeeded,
which would be a no-op since the Future is already marked as completed. This
would effectively mean we would be swallowing exceptions. To avoid this, we do
the following:
1) Rethrow the exception in `mark_graph_task_completed`.
2) In `setErrorIfNeeded`, log the error if we are ignoring it.
ghstack-source-id: 101607329
Test Plan: Verified appropriate logging.
Differential Revision: D20854806
fbshipit-source-id: 76bdf403cfd6d92f730ca1483ad5dba355f83e58
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36319
On the way to resolving #35216.
This is a fix for just the master branch but once this goes in,
I'll send a cherry-pick to release/1.5
The problem is that we were not calling `format` on a string that had
templates (e.g., '{input}', '{dim}'). This change makes it so that we
call format on the entire docstring for `torch.min`.
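A hedged illustration of the bug class being fixed here (this is not the actual `_torch_docs.py` code; the template and argument names are made up): placeholders like `{input}` only render if `.format()` is actually applied to the whole docstring.
```python
common_args = dict(input="the input tensor", dim="the dimension to reduce")

doc_template = r"""
min(input, dim) -> Tensor

Args:
    input (Tensor): {input}
    dim (int): {dim}
"""

print(doc_template)                        # leaves a literal "{input}" in the docs
print(doc_template.format(**common_args))  # renders the description correctly
```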
Test Plan:
- The `torch.max` docs are OK:
https://pytorch.org/docs/master/torch.html#torch.max and don't need
changing.
- `torch.min` docs, before this change: see second screenshot in #35216.
- after this change: <Insert link here on github>

Differential Revision: D20946702
Pulled By: zou3519
fbshipit-source-id: a1a28707e41136a9bb170c8a4191786cf037a0c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36223
Previously #35714
There are a lot of unboxed only defs. We're committed to removing
them at the end of the half but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This is a new overload impl_UNBOXED that will pass
the function pointer straight to CppFunction::makeUnboxedOnly
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
To get some coverage of _UNBOXED API for code analysis, I switched
one of our unboxed tests to be an impl rather than a def. This
shouldn't materially affect coverage.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929259
Pulled By: ezyang
fbshipit-source-id: 72d2061b6c8a6afbcd392b47f53ade18de2f9184
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36222
Reland of #35706, with fixes to code analyzer.
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch.
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API were (the private API
shouldn't be used by users because it doesn't come with && overloads)
Note that this means I needed to adjust the regex in the
code analyzer, because
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
- Code analyzer now prints a unified diff, and in the other order
(because I tend to think of the diff as reporting how the /new/ result
is different)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20929256
Pulled By: ezyang
fbshipit-source-id: c69b803d2b3a1a8aff70e14da33d3adec5239f13
Summary:
Adds handling of constant branches to the TensorExpr IR Simplifier. This covers both IfThenElse and Cond when the condition expression is a known constant (e.g. `IfThenElse(1, X, Y) => X`), or when both arms of the branch are the same (e.g. `IfThenElse(Y, X, X) => X`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36257
Differential Revision: D20947777
Pulled By: nickgg
fbshipit-source-id: 974379e42a6d65ce3e7178622afb62d36ad4e380
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31966
This has three parts:
* When `--caffe2_handle_executor_threads_exceptions` is set when a parallel execution step throws an exception it can hang waiting for async nets to finish. This adds cancellation code to cancel any async nets.
* This makes the exceptions returned from parallel workers pass a std::exception_ptr so the stack trace can be recorded with folly::SmartExceptionTracer.
* Define Cancel method at NetBase level to avoid pulling in unsupported AsyncSchedulingNet for fbandroid.
Test Plan:
Added unit tests for plan_executor
buck test //caffe2/caffe2:caffe2_test_cpu
buck test //caffe2/caffe2:caffe2_test_cpu -- PlanExecutorTest --stress-runs 100
Reviewed By: boryiingsu
Differential Revision: D19320177
fbshipit-source-id: d9939fcea1317751fa3de4172dfae7f781b71b75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36325
return the scale of the input tensor
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py
Imported from OSS
Differential Revision: D20947338
fbshipit-source-id: 71fc15fce815972d23804ff7cf936da997e71dc0
Summary:
This PR completely refactors the code lowering process from our IR to CUDA. Before we had one giant step that would go from a relatively high level IR straight to CUDA, now we're lowering this first into concepts like ForLoop, IfThenElse, TensorIndex, Allocate. This lowering will allow us to do more complex code lowering like reductions and unrolling. Unrolling will quickly follow this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36199
Reviewed By: dzhulgakov
Differential Revision: D20925220
Pulled By: soumith
fbshipit-source-id: 8f621c694c68a1aad8653e625d7287fe2d8b35dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34527
Adding support for prune_delays and prune ratios in Adagrad optimizer.
Test Plan:
Tested via unit tests in masked_adagrad_optimizer_test. Added unit test for prune_delay versions of MaskedAdagrad
buck build caffe2/caffe2/fb/optimizers:masked_adagrad_optimizer_test; buck-out/gen/caffe2/caffe2/fb/optimizers/masked_adagrad_optimizer_test#binary.par
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- 'test_pruning'
All Dper tests passed https://our.intern.facebook.com/intern/testinfra/testrun/7599824380741217
Reviewed By: chocjy
Differential Revision: D20313419
fbshipit-source-id: 5c2c8d4e0fc2ec538bcd6f145c6b87a2381f90f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36267
This makes PythonOp throw the original python exception instead of wrapping it in a c10::Error type. This allows throwing exceptions from Python and preserving the type when they're caught again in Python. This is important for structured logging and handling non-retryable error types.
Test Plan: buck test caffe2/caffe2/python:python_op_test
Reviewed By: wenqicaofb
Differential Revision: D20928098
fbshipit-source-id: 001747f022c657b420f8450b84d64f4d57f6cdf6
Summary:
As titled. Found that `THBlas_(swap)` was never used, so I removed it from the repo. Please help review the patch; any suggestions are welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35033
Differential Revision: D20918998
Pulled By: albanD
fbshipit-source-id: 93af8429231421185db0ccdfdd44e349a8f68c67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36239
ProcessGroupAgent and ThriftAgent threads were joined at shutdown, but RpcAgent threads were joined by the destructor. This PR joins all threads at shutdown by using a pattern similar to `start` in RPC.
The derived classes implement a `shutdownImpl` method that cleans up backend-specific state. RpcAgent implements `shutdown`, which cleans up generic state and calls the underlying `shutdownImpl`. The atomic `running` flag is now set and unset by RpcAgent so backends do not need to mutate it.
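An illustrative C++ sketch of the shutdown pattern described above (class and member names follow the summary; this is not the actual RpcAgent code):
```cpp
#include <atomic>
#include <thread>
#include <vector>

class RpcAgent {
 public:
  virtual ~RpcAgent() = default;

  void shutdown() {
    running_ = false;   // generic state is flipped by the base class
    joinThreads();      // all threads are joined at shutdown, not in destructors
    shutdownImpl();     // backend-specific cleanup (ProcessGroupAgent, Thrift, ...)
  }

 protected:
  virtual void shutdownImpl() = 0;

  void joinThreads() {
    for (auto& t : threads_) {
      if (t.joinable()) {
        t.join();
      }
    }
  }

  std::atomic<bool> running_{true};
  std::vector<std::thread> threads_;
};
```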
ghstack-source-id: 101820415
Test Plan: Ensured this works with `test_duplicate_name` (in which RpcAgent is constructed but PGA is not), and selected `rpc_spawn` and `dist_autograd_spawn` tests with TSAN. Checking Build Bot and CI as well, and continuing to test more with TSAN on devserver (currently running into memory issues).
Reviewed By: jjlilley
Differential Revision: D20902666
fbshipit-source-id: 5dbb5fc92ba66f75614c050bb10b10810770ab12
Summary:
In the IR Simplifier we were not treating multiplication by zero specially, which meant some constant expressions were stored in forms that were not constant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36287
Differential Revision: D20937497
Pulled By: nickgg
fbshipit-source-id: 528e430313ea048524d7a4a0256eef4a0297438b
Summary:
This reverts commit 8afa001d898914a48d6b9e3d944a99607d2819c1 and makes a few improvements, including the following items.
1. return `std::string` for `get_module_base_name`
2. eliminate `module should always be true` warning
3. do `SymInitialize` and `SymCleanup` once to save time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36205
Reviewed By: malfet
Differential Revision: D20919672
Pulled By: ezyang
fbshipit-source-id: 0063a478779feb106459af48063485ef676008a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36302
used 1/sqrt(x) vs rsqrt(x)
Test Plan:
tested with the seed from testwarden 1586230820
tested without the seed
Differential Revision: D20939672
fbshipit-source-id: c7be030c4ae42e78765edda2ce1ad2e213a46030
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35693
Adds utility functions to quantized int types of vec256 to calculate
horizontal sums and sums of squares using avx2 intrinsics.
This is useful for quantized implementations of various normalization
layers (LayerNorm, GroupNorm, InstanceNorm), where we need to calculate
the mean and variance of a layer of quantized ints.
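A standalone C++ sketch of the kind of AVX2 horizontal reduction described above (a hypothetical helper, not the actual vec256 code): it widens 32 signed 8-bit lanes and reduces them to a single 32-bit sum.
```cpp
#include <cstdint>
#include <immintrin.h>

int32_t hsum_i8_avx2(const int8_t* data) {
  __m256i v = _mm256_loadu_si256(reinterpret_cast<const __m256i*>(data));
  // Sign-extend each 128-bit half to 16-bit lanes and add them.
  __m256i lo = _mm256_cvtepi8_epi16(_mm256_castsi256_si128(v));
  __m256i hi = _mm256_cvtepi8_epi16(_mm256_extracti128_si256(v, 1));
  __m256i sum16 = _mm256_add_epi16(lo, hi);
  // Multiply-add against 1 to widen pairs of 16-bit lanes into 32-bit sums.
  __m256i sum32 = _mm256_madd_epi16(sum16, _mm256_set1_epi16(1));
  // Reduce the eight 32-bit lanes to a scalar.
  __m128i s = _mm_add_epi32(_mm256_castsi256_si128(sum32),
                            _mm256_extracti128_si256(sum32, 1));
  s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
  s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
  return _mm_cvtsi128_si32(s);
}
```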
Test Plan:
Adhoc c++ tester for the correctness of the avx2 functions:
https://gist.github.com/vkuzo/0380f450793cd5c05abbeacb6d3883ae
Run with:
```
-lstdc++ -mavx2 -lm -ldl -o main main.cpp && ./main
```
The integration bits and performance will be tested in the next PR in the stack
where we will hook quantized Layernorm to use this.
Imported from OSS
Differential Revision: D20768804
fbshipit-source-id: 4720dd358dde0dabbab8e1a33a67be55925d98f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36284
ATen/gen.py's `force_schema_registration` flag was added in #34622 to unblock c10 boxing for custom build,
as the full-JIT frontend expects that certain op schemas are always registered (the actual op implementation
can be skipped if it's not used).
The flag didn't work together with `per_op_registration` flag, which was added for FB BUCK selective build.
This PR made it work with `per_op_registration` flag, by moving schema registrations to a separate file.
This way, internal full-JIT can include the new source file while lite-JIT can ignore it. OSS custom build
should still work as before.
Updated table of codegen flags and 5 build configurations that are related to mobile:
```
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------+
| | Open Source | FB BUCK |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| | Default Build | Custom Build w/ Stat-Disp | Custom Build w/ Dyna-Disp | Full-JIT | Lite-JIT |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| Dispatch Type | Static | Static | Dynamic | Dynamic | Dynamic |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| ATen/gen.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --op_registration_whitelist | unset | used root ops | closure(used root ops) | unset | closure(possibly used ops) |
| --backend_whitelist | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU |
| --per_op_registration | false | false | false | true | true |
| --force_schema_registration | false | true | true | true | true (output unused) |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| tools/setup_helpers/generate_code.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --disable-autograd | true | true | true | false | WIP |
| --selected-op-list-path | file(used root ops) | file(used root ops) | file(used root ops) | unset | unset |
| --selected-op-list (WIP) | unset | unset | unset | unset | used root ops |
| --force_schema_registration (WIP) | false | true | true | true | false |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
```
ghstack-source-id: 101840182
Test Plan:
- check OSS CI;
- patch D20577433 on top of this change to make sure test passes on it;
- check mobile build size bot;
Differential Revision: D20932484
fbshipit-source-id: 5028a6f90f2c7ee66fc70c562643b536a32b4d33
Summary:
See issue [https://github.com/pytorch/pytorch/issues/33494 Complex number printing inconsistent with float](https://github.com/pytorch/pytorch/issues/33494).
The change introduces an optional argument in the Formatter's ```format``` function to discern whether a tensor is a float tensor or not. This way, there is consistency between float tensors and complex tensors, so that complex tensors print in the same manner as float tensors:
- Only a decimal point and no zeros for integer values.
- Trailing zeros only if the value is truly a float.
- White space introduced to fill the gap so that +/- symbols and commas align.
Here are some example outputs.
```
print(torch.zeros((2,2), dtype=torch.float64))
```
yields
```
tensor([[0., 0.],
[0., 0.]], dtype=torch.float64)
```
```
print(torch.zeros((2,2), dtype=torch.complex64))
```
previously yielded
```
tensor([[(0.0000 + 0.0000j), (0.0000 + 0.0000j)],
[(0.0000 + 0.0000j), (0.0000 + 0.0000j)]], dtype=torch.complex64)
```
and now yields
```
tensor([[(0 + 0.j), (0 + 0.j)],
[(0 + 0.j), (0 + 0.j)]], dtype=torch.complex64)
```
This new print version is more consistent with float tensor's pretty print.
The following example mixes integer and decimals:
```
print(torch.tensor([[1 + 1.340j, 3 + 4j], [1.2 + 1.340j, 6.5 + 7j]], dtype=torch.complex64))
```
This yields:
```
tensor([[ (1.0000 + 1.3400j),
(3.0000 + 4.0000j)],
[ (1.2000 + 1.3400j),
(6.5000 + 7.0000j)]], dtype=torch.complex64)
```
The following example
```
torch.tensor([1,2,3,4.5])
```
yields
```
tensor([1.0000, 2.0000, 3.0000, 4.5000])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35841
Differential Revision: D20893848
Pulled By: anjali411
fbshipit-source-id: f84c533b8957a1563602439c07e60efbc79691bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36230
To make `at::Generator` compatible with `IValue` this PR replaces `std::shared_ptr<c10::GeneratorImpl>` with `c10::intrusive_ptr<c10::GeneratorImpl>`
Differential Revision: D20923377
Pulled By: pbelevich
fbshipit-source-id: 3cb4214900023d863e5f2fe4ea63ec8aeb30936a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36248
Add basic index/length checks to MergeMultiScalarFeatureTensors to avoid segfaults.
But I don't really understand this op: what would cause this mismatch (see test plan)? Would like to add that to the assertion description.
Reviewed By: houseroad
Differential Revision: D20912048
fbshipit-source-id: 29ef8c4bd261a48d64cbef6aa4f0306d7f058e71
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36292
As reported in https://github.com/pytorch/pytorch/issues/36120,
sparse_coo_tensor has some expensive checks and we were using that to shallow
copy a sparse tensor in AccumulateGrad. This can be avoided by using
_sparse_tensor_coo_unsafe since we're just reusing the indices and values from
a valid sparse tensor to shallow copy it.
Using the benchmark code mentioned in
https://github.com/pytorch/pytorch/issues/36120, these are the results:
1) 65.1 ms on master with this PR.
2) 127.5 ms for PyTorch 1.4
3) 916.5 ms on master without this patch.
ghstack-source-id: 101817209
Test Plan: waitforbuildbot
Differential Revision: D20935573
fbshipit-source-id: 4661bc779c06b47b5eb677e3fd4e192d1e3cba77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36275
Calling a TorchScript function from within RPC was added after initial
support for the profiler with RPC, hence, we were not recording torchscript
funtions invoked under RPC correctly. This diff passes the `RecordFunction` to
the `_invoke_torchscript..` calls similar to what is done for builtin and UDFs.
However, this is only a temporary solution. We will be removing the use of
`RecordFunction` as a standalone in the RPC code in
https://github.com/pytorch/pytorch/pull/35055. This diff is to unblock
recording of torchscript functions in the meantime.
ghstack-source-id: 101800134
Test Plan:
Added tests for calling a script function as builtin, sync, and async. The
output looks like the below:
```
Name                                                                                                       Self CPU total %  Self CPU total  CPU total %  CPU total  CPU time avg  Number of Calls
---------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
rpc_sync#__torch__.torch.testing._internal.distributed.rpc.rpc_test.my_script_func(worker1 -> worker2)      99.92%            1.056s          99.92%       1.056s     1.056s        1
select                                                                                                       0.04%             383.661us       0.04%        383.661us  95.915us      4
fill_                                                                                                        0.02%             210.966us       0.02%        210.966us  52.741us      4
to                                                                                                           0.00%             26.276us        0.00%        26.276us   26.276us      1
empty                                                                                                        0.02%             159.802us       0.02%        159.802us  79.901us      2
set_                                                                                                         0.01%             93.818us        0.01%        93.818us   93.818us      1
---------------------------------------------------------------------------------------------------------  ----------------  --------------  -----------  ---------  ------------  ---------------
Self CPU time total: 1.057s
```
Note that we use `torch.jit._qualified_name` to get the name of the script fn.
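A hedged sketch of this kind of test (the worker names and tensor sizes are illustrative, and the `init_rpc`/`shutdown` plumbing is omitted):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def my_script_func(t: torch.Tensor) -> torch.Tensor:
    return t + t

with torch.autograd.profiler.profile() as prof:
    ret = rpc.rpc_sync("worker2", my_script_func, args=(torch.ones(2, 2),))

# The script function should now show up under its qualified name.
print(prof.key_averages().table(sort_by="cpu_time_total"))
```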
Differential Revision: D20930453
fbshipit-source-id: c6d940aa44fcd9dd8a1a29c156aa19e0d8428d60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36212
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/342
So that we can let vendor use this as a reference for fp16 emulated ops.
Will modify the dependent TARGETS and CMakefiles.
Test Plan:
```
buck test deeplearning/fbgemm:
```
Reviewed By: hyuen
Differential Revision: D20911460
fbshipit-source-id: bb8a43e13591f295727fe1ecc74eca4ca85ab5b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36254
These future use changes were all landed yesterday as part of the future
refactoring, quickly reverted due to an observed OOM, and are now being relanded
since they've been tested to be benign.
ghstack-source-id: 101776613
Test Plan:
buck test mode/dev-nosan caffe2/test/...
not ooming: buck run mode/opt -c=python.package_style=inplace //caffe2/torch/fb/training_toolkit/examples:ctr_mbl_feed_integration -- prod
Differential Revision: D20924010
fbshipit-source-id: 28872e488df34c7a886bcd659fa7e9914639d306
Summary:
Because `torch_python` is supposed to be a thin wrapper around `torch`,
this PR moves all invocations of functions from the nccl library from python_nccl.cpp (which is part of torch_python) to nccl.cpp (which is part of torch_cuda).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36193
Test Plan: CI
Differential Revision: D20930047
Pulled By: malfet
fbshipit-source-id: 7f278610077df6ac5dc3471c1a1b5d51e653ef9c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35844
Also, bitwise operators can operate on the underlying __m256i
representation directly instead of making expensive conversions to
float16.
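A tiny compilable sketch of the point above (a hypothetical helper, not the actual Vec256 code): bitwise ops can act directly on the raw `__m256i` bits of a half-precision vector, with no round trip through float.
```cpp
#include <immintrin.h>

__m256i xor_half_lanes(__m256i a, __m256i b) {
  // Operates directly on the 256-bit lanes; no float16 -> float conversion.
  return _mm256_xor_si256(a, b);
}
```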
Test Plan: Imported from OSS
Differential Revision: D20927639
Pulled By: ngimel
fbshipit-source-id: 148c503df090580c8504f0df8d6ed2648d614120
Summary:
This is a reland of https://github.com/pytorch/pytorch/pull/36196.
Before the fix, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36224
Test Plan: CI
Differential Revision: D20919506
Pulled By: malfet
fbshipit-source-id: b8b4b7c62dcbc109b30165b19635a6ef30033e73
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35170
Looks like this was renamed by accident in 0cbd7fa46f2
Test Plan:
Unit test.
Imported from OSS
Differential Revision: D20783298
fbshipit-source-id: 8fcc146284af022ec1afe8d651baf6721b190ad3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36173
Previously we were ignoring the conv bias during training if it existed.
This PR adds the bias from the conv op during the conv+bn fusion process.
Test Plan:
python test/quantization/test_quantization.py
Imported from OSS
Differential Revision: D20921613
fbshipit-source-id: eacb2ccf9107f413ac4ef23163ba914af9b90924
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35198
The need for this tool was motivated by #28883. In the past, we have
done ad-hoc benchmarking, but it's time for something more structured.
It would be nice to add more model architectures so that we can get a
full picture of the performance impact of a code change simply by
running this suite a few times.
Test Plan: Imported from OSS
Differential Revision: D20591296
Pulled By: mrshenli
fbshipit-source-id: ee66ce0ebca02086453b02df0a94fde27ab4be49
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35421
This PR makes it so that we don't have to rebuild the entire alias db each time we remove a node in alias analysis.
Test Plan: Imported from OSS
Differential Revision: D20922470
Pulled By: eellison
fbshipit-source-id: 9f43ed6dc743bf8a6b84a4aa38cff7059d46741d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33297
Allowing mutated values as inputs but not outputs has the effect of buffering up all mutated values as inputs to the graph. Just as we allow values which escape scope as graph inputs but not graph outputs, we should also allow values that get mutated. In both cases, the contract is that the functional graph cannot write to graph inputs.
Without this patch, if there is a single write to the Tensor wildcard set it would disable all optimization.
Test Plan: Imported from OSS
Differential Revision: D20607175
Pulled By: eellison
fbshipit-source-id: c698e7cf3374e501cd5d835663991026a113ec6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35608
The various quantization options in SparseAdagradOp are only for experimental purposes and were unnecessarily complicating the code.
Move these options back to internal code, merge them into SparseSimdAdagradStochasticQuantOp, and rename it to SparseSimdAdagradFakeQuantOp.
Test Plan: CI
Differential Revision: D20720426
fbshipit-source-id: 34c8fdea49f239c795f63e978ab13c8f535609d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36113
As part of debugging https://github.com/pytorch/pytorch/issues/35863,
I discovered that the unit test would timeout during clean shutdown.
Looking into this further, it looks like there is a race in
`_on_leader_follower_report_shutdown_intent` when multiple followers call the
same method on the leader.
To fix this, I've ensured we have an appropriate lock in
`_on_leader_follower_report_shutdown_intent` to guard against this.
I ran the test 500 times to validate that this fix works.
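A minimal Python sketch of the race and the fix described above (the method name mirrors the summary, but this is illustrative, not the actual rpc internals):
```python
import threading

class Leader:
    def __init__(self):
        self._lock = threading.Lock()
        self._intent_worker_ids = set()

    def _on_leader_follower_report_shutdown_intent(self, worker_id):
        # Multiple followers may call this concurrently, so guard the
        # shared state with a lock instead of mutating it bare.
        with self._lock:
            self._intent_worker_ids.add(worker_id)
```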
Closes #35863
ghstack-source-id: 101641463
Test Plan:
1) waitforbuildbot
2) Ran the test 500 times.
Differential Revision: D20884373
fbshipit-source-id: 9d580e9892adffc0c9a4c2e832881fb291a1ff16
Summary:
Since the workflow configures pytorch with `USE_NCCL` set to 0, we cannot tidy those files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36249
Differential Revision: D20926213
Pulled By: malfet
fbshipit-source-id: 69c051b7d22fb5f19147a7955782a7de5137f740
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35631
Bundling sample inputs with our models with a standardized interface
will make it possible to write benchmarking and code-coverage tools that
call all models in a uniform way. The intent is to make this a standard
for mobile models within Facebook. Putting it in torch/utils so tests
can run on GitHub and because it might be useful for others as well.
`augment_model_with_bundled_inputs` is the primary entry point. See
its docstring for usage information and the test for some example uses.
One design question I had was how much power should be available for
automatic deflating and inflating of inputs. The current scheme gives
some automatic handling and a reasonable escape hatch
("_bundled_input_inflate_format") for top-level tensor arguments, but no
automatic support for (e.g.) tensors in tuples or long strings. For
more complex cases, we have the ultimate escape hatch of just defining
_generate_bundled_inputs in the model.
Another design question was whether to add the inputs to the model or
wrap the model in a wrapper module that had these methods and delegated
calls to `forward`. Because models can have other exposed methods and
attributes, the wrapper seemed too onerous.
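A hedged usage sketch of the entry point described above (the exact signature and helper methods may differ from what finally lands):
```python
import torch
import torch.utils.bundled_inputs

class TinyModel(torch.nn.Module):
    def forward(self, x):
        return x.relu()

m = torch.jit.script(TinyModel())
torch.utils.bundled_inputs.augment_model_with_bundled_inputs(
    m, inputs=[(torch.zeros(1, 4),)])

# A benchmarking or coverage tool can now drive any bundled model uniformly:
for args in m.get_all_bundled_inputs():
    m(*args)
```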
Test Plan: Unit test.
Differential Revision: D20925013
Pulled By: dreiss
fbshipit-source-id: 4dbbb4cce41e5752133b4ecdb05e1c92bac6b2d5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35630
Prefixed with an underscore for now because the semantics of this method can be
confusing. It adds a new attribute to the *type*, which can be shared
by several objects.
Test Plan:
Next diff in stack uses it, and has unit tests.
Imported from OSS
Differential Revision: D20904253
fbshipit-source-id: dcbf60eacf0e0e075c19238165aa33954aa73b5f
Summary:
The `cuda` metapackage installs the kernel driver + runtime libraries + toolchain, while the `cuda-toolkit` metapackage, as the name suggests, installs only the toolchain + library headers.
This reduces the dependency-install time for the `clang-tidy` step by 60+ sec.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36241
Test Plan: CI
Differential Revision: D20923839
Pulled By: malfet
fbshipit-source-id: 1e773285222bed179973449573215fcaee1983de
Summary:
Support `aten::div` in `PropagateCompleteShapeOnNode`.
Complete shape propagation on `aten::div` was previously disabled because shape
inference relies on running the node to propagate shape, and for `aten::div` we
run into a divide-by-zero problem.
However, shape propagation for pointwise operations should be identical, so we
can swap the operation for `aten::div` with `aten::mul`.
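A small runnable check of why the swap is safe for shape purposes: pointwise ops broadcast identically, so `div` and `mul` produce the same output shape.
```python
import torch

a = torch.randn(2, 3, 1)
b = torch.rand(1, 3, 4) + 1.0  # keep the divisor away from zero
assert (a * b).shape == (a / b).shape == torch.Size([2, 3, 4])
```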
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35051
Differential Revision: D20921359
Pulled By: eellison
fbshipit-source-id: 344371f34724a1b6bb2f853ebb4cef80423a4f9f
Summary:
Add support for the TensorExpr IR Simplifier to factorize common terms on either side of a Div node. e.g. `(8 * x) / (4 * y) => (2 * x) / y`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36154
Differential Revision: D20910580
Pulled By: nickgg
fbshipit-source-id: ee071d93bc4711b1e710be312de599d18ab506f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36220
The torch::utils::Future change from yesterday may have introduced a reference cycle,
leading to OOM on PS. This change reverts the lambda capture changes with
torch::utils::Future until we can analyze further.
ghstack-source-id: 101756106
Test Plan: ctr mobile feed: buck run mode/opt -c=python.package_style=inplace //caffe2/torch/fb/training_toolkit/examples:ctr_mbl_feed_integration -- prod-preset
Differential Revision: D20918904
fbshipit-source-id: d637f2370aa72c1765b98f3b9e10eb969a025624
Summary:
Modified messages in the check of default options for the Adam optimizer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36161
Differential Revision: D20920140
Pulled By: yf225
fbshipit-source-id: e697ef1741d4dd86f7f18dc0be2c3b4bd3894d8f
Summary:
By removing the calls to `size` that were effectively nops, I've managed to make `bincount_cpu` run around 6 times faster on my machine. EDIT: (Running Windows 10; I suspect this may be a Windows-specific bug.)
For histogramming 1e7 samples with 1e5 bins, best of 20 with 10 runs each
Before: 3.201189
After: 0.466188
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35822
Differential Revision: D20919885
Pulled By: ezyang
fbshipit-source-id: 1657056d69a02f1e61434f4cc8fa800f8d4e1fe8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36151
Python 2 has reached end-of-life and is no longer supported by PyTorch. To avoid confusing behavior when trying to use PyTorch with Python 2, detect this case early and fail with a clear message. This commit covers `import torch` only and not C++ for now.
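A minimal sketch of the kind of early check described above (the exact message and location in `torch/__init__.py` may differ):
```python
import sys

if sys.version_info < (3,):
    raise Exception(
        "Python 2 has reached end-of-life and is no longer supported by PyTorch.")
```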
Test Plan: waitforsandcastle
Reviewed By: dreiss
Differential Revision: D20894381
fbshipit-source-id: a1073b7a648e07cf10cda5a99a2cf4eee5a89230
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36206
nothing wrong with the code, adding appropriate casts
to keep the compiler happy
Test Plan:
build //sigrid/...
tests in the same directory
buck test //glow/glow/tests/fakelowp/...
Reviewed By: jspark1105
Differential Revision: D20911279
fbshipit-source-id: 086ef028006a53048e1cfbe9dbc6c4bdd18fb259
Summary:
In a comprehension like:
```
def f()->int:
i = 1
x = [i for i in range(7)]
return i
```
the variables inside the comprehension do not write to the function environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36105
Differential Revision: D20880699
Pulled By: eellison
fbshipit-source-id: 40af0f7470e0baeff7ef158cb461bf85c816d169
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36177
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20904241
Pulled By: ezyang
fbshipit-source-id: b13584dfdb1f852e451b1295c0d4cd4a7f53712f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36198
Original commit changeset: 4476a810dfe7
With the previous diff, when a user set KMP_AFFINITY, it was ignored when OMP_NUM_THREADS was 1. That could cause a performance regression.
Test Plan: n/a
Reviewed By: ilia-cher
Differential Revision: D20909628
fbshipit-source-id: 5738f99aa61072337146257a68189d3d03ad39f7
Summary:
Otherwise, bazel spews the following multi-line warning for every single caffe2 operator:
```
In file included from ./c10/util/logging_is_google_glog.h:50,
from ./c10/util/Logging.h:26,
from ./caffe2/core/logging.h:2,
from ./caffe2/core/blob.h:13,
from ./caffe2/core/operator.h:18,
from ./caffe2/sgd/adadelta_op.h:1,
from caffe2/sgd/adadelta_op.cc:1:
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h: In instantiation of 'std::string* google::Check_LTImpl(const T1&, const T2&, const char*) [with T1 = int; T2 = long unsigned int; std::string = std::__cxx11::basic_string<char>]':
./caffe2/core/operator.h:192:5: required from 'const T& caffe2::OperatorBase::Input(int, caffe2::DeviceType) [with T = caffe2::Tensor; caffe2::DeviceType = c10::DeviceType]'
./caffe2/core/operator.h:890:48: required from 'const caffe2::Tensor& caffe2::Operator<Context>::Input(int, caffe2::DeviceType) [with Context = caffe2::CPUContext; caffe2::DeviceType = c10::DeviceType]'
./caffe2/sgd/adadelta_op.h:87:5: required from 'bool caffe2::SparseAdadeltaOp<Context>::RunOnDevice() [with Context = caffe2::CPUContext]'
./caffe2/sgd/adadelta_op.h:85:8: required from here
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:32: warning: comparison of integer expressions of different signedness: 'const int' and 'const long unsigned int' [-Wsign-compare]
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:148:53: note: in definition of macro 'GOOGLE_PREDICT_TRUE'
148 | #define GOOGLE_PREDICT_TRUE(x) (__builtin_expect(!!(x), 1))
| ^
bazel-out/k8-fastbuild/bin/external/com_github_glog/_virtual_includes/glog/glog/logging.h:722:1: note: in expansion of macro 'DEFINE_CHECK_OP_IMPL'
722 | DEFINE_CHECK_OP_IMPL(Check_LT, < )
| ^~~~~~~~~~~~~~~~~~~~
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36196
Differential Revision: D20909696
Pulled By: malfet
fbshipit-source-id: 16723355f473379ba9da6d3c33bd561b9724800a
Summary:
This should be faster than allocating one mutex, flag, and condition variable per task.
Using `std::atomic<size_t>` to count remaining tasks is not sufficient,
because modifying the remaining counter and signalling the condition variable must happen atomically;
otherwise `wait()` might get invoked after `notify_one()` was called.
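A minimal C++ sketch of the idea (a hypothetical task group, not the actual thread-pool code): the remaining-task counter is decremented and the condition variable is signalled under the same mutex, so `wait()` cannot miss the notification.
```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>

struct TaskGroup {
  std::mutex mtx;
  std::condition_variable cv;
  size_t remaining;

  explicit TaskGroup(size_t n) : remaining(n) {}

  void taskDone() {
    std::lock_guard<std::mutex> lock(mtx);
    if (--remaining == 0) {
      cv.notify_one();  // signalled while still holding the mutex
    }
  }

  void wait() {
    std::unique_lock<std::mutex> lock(mtx);
    cv.wait(lock, [this] { return remaining == 0; });
  }
};
```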
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36159
Test Plan: CI
Differential Revision: D20905411
Pulled By: malfet
fbshipit-source-id: facaf599693649c3f43edafc49f369e90d2f60de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36084https://github.com/pytorch/pytorch/pull/30330 added support to abort the call to a `RecvWork` created by `recvAnysource`, but there is an additional call to `pg_->recv()` to actually get the tensor sent over the wire (the previous call is the preamble for the tensor). This adds support to be able to abort this call as well in `::shutdown()`, which can be used to avoid hangs during ungraceful shutdown.
Added an internal test case in `ProcessGroupAgentTest` to ensure that an appropriate error message is raised when this happens.
ghstack-source-id: 101689402
Test Plan:
Added test in ProcessGroupAgentTest. We also add a basic config that allows us to control whether to abort the call to `pg->recv()` and `pg->recvAnysource()` in `FailingWaitProcessGroupGloo`.
Run test binary:
```buck build mode/dev-nosan //caffe2/torch/fb/distributed/thriftRpcBackend/test:ProcessGroupAgentTest --keep-going
~/fbcode/buck-out/gen/caffe2/torch/fb/distributed/thriftRpcBackend/test/ProcessGroupAgentTest
```
P128567144
Differential Revision: D20632764
fbshipit-source-id: c0b3c391fd3e0ae711661ad99f309ee4d93f6582
Summary:
This PR introduces frame ids that will allow us to associate profiling information with its corresponding run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33788
Differential Revision: D20164897
Pulled By: Krovatkin
fbshipit-source-id: 8172ff9f4d188b339e2ff98a80bbe4a2b306a8aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35849
This change harmonizes some aspects of the api.
- torch::utils::Future callback should have no args, like ivalue::future.
Many of the lines of this change are related to fixing that up downstream.
No args makes the api simpler to use, particularly since many/most of the
downstream use cases ignore the passed-in args. It's simple enough to
appropriately capture the future in the lambda if necessary.
- Add error/hasError methods to ivalue::Future.
- Use c10::optional underneath for error to ivalue::Future.
- Change markCompleted(error) to setError(error) to ivalue::Future.
- Add setValue(FutureError) version to torch::utils::Future
ghstack-source-id: 101684435
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D20803251
fbshipit-source-id: e3d925287bd9a80d649843eef5f270163f448269
Summary:
Full details in task: https://our.intern.facebook.com/intern/tasks/?t=64776265
With pytorch 1.5+ we remove python2 support from PyTorch. All documentation under docs/ and on the pytorch.org website needs to remove Python 2 references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36114
Differential Revision: D20901746
Pulled By: jlin27
fbshipit-source-id: 07f8dc8e6fab0b232e5048a63079cab0c433c85f
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/35424, only this time I run optimizations in the right order so the PR description is actually true.
This speeds up the inlining pass of FairSeq model from 180s -> 13s, and MaskRCNN model from 5s -> 1.5s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35562
Differential Revision: D20738922
Pulled By: eellison
fbshipit-source-id: 1439cf9d1f0bc780e2d64a744694f8b3b7ba4b70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36078
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20873442
Pulled By: ezyang
fbshipit-source-id: c576432b1016beb735dca0b9a8bebb752f764ca8
Summary:
Make it possible to report C++ exceptions in the console.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36039
Differential Revision: D20885968
Pulled By: ezyang
fbshipit-source-id: 6ad3822af31e5a64c4a93f16627fbefb7750e1c8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35835
Make compilable with Clang 9 and GCC 9.
Test Plan: Compile with Clang 9 and GCC 9
Differential Revision: D20800182
fbshipit-source-id: dd9474640270de0ad6392641513a7f2fa970d6e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36082
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20874618
Pulled By: ezyang
fbshipit-source-id: b6f12100a247564428eb7272f803a03c9cad3a97
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36005
Getting ovrsource pytorch working requires a single source list across ovrsource and fbsource to avoid build failures every time the source list changes.
This diff factors out libtorch_python_sources into a separate function (needs to be function because it uses glob which is disallowed at global scope)
Test Plan: CI
Reviewed By: malfet
Differential Revision: D20852072
fbshipit-source-id: 0e8ae3f6605e090e3ffdd6aa227fac905e7d9877
Summary:
NCCL library is built using [CUDA separate compilation](https://devblogs.nvidia.com/separate-compilation-linking-cuda-device-code/), which consists of building intermediate CUDA binaries and then linking them into GPU code that can be executed on device. Intermediate CUDA code is stored in the `__nv_relfatbin` section, and code that can be launched is stored in `.nv_fatbin`. When `nvcc` is used to link an executable/shared library, it removes those intermediate binaries, but the default host linker is not aware of that, and therefore they are kept inside the host executable. Help the host linker by removing `__nv_relfatbin` sections from the object files inside `libnccl_static.a`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35843
Test Plan: Build pytorch with CUDA and run `test_distributed.py`
Differential Revision: D20882224
Pulled By: malfet
fbshipit-source-id: f23dd4aa416518324cb38b9bd6846e73a1c7dd21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36094
The condition variable the RPC retry thread waits on must be notified after setting the atomic `running` to False. This will ensure the thread is joinable and allow `rpc.shutdown` to function correctly.
ghstack-source-id: 101538860
Test Plan: build bot
Differential Revision: D20854763
fbshipit-source-id: b92050712a1e6c31d4dd3b3d98f32ef8dee0f2f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35731
Changes relu and relu6 to point to the functional implementations here.
The previous behavior tested the time to create the module, but didn't actually run the
function (I noticed this when adding the new input sizes and seeing
the measured time not change).
Test Plan:
run the benchmark, the time now changes as expected with input size for
these.
Imported from OSS
Differential Revision: D20875542
fbshipit-source-id: 3a6278a7a861437d613c1e30698a58175a8e8555
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35729
* there were a few quantized activations which had implementations but not benchmarks, adds them
* adds the input sizes from `unary_tests.py` here, so we can compare fairly from fp to quantized implementations of activations
Test Plan:
```
python -m pt.qactivation_test
```
Imported from OSS
Differential Revision: D20875544
fbshipit-source-id: f55a66422233b96f0791c85b05476596d5d72b5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34753
This improves support for exceptions and capturing stack traces in caffe2 async nets. We generally want to use exceptions everywhere we can in order to preserve stack information. It also makes the exception timestamp more accurate so multiple exceptions at the same time can be correctly ordered.
Test Plan: Updated the tests to use the new error semantics + adds a test to ensure the stack is correctly propagated through deferrable async scheduling.
Reviewed By: andrewwdye
Differential Revision: D20449887
fbshipit-source-id: 047fdf1bd52fd7c7c1f3fde77df9a27ed9e288e7
Summary:
Enables bfloat16 type for add_out of sparse tensors. Also enabled it for coalesce() which is used in unit test reference checking.
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35978
Differential Revision: D20874142
Pulled By: ezyang
fbshipit-source-id: af8d2f4bc5f5cc3bb7f8cb1e3c688669ba3d13b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35474
I had previously tried to optimize getMutableTypePtr calls by not recursing through container types, but it turns out there are a few uses of container types which refine their contained elements.
This attempt was in #35301
Now I am optimizing calls by caching TypePtr -> Mutable TypePtr conversions. Now that we are doing caching none of the functions marked as const are really const anymore. Previously many of the const functions actually mutated internal state, such as rebuildWriteCache.
one kind of annoying thing is that there is a general api for querying mutability isMutableType that doesn't use the cache, and one internal that does, isMutableTypeInternal. It would be nice if I could call isMutableType within alias analysis and it would dispatch to the internal function, but I'm not sure how to do that.
getMutableTypePtr showed up as 12% of the first run of FairSeq, so this is a function worth optimizing.
Test Plan: Imported from OSS
Differential Revision: D20873493
Pulled By: eellison
fbshipit-source-id: 1b42bb58ba4142c118a6bc47a26978cd7fd0ac79
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/36002
Test Plan: Run cmake and observe there are no warning in stdout nor in `CMakeCache.txt`
Differential Revision: D20872854
Pulled By: malfet
fbshipit-source-id: 8a61b63b3d564e597e7a62dd913c97bc64b183b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35885
For the ops I added recently, ensure all the typehints are
present, so that JIT can script them.
We might want to look into a test for this in the future.
Test Plan:
scripting works for all of them now:
https://gist.github.com/vkuzo/1d92fdea548ad596310fffcbe95e4438
Imported from OSS
Differential Revision: D20818431
fbshipit-source-id: 0de61eaf70c08d625128c6fffd05788e6e5bb920
Summary:
allwu Leave it to you for further investigation and enable it back.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36053
Differential Revision: D20865286
Pulled By: lly-zero-one
fbshipit-source-id: b3e44b1343b66944aaa5a0a3909c8b5e9390c52f
Summary:
Fixing size, as the aten op has been updated to support 0 inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35984
Reviewed By: hl475
Differential Revision: D20858214
Pulled By: houseroad
fbshipit-source-id: 8ad0a0174a569455e89da6798eed403c8b162a47
Summary:
When applying the float16 dynamic quantization with
```
model = torch.quantization.quantize_dynamic(
model, {torch.nn.Linear}, dtype=torch.float16
)
print(model)
```
there is an issue when we try to print the model. Basically, we cannot print the `qscheme` information for the float16 weight (it has neither the per-tensor nor the per-channel quantization scheme defined for int8 dynamic quantization).
Before this PR:
```
Traceback (most recent call last):
File "dlrm_s_pytorch.py", line 860, in <module>
print(dlrm)
File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
mod_str = repr(module)
File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1142, in __repr__
mod_str = repr(module)
File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1136, in __repr__
extra_repr = self.extra_repr()
File "/home/jianyuhuang/miniconda3/lib/python3.7/site-packages/torch/nn/quantized/dynamic/modules/linear.py", line 55, in extra_repr
self.in_features, self.out_features, self.weight().qscheme()
RuntimeError: Could not run 'aten::qscheme' with arguments from the 'CPUTensorId' backend. 'aten::qscheme' is only available for these backends: [QuantizedCPUTensorId, VariableTensorId].
```
After this PR:
```
(4): DynamicQuantizedLinear(
in_features=2, out_features=1, dtype=torch.float16
(_packed_params): LinearPackedParams()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36044
Differential Revision: D20860811
Pulled By: jianyuh
fbshipit-source-id: d1405a185f46a8110e6d27982b40534c854f4d1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/36007
Tracing is not needed in the PyTorch Mobile client. Disabling it has a couple of benefits:
1. It's a pre-requisite to build the lite interpreter.
2. It saves code size for the full JIT and federated learning (around 600k).
Solution: use PYTORCH_DISABLE_TRACING to disable it (sketched below). It's better than passing an argument to code-gen because:
1. It's a single-point change in the code template for both VariableType and VariableFactories.
2. code-gen does not handle VariableTypeManual.cpp; the macro is needed there anyway.
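A minimal, self-contained illustration of the macro-guard idea (the surrounding code here is invented for illustration; the real guard lives in the generated VariableType/VariableFactories code):
```cpp
#include <iostream>

// Build with -DPYTORCH_DISABLE_TRACING to strip the tracing path entirely.
void add_op() {
#ifndef PYTORCH_DISABLE_TRACING
  std::cout << "recording this call into the JIT trace\n";
#endif
  // ... run the actual kernel ...
}

int main() {
  add_op();
  return 0;
}
```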
ghstack-source-id: 101529401
Test Plan: CI
Reviewed By: ljk53
Differential Revision: D20852558
fbshipit-source-id: c28cec9f90208974acfa351ec9aec3fabbbb8aac
Summary:
Notes:
1. didn't name them as _copy_real and _copy_imag because it's desirable (but not necessary) to have these methods as tensor methods.
2. replaced old .real() and .imag() instances with _copy_real() and _copy_imag() methods
3. didn't add documentation because we plan to remove these methods when we add real and imag as tensor attributes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35879
Differential Revision: D20841760
Pulled By: anjali411
fbshipit-source-id: 7267e6fbaab9a5ce426e9396f12238994666b0dd
Summary:
There was a permutation operation missing in each of the complex vector files. I also added some test cases, the last two of which fail under the current implementation. This PR fixes that: all the testcases pass.
Fixes https://github.com/pytorch/pytorch/issues/35532
dylanbespalko
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35715
Differential Revision: D20857024
Pulled By: anjali411
fbshipit-source-id: 4eecd8f0863faa838300951626f26b89e6cc9c6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35411
The file and class names in ATen/core/boxing were quite confusing.
Let's rename them for readability.
Also move function schema inference out of the boxing logic into op_registration.h where it belongs.
ghstack-source-id: 101539206
Test Plan: waitforsandcastle
Differential Revision: D20653621
fbshipit-source-id: 6a79c73d5758bee1e072d543c030913b18a69c7c
Summary:
This supersedes https://github.com/pytorch/pytorch/pull/35698.
`abs` is a C-style function that takes only integral argument
`std::abs` is polymorphic and can be applied to both integral and floating point types
This PR also increases `kBatchSize` in `test_optimizer_xor` function in `test/cpp/api/optim.cpp` to fix `OptimTest.XORConvergence_LBFGS` failure under ASAN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35974
Test Plan: CI
Reviewed By: pbelevich
Differential Revision: D20853570
Pulled By: yf225
fbshipit-source-id: 6135588df2426c5b974e4e097b416955d1907bd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35994
prim::rpc_async was optimized out if we didn't take its returned future and wait on it.
Test Plan:
Differential Revision: D7850846
fbshipit-source-id: e4e46506ab608f2e072027d6c10c49a4d784b14a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35193
PR #34275 / D20266240 caused size regression.
PR #35148 / D20578316 reverted partially to fix the regression.
With buck selective build landed it should no longer cause size regression. This diff relands the reverted part of the original diff.
ghstack-source-id: 100641910
Test Plan: CI
Differential Revision: D20586305
fbshipit-source-id: 6f314d6c13d1a557b314123a5ca350ab88441e95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34249
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20834164
Pulled By: bhosmer
fbshipit-source-id: 67586512df6b30869a8a77149fde6ff27beab81e
Summary:
Looks like the branch was force pushed, lets update this to a commit that exists
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35988
Differential Revision: D20849115
Pulled By: seemethere
fbshipit-source-id: 2f1202dcddef834d0b75a46e1202aa30b0176ac9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35168
Sometimes when a saved model isn't working, it's nice to be able to look
at the contents of the pickle files. Unfortunately, pickletools output
isn't particularly readable, and unpickling is often either not possible
or runs so much post-processing code that it's not possible to tell
exactly what is present in the pickled data.
This script uses a custom Unpickler to unpickle (almost) any data into
stub objects that have no dependency on torch or any other runtime types
and suppress (almost) any postprocessing code.
As a convenience, the wrapper can search through zip files, supporting
command lines like
`python -m torch.utils.show_pickle /path/to/model.pt1@*/data.pkl`
When the module is invoked as main, we also install a hack in pprint to
allow semi-reasonable formatting of our stub objects.
Test Plan: Ran it on a data.pkl, constants.pkl, and a debug pkl
Differential Revision: D20842550
Pulled By: dreiss
fbshipit-source-id: ef662d8915fc5795039054d1f8fef2e1c51cf40a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35892
A couple recent property additions were missing, plus we weren't
distinguishing between defaults and bona fide property values.
Test Plan: Imported from OSS
Differential Revision: D20834147
Pulled By: bhosmer
fbshipit-source-id: 26a7e433414e0cde1eee2a9a67472f03ba970897
Summary:
Just run `./tools/clang_format.py --verbose` and `git commit --all`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35969
Test Plan: CI
Differential Revision: D20845626
Pulled By: malfet
fbshipit-source-id: 0ae9a91dfa33417a021e7e9d233baba4188daf81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35951
Change generate_code to keep the folder structure the same regardless of whether an install path is provided.
Amend build_variables.bzl accordingly
Another preliminary step to merge https://github.com/pytorch/pytorch/pull/35220
Test Plan: CI
Reviewed By: EscapeZero, seemethere
Differential Revision: D20839410
fbshipit-source-id: 02297560a7e48aa7c6271f7a8517fc4a1ab35271
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35913
The pass itself is still disabled by default, but with this change we
don't need to register it as a custom pass anymore. It allows us to
control its behavior with env variables more easily.
Test Plan: Imported from OSS
Reviewed By: suo
Differential Revision: D20827189
Pulled By: ZolotukhinM
fbshipit-source-id: e74d90b5e46422e7ab7bc40974a805220da50fbc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35763
Adds inference function and test for ScatterAssign
Test Plan: Updated unit test
Reviewed By: yyetim, shunting1986
Differential Revision: D20501079
fbshipit-source-id: 7ec6ef0127a151250dd699c90c2b80c35cfb1fe4
Summary:
This enables the serialization part of this change (the deserialization stuff is already landed #33255)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35741
Pulled By: driazati
Differential Revision: D20758124
fbshipit-source-id: e2cdefa99c3bec991491e5e967e7f1661ca7ffd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35904
Currently this optimization transforms conv2d and linear ops into their
prepacked (XNNPACK) equivalents.
Test Plan: buck run fbsource//xplat/caffe2:optimize_for_mobile -- --model="/tmp/inpainting_fbnet.pt"
Reviewed By: AshkanAliabadi
Differential Revision: D20824433
fbshipit-source-id: 88d5c0d21b77911f95f018b03398b0df758ab0d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35903
Eval mode must be set for module freezing, which is required for prepack
folding.
Test Plan: Tested locally by transforming a model, as shown in the diff above this one.
Reviewed By: AshkanAliabadi
Differential Revision: D20824420
fbshipit-source-id: 6c226f44cca317b0333fb580ebbfd060128ae919
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35190
The following are the main changes:
- The main logic of C++ API parity test mechanism is moved from `test/test_cpp_api_parity.py` to `test/cpp_api_parity/module_impl_check.py` and `test/cpp_api_parity/functional_impl_check.py`, so that there is a clear separation between module tests and functional tests, although they still share a lot of common utility functions which are all in `test/cpp_api_parity/utils.py`.
- Module init tests (i.e. testing whether the C++ module accepts the same constructor options as the corresponding Python module) are removed and will be added again in the future.
- `cpp_constructor_args` / `cpp_options_args` / `cpp_function_call` are added as appropriate to all test params dict in `torch/testing/_internal/common_nn.py`, to indicate how to run C++ API parity test for this test params dict.
Test Plan: Imported from OSS
Differential Revision: D20588198
Pulled By: yf225
fbshipit-source-id: 11238c560c8247129584b9b49df73fff40c4d81d
Summary:
Some more cleanup now that we no longer support python2 or 3.5 on master and eventually PyTorch 1.6 release.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35677
Differential Revision: D20838097
Pulled By: orionr
fbshipit-source-id: 95d553a1e8769f3baa395e0bc6d4ce7cd93236e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35916
quantize_per_tensor can now accept a list of tensors.
This is needed for operators like LSTM and cat.
Test Plan: Imported from OSS
Differential Revision: D20830388
fbshipit-source-id: 73f81cf6b7c7614ef19a73b721bc57cf33211345
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35894
Insert new TensorListObserver only for weight input of dynamic LSTM
This is because we are currently not observing the activation inputs in graph mode.
Activation tensors are dynamically quantized within the aten::qlinear_dynamic op
Test Plan:
python test/quantization/test_quantize_script.py
Imported from OSS
Differential Revision: D20830387
fbshipit-source-id: 81bd197ee509df41bd7622ed09fa3f199a37573b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35893
The LSTM operator's inputs are tensor lists for activations and weights.
In graph mode we need a new observer that works with tensor lists.
Test Plan:
python test/quantization/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20830389
fbshipit-source-id: 4790f8932ae3d38446c1d942a2b3780aa91e3022
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35943
This change adds a message explaining why the concrete Module type is not a subtype of the Interface type, by naming the missing method. For example, users may have forgotten to tag that method with torch.jit.export.
Test Plan:
Differential Revision: D7993693
fbshipit-source-id: 1a5b1d9ef483e5e120ab53c2427586560fbb9bcd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35422
This would make `intdiv_256` a much more generic template that can easily accommodate other types of binary operators in the future. The operator becomes "out-of-place" because this would make it easier to substitute with other operators, and compilers should have no problem optimizing this.
Test Plan: Imported from OSS
Differential Revision: D20826861
Pulled By: ngimel
fbshipit-source-id: a6d0706cc1a585063426e988d9982bad402a9b36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35339
The CPU version converts integers to their unsigned counterparts first. The CUDA
version should do the same.
Also added tests for this.
Test Plan: Imported from OSS
Differential Revision: D20826862
Pulled By: ngimel
fbshipit-source-id: 164c84cfd931d8c57177a038c1bb8b6f73134d07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35422
This would make `intdiv_256` a much more generic template that can easily accommodate other types of binary operators in the future. The operator becomes "out-of-place" because this would make it easier to substitute with other operators, and compilers should have no problem optimizing this.
Test Plan: Imported from OSS
Differential Revision: D20824641
Pulled By: ngimel
fbshipit-source-id: ec93f7b23eb7196f3791f4d07092ce12c254b6e0
Summary:
It needs a hint on how to hash an `enum class` in `std::unordered_map`.
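For reference, a small standalone sketch of the kind of "hint" involved: on some older standard libraries, std::hash is not usable for an enum class, so the map needs an explicit hash functor (the Backend enum below is hypothetical, not the actual type from the diff):
```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

enum class Backend { CPU, CUDA, HIP };

// Hash by casting the enum to its underlying integral value.
struct BackendHash {
  std::size_t operator()(Backend b) const noexcept {
    return std::hash<int>()(static_cast<int>(b));
  }
};

int main() {
  std::unordered_map<Backend, std::string, BackendHash> names = {
      {Backend::CPU, "cpu"}, {Backend::CUDA, "cuda"}};
  return names.count(Backend::CPU) == 1 ? 0 : 1;
}
```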
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35935
Test Plan: CI
Differential Revision: D20837750
Pulled By: malfet
fbshipit-source-id: 4208ee4bfa2e3cfbedf5b92bf18031225bf9dfa1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35350
Currently we call input.contiguous() on the input tensor, resulting in an
unnecessary allocation and copy in cases where the input is not contiguous
with regard to the requested memory format. The reason is that in such
scenarios, this call re-allocates and copies the input tensor into
contiguous storage, only for this newly allocated tensor to be used as
the source of another copy to the final destination. Instead, if we copy
into the destination directly in such circumstances, we will save an
allocation and a copy.
Differential Revision: D20656798
Test Plan: Imported from OSS
Pulled By: AshkanAliabadi
fbshipit-source-id: 3f8c51df4d1fd386fa9473e7024621a7b7c6e86c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35850
1. Clamping values were not being propagated through all the structures
and hence were not being serialized.
2. Moved to using Scalar for min/max instead of float. The reason is that otherwise
the fusion for hardtanh_ does not work: during subgraph rewrite we direct
values from hardtanh_ to prepacking ops, but since they expect float
values, the types conflict and we cannot serialize the model.
Test Plan: Imported from OSS
Differential Revision: D20807523
fbshipit-source-id: 57d6b2e4b65afd9510a0f3ba9365333b768977f5
Summary:
In the build summary, specify whether CUDA code is compiled with separate compilation enabled.
Also, correctly handle space-separated TORCH_NVCC_FLAGS when adding them to NVCC_CUDA_FLAGS.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35726
Test Plan: CI + local build with TORCH_NVCC_FLAGS set to "-Xfatbin -compress-all"
Differential Revision: D20830885
Pulled By: malfet
fbshipit-source-id: 0e0ecab4a97b6c8662a2c4bfc817857da9f32201
Summary:
Clear profiling information before it gets used by the passes that run before guard insertion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35814
Differential Revision: D20800599
Pulled By: Krovatkin
fbshipit-source-id: 978d71c22e1880dc888e7e75e7c25501c573333f
Summary:
The image is actually using Python 3.7.2 so we should reflect that
within our circleci configs
Should fix any issues related to `libtorch*gcc5_4` jobs.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35912
Reviewed By: orionr
Differential Revision: D20827149
Pulled By: seemethere
fbshipit-source-id: 72917b35f6d176ce1f5bc999d6808b9f1d9944f2
Summary:
Per title. Tests of integer division are unchanged.
The intent of this PR is to eliminate warning noise as users see our integer div deprecation warning and try to update their programs to be conformant. In particular, some CUDA indexing operations could perform a deprecated integer division, possibly confusing users.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35862
Differential Revision: D20817957
Pulled By: mruberry
fbshipit-source-id: b9fa15922c9bcea3cb08c0402ea2515feec137c9
Summary: aten::dequantize.self is the only missing op in spark spot int8 model
Test Plan: same as D20761873
Reviewed By: iseeyuan
Differential Revision: D20785654
fbshipit-source-id: 19a3394370af58012ed0dedcc458f3633d921527
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35848
So far this class was used only from the Python bindings. As a result, testing in a C++-only environment is not currently possible; specifically, adding inputs requires using
py::args and py::kwargs. This PR fixes this by adding another addInput function to the ScriptModuleBenchmark class.
Test Plan: Imported from OSS
Differential Revision: D20820772
Pulled By: ilia-cher
fbshipit-source-id: f1ea1b7baa637b297cc0dec5ca6375f6caff21f5
Summary:
The original behavior of PyTorch c10d only supports built-in c10d backends, such as
nccl/gloo/mpi. This patch extends the c10d capability to support dynamically
loading 3rd party communication libraries that are derived from the ProcessGroup base class.
The related RFC is: https://github.com/pytorch/pytorch/issues/27955
With this change, users just need to specify a 3rd party c10d backend name when invoking
torch.distributed.init_process_group(). The proposed logic will try to load the corresponding
c10d backend cpp extension automatically. As for how to develop a new 3rd party c10d backend
through a cpp extension, please refer to test/cpp_extensions/cpp_c10d_extension.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28068
Differential Revision: D19174838
Pulled By: agolynski
fbshipit-source-id: 3409a504a43ce7260e6f9d1207c00e87471fac62
Summary: Properly load node inputs as placeholders during onnxifi checkGraphCompatibility only if they are non-weight inputs to the node
Test Plan:
`buck test glow:`
PASS: 2286
FAIL: 0
SKIP: 456
Reviewed By: jfix71
Differential Revision: D20823088
fbshipit-source-id: 76215b2c0c3934e36714201c7e716e8f95463e6d
Summary:
Someone messaged me about this when a better error message would have solved their problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35888
Differential Revision: D20819538
Pulled By: eellison
fbshipit-source-id: 95d124bfd162e1747dcdf7a981703a279a5dfaa6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35794
### Summary
As PyTorch has been in production on iOS for about a week, we've spotted a few crashes (90 out of 20.3k) related to DispatchStub.h. The major part of the crash log is pasted below (full crash information can be found at `bunnylol logview 1d285dc9172c877b679d0f8539da58f0`):
```
FBCameraFramework void at::native::DispatchStub<void (*)(at::TensorIterator&, c10::Scalar), at::native::add_stub>::operator()<at::TensorIterator&, c10::Scalar&>(c10::DeviceType, at::TensorIterator&, c10::Scalar&)(DispatchStub.h:0)
+FBCameraFramework at::native::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(BinaryOps.cpp:53)
+FBCameraFramework at::CPUType::add_Tensor(at::Tensor const&, at::Tensor const&, c10::Scalar)(CPUType.cpp:55)
+FBCameraFramework at::add(at::Tensor const&, at::Tensor const&, c10::Scalar)(Functions.h:1805)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:0)
+FBCameraFramework [inlined] c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>::intrusive_ptr(c10::intrusive_ptr<c10::TensorImpl, c10::UndefinedTensorImpl>&&)(intrusive_ptr.h:221)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework [inlined] at::Tensor::Tensor(at::Tensor&&)(TensorBody.h:93)
+FBCameraFramework c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >::operator()(at::Tensor, at::Tensor, c10::Scalar)(kernel_lambda.h:23)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack_<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, 0ul, 1ul, 2ul>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::integer_sequence<unsigned long, 0ul, 1ul, 2ul>)(kernel_functor.h:210)
+FBCameraFramework [inlined] c10::guts::infer_function_traits<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> > >::type::return_type c10::detail::call_functor_with_args_from_stack<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false>(c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*, std::__1::vector<c10::IValue, c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >*::allocator<std::__1::vector> >*)(kernel_functor.h:218)
+FBCameraFramework c10::detail::make_boxed_from_unboxed_functor<c10::detail::WrapRuntimeKernelFunctor_<(anonymous namespace)::$_3, at::Tensor, c10::guts::typelist::typelist<at::Tensor, at::Tensor, c10::Scalar> >, false, void>::call(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(kernel_functor.h:250)
+FBCameraFramework [inlined] (anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(VariableFallbackKernel.cpp:32)
+FBCameraFramework void c10::KernelFunction::make_boxed_function<&((anonymous namespace)::variable_fallback_kernel(c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*))>(c10::OperatorKernel*, c10::OperatorHandle const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >*)(KernelFunction_impl.h:21)
+FBCameraFramework torch::jit::mobile::InterpreterState::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&)(interpreter.cpp:0)
+FBCameraFramework torch::jit::mobile::Function::run(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >&) const(function.cpp:59)
+FBCameraFramework torch::jit::mobile::Module::run_method(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.cpp:51)
+FBCameraFramework [inlined] torch::jit::mobile::Module::forward(std::__1::vector<c10::IValue, std::__1::allocator<c10::IValue> >)(module.h:28)
```
The problem is `compare_exchange_weak` is not guaranteed to be successful in one shot, as described in [C++ Concurrency in Action (2nd Edition)](https://livebook.manning.com/book/c-plus-plus-concurrency-in-action-second-edition/chapter-5/79). This might result in `cpu_dispatch_ptr` being a null pointer in concurrent situations, thus leading to the crash. As suggested in the book, due to spurious failures, `compare_exchange_weak` is typically used in a loop. There is also a [stackoverflow discussion](https://stackoverflow.com/questions/25199838/understanding-stdatomiccompare-exchange-weak-in-c11) about this. Feel free to drop comments below if there is a better option.
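For reference, a minimal sketch of the retry pattern described above (this is not the actual DispatchStub code; the names are illustrative):
```cpp
#include <atomic>

using FnPtr = void (*)();

// compare_exchange_weak may fail spuriously, so installing the chosen kernel
// pointer has to loop rather than assume a single attempt succeeds.
FnPtr install_once(std::atomic<FnPtr>& cached, FnPtr chosen) {
  FnPtr expected = nullptr;
  while (!cached.compare_exchange_weak(expected, chosen)) {
    if (expected != nullptr) {
      return expected;  // another thread installed a pointer first
    }
    // spurious failure: the cached value is still nullptr, try again
  }
  return chosen;
}
```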
### The original PR
- [Enhance DispatchStub to be thread safe from a TSAN point of view](https://github.com/pytorch/pytorch/pull/32148)
### Test Plan
- Keep observing the crash reports in QE
Test Plan: Imported from OSS
Differential Revision: D20808751
Pulled By: xta0
fbshipit-source-id: 52f5c865b70c59b332ef9f0865315e76d97f6eaa
Summary:
This is mostly just so VS Code will stop yelling at me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35765
Differential Revision: D20787435
Pulled By: robieta
fbshipit-source-id: c8173399328e6da60a07bfcb4b62e91f7f4fe34a
Summary:
This variable hasn't been updated in a long time since it usually just
gets overwritten by whatever is in the setup.py but let's set the
default to something a bit more in-line with what we're actually
building.
Closes https://github.com/pytorch/pytorch/issues/35210
cc ksasso1028
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35260
Differential Revision: D20818302
Pulled By: seemethere
fbshipit-source-id: 530fe137e45be1d0ac0233525c80f7099c17b05a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35834
This handles the cases we did not handle before in AND and OR statements:
static_true || <unknown> -> static_true
static_false && <unknown> -> static_false
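A toy standalone version of these folding rules (TriBool and the fold_* names are illustrative, not the real compiler types):
```cpp
#include <cassert>

enum class TriBool { False, True, Unknown };

// a || b: if the left operand is statically true, the result is true and the
// right operand is never evaluated; if it is statically false, the result is b.
TriBool fold_or(TriBool lhs, TriBool rhs) {
  if (lhs == TriBool::True) return TriBool::True;
  if (lhs == TriBool::False) return rhs;
  return TriBool::Unknown;
}

// a && b: if the left operand is statically false, the result is false.
TriBool fold_and(TriBool lhs, TriBool rhs) {
  if (lhs == TriBool::False) return TriBool::False;
  if (lhs == TriBool::True) return rhs;
  return TriBool::Unknown;
}

int main() {
  assert(fold_or(TriBool::True, TriBool::Unknown) == TriBool::True);
  assert(fold_and(TriBool::False, TriBool::Unknown) == TriBool::False);
  return 0;
}
```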
Test Plan: Imported from OSS
Differential Revision: D20801125
Pulled By: zdevito
fbshipit-source-id: 0ef94c3a14c7af91580fc5248a4ccfd9e8d6d481
Summary:
qadd calls contiguous on its input tensors. By default this produces NCHW
format (for 4D tensors). We should instead call
.contiguous(input.suggest_memory_format())
Output allocation is also done in NCHW format, which forces the subsequent conv
to do a memcpy to get NHWC format.
Both of these issues mean that the majority of the time spent in qadd in the FBNET_A
model goes to copying.
Fixing them reduces the runtime on an S8 phone from 17 ms to 15 ms, shrinking the gap
between c2 and PT latency from ~24% to ~9.5%.
Also note that the contract for ops is that they return the output tensor in the same
memory format as the input.
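A schematic of the intended pattern (this is not the actual qadd kernel; the function below is a simplified stand-in that only shows the memory-format handling):
```cpp
#include <ATen/ATen.h>

at::Tensor add_preserving_format(const at::Tensor& a, const at::Tensor& b) {
  // Use the layout the input suggests (e.g. ChannelsLast/NHWC for 4D tensors)
  // instead of the default NCHW when making inputs contiguous...
  const auto fmt = a.suggest_memory_format();
  const auto a_c = a.contiguous(fmt);
  const auto b_c = b.contiguous(fmt);
  // ...and allocate the output in the same format, so the following conv
  // does not need an extra memcpy.
  auto out = at::empty_like(a_c, a_c.options(), fmt);
  out.copy_(a_c).add_(b_c);  // placeholder for the real quantized add
  return out;
}
```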
Test Plan:
Apply on top of diff D20721889.
bento console --file mobile-vision/projects/model_zoo/scripts/run_create_model_benchmark.py
Note: There are many calls to .contiguous without format specification in
aten/src/ATen/native/quantized/cpu.
All those should be replaced with .contiguous(input.suggest_memory_format())
whenever applicable (most likely to all elementwise ops).
The same should apply to output allocation.
Reviewed By: dreiss
Differential Revision: D20794692
fbshipit-source-id: 6b81012497721d48e7d6a5efcc402f315b1dfe77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35800
This PR includes the following changes:
* Introduce a new `Expr` type `Buf`: it plays a role similar to `Var`, but also has dimensions.
* Use the new `Buf` class in `Store` and `Load` instead of `Var` for specifying where to store to or load from. `Buf` contains the dimensions info of the buffer we're loading/storing to and hence we are able to keep N-d indexes without flattening them into a 1-d index ([x,y] vs [x+y*W]).
* Flattening of the indexes is now a separate pass that is executed in `LoopNest::prepareForCodegen` - backends still expect indexes to be flattened, and this PR preserves that.
* `Tensor` now contains a `Buf` instead of `Var`, and thus Tensor now has the dimensions info (previously it was a property of a `Function`, not a `Tensor`). This brings us closer to Tensor being a combination of Buffer + Function, where Buffer specifies iteration domain and the Function defines a computation.
TODOs:
* Consider merging `Buffer` with `Buf` or `BufHandle`. It seems that we don't need all of them.
* Harden the logic of how we create buffers in fuser pass. Currently it seems that sometimes we don't set dimensions.
* Use `Buf` in `Allocate` and `Free`.
* Make it clearer that `Function` doesn't "own" dimensions info and that dimensions are a property of a Tensor, not a Function.
Differential Revision: D20789005
Test Plan: Imported from OSS
Reviewed By: zheng-xq
Pulled By: ZolotukhinM
fbshipit-source-id: e04188d1d297f195f1c46669c614557d6bb6cde4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35857
This fixes a lot of common ops for InferBlobShapesAndTypes as well as adds support for testing the inferred shapes and types of gradient ops.
Ops:
* Concat
* Split
* LeakyReLU
* Relu
* Prelu
* Gelu
* Elu
* Sinh, Tanh, Cosh
* Abs
* ... and a number of other simple element wise ops
Test Plan:
Added support to hypothesis test to check the shape and type of gradient ops.
Enabled it for all the ops I fixed the shape and type inference for.
buck test caffe2/caffe2/python/operator_test:
Reviewed By: pradeepd24
Differential Revision: D20806284
fbshipit-source-id: 77f796d9ff208e09e871bdbadf9a0a7c196b77f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35740
For one of the quantized CV models, the avg_pool3d operation is more than 6x slower than the C2 implementation. The reason comes from the following aspects:
- function access inside the loop (such as ```q_scale()``` and ```q_zero_point()```)
- additional data copy in ```Vec256::store``` and ```at::quantize_vec```
This diff resolves the above issues with the following measures:
- lift function access outside the loops (see the sketch after this list)
- add an 8-lane path in ```QuantizeAvx2``` to replace ```at::quantize_vec```
- in addition, interchange the c-loop to the innermost position for better memory locality.
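A self-contained sketch of the first measure, hoisting the accessor calls out of the hot loop (QTensor here is a stand-in for the real quantized tensor type, and the loop body is simplified):
```cpp
#include <cstdint>
#include <vector>

// Stand-in for the quantized tensor accessors mentioned above.
struct QTensor {
  double q_scale() const { return 0.5; }
  int64_t q_zero_point() const { return 128; }
};

void dequantize_rows(const QTensor& qx, const std::vector<uint8_t>& in,
                     std::vector<float>& out) {
  // Call the accessors once, outside the loop, instead of on every iteration.
  const double scale = qx.q_scale();
  const int64_t zero_point = qx.q_zero_point();
  out.resize(in.size());
  for (size_t i = 0; i < in.size(); ++i) {
    out[i] = static_cast<float>((in[i] - zero_point) * scale);
  }
}
```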
Test Plan:
buck test //caffe2/test:quantized
Performance Before (n x h x w x c = 4 x 56 x 56 x ch):
```
type c=2 c=4 c=15 c=24 c=48 c=128 c=256
torch.qint8 903.08 us 1373.39 us 2297.97 us 636.72 us 864.98 us 1618.72 us 2908.47 us
torch.quint8 911.93 us 1429.39 us 2315.59 us 623.08 us 844.17 us 1522.28 us 2711.08 us
torch.qint32 897.77 us 1346.97 us 3846.41 us 6211.92 us 11977.74 us 34348.23 us 62927.48 us
```
Performance After:
```
type c=2 c=4 c=15 c=24 c=48 c=128 c=256
torch.qint8 123.29 us 176.00 us 348.90 us 99.02 us 132.73 us 267.17 us 513.43 us
torch.quint8 123.76 us 171.90 us 338.17 us 97.92 us 131.06 us 260.09 us 521.16 us
torch.qint32 102.97 us 172.57 us 559.31 us 814.03 us 1606.11 us 4164.89 us 10041.52 us
```
Reviewed By: lly-zero-one
Differential Revision: D20711888
fbshipit-source-id: a71dd55639500f4a036eee96c357737cff9d33db
Summary:
**Summary:** This PR contains the infrastructure of a new CUDA fuser. This CUDA fuser is based on many of the same principles of TensorExpressions and Halide, however the implementation is ground up. The fusion pass itself is similar to the default CUDA fuser, however, it has undergone some refactoring and is using the new code generation infrastructure. For those who are interested in how the code generation in this PR works, I would recommend reviewing _test/cpp/jit/test_gpu_fusion.cpp_ as well as the long comment section at the beginning of _torch/csrc/jit/codegen/cuda/transform_replay.h_ One of the largest differences between our approach and that of TVM/Halide, is the concept of "TensorView". TensorView from a high level should be thought of similarly to how we think of working with Tensors in PyTorch. It's an N-D object which can undergo transformations that change its dimensionality. Dimensionality changes are done through the operations split/merge/reorder/computeAt. These transformations are similar to split/fuse/reorder/compute_at of TVM, they modify how a tensor is iterated over to generate GPU code. Interestingly, in our scheme these transformations are applied to tensors and only impact how that tensor is generated.
**Warning:** This PR is purposefully not feature complete with the current fuser. We wanted to separate out the infrastructure from the fusion capabilities. Once in, smaller incremental PRs will be submitted to expand capabilities of the fuser.
**Short term goals:**
Parity with current CUDA fuser (including performance):
- Dynamic shapes (no recompilation)
- Implicit handling of broadcast (broadcast tensors are treated as tensors of the broadcasted size in the generated code)
- Dropout
**Mid-term goals:**
- Transposes fused with pointwise operations where transpose involves only 2 axes (across the fused operation).
- 1-D reductions fused with pointwise operations
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34785
Reviewed By: ZolotukhinM
Differential Revision: D20650977
Pulled By: soumith
fbshipit-source-id: ee39c95a880e1b9822e874ed4cc180971572bf63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35714
There are a lot of unboxed-only defs. We're committed to removing
them at the end of the half, but as I am about to do a lot of porting
to the new API, let's get them into a form where they're easy to
remove. This adds a new overload, impl_UNBOXED, that passes
the function pointer straight to CppFunction::makeUnboxedOnly.
I don't attempt to make the _UNBOXED API complete; in particular,
catchall declarations don't get this sugar (as there are very few
of them).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775782
Pulled By: ezyang
fbshipit-source-id: c5e804c69f5961c9d4862f6c5dbbe4c524cc32cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35706
It is extremely common to define implementations of operators at a
specific dispatch key, so we add an overload to impl specifically for
this case. I then delete most uses of torch::dispatch
dispatch_autograd call sites can't make use of this overload. So
instead the new preferred way to specify something as autograd is to
pass kAutograd as the dispatch key (short form, analogous to kCPU/kCUDA
which we support today).
I flip flopped about whether or not kAutograd should have the type
DispatchKey or some other type (to help better encapsulate the
DispatchKey enum); this is more direct and I can't think of any
BC problems from this usage.
Some other reorganization I did:
- I renamed all of the worker functions in op_registration to have
a leading underscore and made them private, just to make it more
clear what the public versus private API were (the private API
shouldn't be used by users because it doesn't come with && overloads)
- In a few places where I was touching lines already, I replaced
full DispatchKey typed out enums with shorter kFoo names, similar
to kAutograd but I didn't publish these globally.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20775783
Pulled By: ezyang
fbshipit-source-id: e45b289e5d1f86c180b24cf14c63cf4459ab5337
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35838
It may be flaky.
Test Plan: Imported from OSS
Differential Revision: D20807409
Pulled By: gchanan
fbshipit-source-id: f085d05bcb6a04d304f3cd048c38d2e8453125d6
Summary:
Adds capabilities to the TensorExpr IR Simplifier to simplify down Round + Mod patterns (e.g. `(x/y)*y + x%y => x`) via means of lifting integer rounding into a temporary `RoundOff` node.
This integrates with existing simplification mechanisms (folding, factorization, reordering, etc) to allow simplification of compound expressions: e.g. `20 * (x / (16 / 2)) * 2 + (11 % 6) * (x % (7+1)) => 5 * x.`.
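The rewrite exploits the integer-division identity `(x/y)*y + x%y == x`; a tiny standalone check of the C++ version of that identity:
```cpp
#include <cassert>

int main() {
  // C++ guarantees (x/y)*y + x%y == x for y != 0, which is the relationship
  // the RoundOff-based simplification relies on.
  for (int x = 0; x < 100; ++x) {
    for (int y = 1; y < 20; ++y) {
      assert((x / y) * y + x % y == x);
    }
  }
  return 0;
}
```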
Tests: ran the tensorexpr cpp and python tests, ran an HPC benchmark and verified that results and timing didn't regress.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35683
Differential Revision: D20811316
Pulled By: nickgg
fbshipit-source-id: 0cd6a517fb9548b3bc689768304b97375df5ac58
Summary:
Adding `test_tensorexpr.py` to our CI. There are a few complications. The first one is that we now always run `SimpleIREval` as a part of the simplifier, so the counts will always be greater than one. We could potentially invest some effort in differentiating between a real codegen call to `SimpleIREval` and calls made by the simplifier, but it's probably not that important. The second change is to treat a counter that cannot be retrieved as a default value of 0, since the tests are structured to check for either the LLVM or the SimpleIREval backend, so it seems appropriate not to fail the test too early.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35776
Differential Revision: D20799333
Pulled By: Krovatkin
fbshipit-source-id: 2a94ff98e647180c6e6aea141a411c3376c509f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35554
We attach a callback to our RPC send attempts that schedule a retry
upon failure. This PR only schedules the retry if the agent is running.
ghstack-source-id: 101332815
Differential Revision: D20612615
fbshipit-source-id: e1bbb3f162101bce7eb46bad512c9e5dc6d531cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35556
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35542
Apply explicit vectorization to the lstm_unit operator.
Enabled by -DENABLE_VECTORIZATION=1
This optimization requires vector library support and was tested with Intel SVML & clang.
However, compilers which support OpenMP 4.5 with the omp simd extension should also benefit.
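For context, a minimal sketch of how such a guarded simd annotation could be wired up (the macro body below is an assumption for illustration, not the actual caffe2 definition):
```cpp
#include <cmath>
#include <vector>

#ifdef ENABLE_VECTORIZATION
#define VECTOR_LOOP _Pragma("omp simd")
#else
#define VECTOR_LOOP
#endif

void sigmoid_forward(const std::vector<float>& in, std::vector<float>& out) {
  out.resize(in.size());
  const int D = static_cast<int>(in.size());
  VECTOR_LOOP
  for (int d = 0; d < D; ++d) {
    out[d] = 1.0f / (1.0f + std::exp(-in[d]));
  }
}
```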
After the code changes
In file included from caffe2/caffe2/operators/lstm_unit_op.cc:1:
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
caffe2/caffe2/operators/lstm_unit_op.h:60:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
caffe2/caffe2/operators/lstm_unit_op.h:112:1: remark: vectorized loop (vectorization width: 8, interleaved count: 1) [-Rpass=loop-vectorize]
VECTOR_LOOP for (int d = 0; d < D; ++d) {
Test Plan:
Check failures at OSS CI
- No build failures related to this change
- Failing tests are:
- py3.6-clang7-rocmdeb-ubuntu16.04-test2
>RuntimeError: fft: ATen not compiled with MKL support
- caffe2_onnx_ort2_py3_6_clang7_ubuntu16_04_test -
>gradient_check_test.py::TestMakeTwo
Exited with code exit status 1
- pytorch_macos_10_13_py3_test , Test errors like:
> ERROR [0.014s]: test_boolean_indexing_weirdness_cpu (__main__.NumpyTestsCPU)
RuntimeError: shape mismatch: indexing tensors could not be broadcast together with shapes [0], [2]
- caffe2_onnx_ort1_py3_6_clang7_ubuntu16_04_test
- No failure info
Reviewed By: jspark1105
Differential Revision: D20484640
fbshipit-source-id: 8fb82dbd6698c8de3e0bbbc0b48d15b70e36ca94
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/35202, fix GPU part of https://github.com/pytorch/pytorch/issues/24823, be related to https://github.com/pytorch/pytorch/issues/24870.
Here is the origin of this problem.
1. Like those in https://github.com/pytorch/pytorch/issues/35202, with large numbers in grid like `grid.min() == -10059144 grid.max()==67680944`; or `nan, inf, 1.0E20` in https://github.com/pytorch/pytorch/issues/24823,
4d39aeec27/aten/src/ATen/native/cuda/GridSampler.cu (L309-L321)
`ix, iy` will be unnormalized to very large numbers, exceeding the bound of INT_MAX.
Then, those `ix_nw, iy_nw` variables will be cast to INT_MAX, and some other variables with "+1" will be INT_MIN.
2. However, these INT_MAX, INT_MIN values should not be big problems, because
4d39aeec27/aten/src/ATen/native/cuda/GridSampler.cu (L358-L362)
4d39aeec27/aten/src/ATen/native/cuda/GridSampler.cuh (L202-L205)
these `within_bounds_2d` functions are supposed to guard the if-statement, prevent the illegal memory access, and leave those output values as zero (padding_modes='zeros').
3. Now here comes the problem: `within_bounds_2d` is marked "inline". We found that the `+1` and `>=0` statements may cause the compiler to "optimize" the code, that is:
```cpp
int B = something;
int a = something;
int b = a + 1;
bool r = (b >= 0 && b < B);
```
will be compiled into assembly code like
```cpp
int B = something;
int a = something;
bool r1 = (a > -2)
int b = a + 1;
bool r2 = (b < B);
bool r = r1 && r2;
```
This looks nice, but when a = INT_MAX, `a+1` causes Undefined Behavior. Typically, we get b = INT_MIN, then the boolean result from compiled assembly will be true. The `within_bounds_2d` no longer guards us from the illegal memory access.
4. There could be different ways to fix this bug. For example, we may set all of the "ix_nw, iy_nw" values to `int64_t`. That would be a potential performance issue, and doesn't prevent those examples in https://github.com/pytorch/pytorch/issues/24823 with 1E20 in grid.
One minimal fix that I found is to prevent `within_bounds_2d` from being inlined. Thus, the compiler won't optimize the `a+1` and `a>=0` code together.
I did a short performance test, just to make sure this forced noinline solution won't cause a regression. The performance script can be found at
a6f8bce522/grid-sample/grid-sample.ipynb.
For this `__attribute__((noinline))` macro, I have tested that on nvcc, and there was no problem. I'm not sure if that also works on clang.
cc csarofeen ptrblck ngimel bnehoran zasdfgbnm SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35506
Differential Revision: D20799304
Pulled By: ngimel
fbshipit-source-id: fc70289b35039fad954908a990ab0a2f16fbfcb2
Summary:
Otherwise, it will print a message when hipcc is not found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35789
Differential Revision: D20793089
Pulled By: ezyang
fbshipit-source-id: 4b3cb29fb1d74a1931603ee01e669013ccae9685
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35476
A few things:
- Add new callUnboxedRedispatch function which can be used to do a
redispatch when you don't want to add a type id to the excluded
set. This will recompute the dispatch key but ignore everything
including and before the currentDispatchKey
- Add FULL_AFTER constructor to DispatchKeySet; used to implement
redispatch.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D20680518
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: ecd7fbdfa916d0d2550a5b19dd3ee4a9f2272457
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35769
This fixes a bug where correct end user API usage can still trigger
a warning because we don't preserve the invariants DispatchTable
was previously expecting to be done. So now, OperatorEntry is
the source of truth, and it just whacks DispatchTable until its
the correct state. OperatorEntry does the user-facing checking.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20772383
Pulled By: ezyang
fbshipit-source-id: 167d249a826d7b02361ba0a44571813c829649c1
Summary:
ROCm 2.10 has an hdot implementation. Use it and enable the test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30431
Differential Revision: D20776784
Pulled By: ezyang
fbshipit-source-id: a192a701eb418dac2015e300563ade691c24903e
Summary:
Since the last one was apparently reverted.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35530
Differential Revision: D20777341
Pulled By: ezyang
fbshipit-source-id: 6aaaf2a0755359074ae3d0efe32018d78dafe976
Summary:
1- Added support for constant folding onnx::ReduceL1 and onnx::ReduceL2
2- Fixed constant folding for slice as onnx::Slice opset 11 supports negative axes and indices
3- Updated export of select opset 11
4- Separated test environment for test_utility_functions as environment variables could be overwritten by caffe2 quantization tests on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35280
Reviewed By: hl475
Differential Revision: D20626140
Pulled By: houseroad
fbshipit-source-id: 39667c7852eeaa97d9da23f53da52760d3670ecf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35523
In this PR we extend ThreadLocalState to cover dispatch keys and
ThreadLocalDebugInfo and move it from JIT interpreter down to
thread management (at::launch) and autograd (backward threads) code
Test Plan: unit tests (CI)
Reviewed By: dzhulgakov
Differential Revision: D20615714
fbshipit-source-id: 16a9fc96a25cb6c2629230b1187fbf78786ac565
Summary: This diff fixes the issues with the current handling of debug information passed along during the execution of the model. (For example, it is possible that multiple calls to the debug guard may override each other.)
Test Plan: CI test/cpp/jit
Reviewed By: dzhulgakov
Differential Revision: D20602775
fbshipit-source-id: 4683957954028af81a1a0f1f12b243650230c9bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35005
This is one of the ad-hoc IValue equality implementations that should be
replaced with `operator==`.
Test Plan: Imported from OSS
Differential Revision: D20537900
Pulled By: suo
fbshipit-source-id: 5f31ee2386f9d0b33f2bc047a39351191f4d81b0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34986
Previously we were reluctant to define equality for IValues, as it's not
totally straightforward. But the vacuum this created basically forced
people to define their own equality comparisons for their own purposes.
We have at least 3 in PyTorch itself, and 2 others outside that I know
of.
These implementations are generally wrong, so we should just bite the
bullet and define equality canonically.
Test Plan: Imported from OSS
Differential Revision: D20537901
Pulled By: suo
fbshipit-source-id: 8d770a31bf6de6f3b38f9826bf898d62c0ccf34e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35720
When modules are saved, all relevant types are serialized according to
their qualified name with a compilation unit. Since qualified names are
guaranteed to be unique within a compilation unit, this normally works
fine.
On load, all types are registered in a compilation unit owned by the
script::Module. Type names are not unique across compilation units, so
if you load two modules with colliding type names, make them submodules
of yet another module, and save that module, there is the potential of a
name collision. See the added tests for examples if that description is
confusing.
The solution is to unique type names when serializing code by mangling
them if we detect a name collision.
Test Plan: Imported from OSS
Differential Revision: D20749423
Pulled By: suo
fbshipit-source-id: a8827ff1d4a89f3e7964dbbb49b4381863da3e6a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35718
Because NamedType is not a concrete type (it's just an interface), it
has no corresponding TypeKind and thus no default `cast()` behavior.
Adding a specialization that does the right thing.
Test Plan: Imported from OSS
Differential Revision: D20749425
Pulled By: suo
fbshipit-source-id: 6ccab1cca26fd2b2805189fcf2305d99ae28145a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35717
We need to provide calling code the ability to customize how type names
are printed. Will be used to mangle names in python_print, stacked on
top.
Test Plan: Imported from OSS
Differential Revision: D20749424
Pulled By: suo
fbshipit-source-id: f110ab569c81e8934487295cd67009fc626ac194
Summary:
In NumPy, calling np.imag on a real-valued tensor returns a non-writable tensor (view) of zeros. In PyTorch we don't support non-writable tensors (or views), so we can either return a writable tensor or error.
If we do the former, that may confuse people who try to write to the imaginary part of a real-valued tensor, and may cause a BC issue if we later support non-writable tensors. This PR errors to give us the flexibility to implement the solution we'd like in the future, while protecting users from unexpected behavior today.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35728
Differential Revision: D20760687
Pulled By: mruberry
fbshipit-source-id: f60d445746cc75ba558804c853993d9e4621dad3
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/34520, which removed specialized list ops. This removes templating from list ops.
It also has one other minor change, which is to move `aten::len(t[]) -> int` to `aten::len(Any[]) -> int` so that `len()` can be called on heterogeneous tuples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35768
Differential Revision: D20772943
Pulled By: eellison
fbshipit-source-id: bc36a00920bc94ca8c5aa9eb7d5d7a640388ffbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35550
Avoid clearing data before copying into the string buffer in a few cases.
ghstack-source-id: 101020139
Test Plan: buck test mode/dev-nosan caffe2/test/cpp/jit/...
Differential Revision: D20699725
fbshipit-source-id: 14dce40dbebdd64fd0d60372cad1b642602205db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34109
This change adds glue to GraphExecutor to give the RPC server
access to the future-based Interpreter::runAsync() api.
Previously, if a server encountered a TorchScript continuation-based block
with fork/wait, it would simply block in the server thread until the handler
completed, since it uses the synchronous Interpreter::run() api.
With the ivalue::Future returned by the Interpreter, we can run the
TorchScript code asynchronously from c++ simply by connecting its
callback to the server callback.
We add test cases to cover the new logic, both rpc_async and remote.
ghstack-source-id: 101245438
Test Plan: buck test mode/dev-nosan caffe2/test/distributed/rpc/...
Differential Revision: D20194321
fbshipit-source-id: 16785ec5d9ed0b16cb1ffab0a9771a77de30fcb0
Summary:
These packages are now part of the base docker image.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35676
Differential Revision: D20777497
Pulled By: ezyang
fbshipit-source-id: aa9dba905dc376b1462910bc2c4a385d77d7aa0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35764
So that Glow knows which inputs are constant.
We probably need to do something similar for torch_glow though.
Test Plan:
```
buck build caffe2/caffe2/opt/custom:glow_net_transform
```
Reviewed By: jackm321
Differential Revision: D20770514
fbshipit-source-id: d398eb8eddbdbba21ccb5b4ac9cb335e4b27b8b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35632
This is handy to make sure the settings you have match your expectations. Here is an example of the output I got:
```
I0328 15:55:12.336715 41258 throughput_benchmark-inl.h:23] ATen/Parallel:
at::get_num_threads() : 1
at::get_num_interop_threads() : 14
OpenMP 201511 (a.k.a. OpenMP 4.5)
omp_get_max_threads() : 1
Intel(R) Math Kernel Library Version 2020.0.0 Product Build 20191122 for Intel(R) 64 architecture applications
mkl_get_max_threads() : 1
Intel(R) MKL-DNN v0.21.1 (Git Hash 7d2fd500bc78936d1d648ca713b901012f470dbc)
std:🧵:hardware_concurrency() : 28
Environment variables:
OMP_NUM_THREADS : 1
MKL_NUM_THREADS : [not set]
ATen parallel backend: OpenMP
```
Test Plan: Imported from OSS
Differential Revision: D20731331
Pulled By: ezyang
fbshipit-source-id: 5be7ffb23db49b1771c2f563b5d84180c3a0ba7f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35704
Due to not clearing nodes_to_delete_, when we try to write a graph rewrite
pass with multiple patterns, the following is observed:
IndexError: vector::_M_range_check: __n (which is 0) >= this->size() (which is 0)
Test Plan:
The PR stacked on top of this run into this error in the unit test.
Imported from OSS
Differential Revision: D20746593
fbshipit-source-id: 9b55604f49ff2ee2a81a61827880cb679c44607a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35331
When the function called by remote() throws, it seems sensible to
surface that exception when rref.to_here() is called.
Doing this only involves simple modifications:
- we need the OwnerRRef to keep around an optional<string>
for the error
- add an OwnerRRef setError() method that's parallel to setValue(),
and plumb through the logic
We add rpc_tests to verify that the exception is propagated properly.
ghstack-source-id: 101136900
Test Plan:
buck test mode/dev-nosan caffe2/test/distributed/rpc:rpc_spawn
buck test mode/dev-nosan caffe2/test/distributed/rpc/jit:rpc_spawn
Differential Revision: D20634078
fbshipit-source-id: b5b13fdb85cdf6a43f42347d82eabae1635368ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35431
Resolving z-a-f's comments on earlier PRs on making
the docblocks easier to read.
Test Plan:
render the new docblocks in http://rst.aaroniles.net/
CI
Imported from OSS
Differential Revision: D20658668
fbshipit-source-id: 5ea4a21d6b8dc9d744e2f4ede2f9d5d799fb902f
Summary:
This moves libtorch to Python 3.6 and cleans up other CircleCI config for the removal of python2.
Going to see if all tests pass on this and will also land before https://github.com/pytorch/pytorch/pull/35677
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35700
Differential Revision: D20767830
Pulled By: orionr
fbshipit-source-id: 0d5a8224b65829cc2b08a5844707e0c0e079421a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35711
As title
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20747290
fbshipit-source-id: fc9fced744cc8f0c61a671cb4b424ff067c2573d
Summary:
caffe2 uses `-I` all over the place, but really we should use the Buck built-in version of this
Alternatively, the `exported_header` clean up means we need to standardize to a single path
Test Plan:
```
buck build caffe2:torch-cpp-cpu
buck build caffe2/...
```
Reviewed By: malfet
Differential Revision: D19150098
fbshipit-source-id: e99aaf69d6c474afaedbd5f693a7736d3d67aafc
Summary:
NumPy doesn't allow complex inputs to floor, ceil, or trunc, and without careful deliberation I don't think PyTorch should, either: is it intuitive that these functions apply to both the real and imaginary parts of complex tensors, or only to the real parts?
This PR disables these functions for complex inputs so we don't prematurely commit a particular behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35592
Differential Revision: D20757796
Pulled By: mruberry
fbshipit-source-id: fdc53ac161fca7ad94c9280c3f5cf9c7c40c7f2c
Summary:
I hit this exception when including the registration code with `torch::class_` in a header file, which was included in multiple cpp files and thus called this twice. It could be helpful to improve the error message here to indicate what exactly happened.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35568
Differential Revision: D20759476
Pulled By: rohan-varma
fbshipit-source-id: 680f6a8abb4453cd7a311cda1e2a03f81e7f7442
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34710
Extending RecordFunction API to support new recording scopes (such as TorchScript functions), as well as giving more flexibility to set sampling rate.
Test Plan: unit test (test_misc.cpp/testRecordFunction)
Reviewed By: gdankel, dzhulgakov
Differential Revision: D20158523
fbshipit-source-id: a9e0819d21cc06f4952d92d43246587c36137582
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35586
This pass fuses the choose_qparams-quant-dequant sequence
Fusion for weight tensor is the same as static quant.
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755680
fbshipit-source-id: b7443770642b6e6fa0fa9da8a44637e9b2d4df70
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35448
Add _choose_qparams_per_tensor which returns scale and zero_point similar to the dynamic quantization in the operator
Test Plan:
python test/test_quantize_script.py
Imported from OSS
Differential Revision: D20755679
fbshipit-source-id: c9066d8f1bb3e331809be26c4be806faafc9b981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35398
This disables namespaced c10::import which is broken with custom
mobile op builds. This is to help prevent people from accidentally
breaking the custom mobile build in a mysterious way; if they use
the longform version it will work. Fixing the analyzer is tracked
in https://github.com/pytorch/pytorch/issues/35397
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680519
Pulled By: ezyang
fbshipit-source-id: a18ac8df7e72bf399807870beedb828131273e48
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35724
When statically linking BLAS, this results in a second useless copy of
MKL in libtorch_cuda.so
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20758165
Pulled By: ezyang
fbshipit-source-id: 5a82a23c053f440b659f2ac2aaaf3c9d5ec69971
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35558
This is to have more fine-grained support for general ops,
e.g. for sort, the first output passes through the inputs and the second output does not need to be quantized,
so we'll have a check for that
Test Plan:
.
Imported from OSS
Differential Revision: D20752128
fbshipit-source-id: 825c4c393910a88ecb12e24e9a2f3b05c5d5a7ab
Summary:
`abs` is a C-style function that takes only an integral argument
`std::abs` is polymorphic and can be applied to both integral and floating-point types
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35698
Test Plan: CI
Differential Revision: D20749588
Pulled By: malfet
fbshipit-source-id: b6640af67587650786366fe3907384bc8803069f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35708
these are not actually needed, and they break the normal include guard that selects the correct Half implementation
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D20744681
fbshipit-source-id: 70e3667593c987434415ad8ac3b68828875fc3fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35248
register aten ops in lite interpreter for detectron2go models. Also set catchAllKernel for some ops since the model requires different DispatchKey.
(Note: this ignores all push blocking failures!)
Test Plan:
(whole stack)
buck build -c user.ndk_cxxflags='-g1' -c caffe2.expose_op_to_c10=1 //xplat/caffe2/fb/pytorch_predictor:maskrcnnAndroid#android-armv7
Reviewed By: iseeyuan
Differential Revision: D20528762
fbshipit-source-id: 4da4699fe547a63b0c664fe666a8a688f1ab8c6c
Summary:
https://github.com/pytorch/pytorch/issues/34891 caused a 15 minute regression in XLA test timing when it inadvertently added this test to XLA -- I think it was intended to only add this test to CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35709
Test Plan: The XLA test job should return from ~75 to ~60 minutes.
Reviewed By: malfet
Differential Revision: D20748176
Pulled By: yns88
fbshipit-source-id: b50227a35bcbf2915b4f2013e2a4705e905d0118
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35541
When OMP_NUM_THREADS is set to 1, we don't need to launch the parallel_for function on an OpenMP thread since there is no intra-op parallelism. By avoiding that, we reduce unnecessary context switches.
Test Plan: internal
Reviewed By: ilia-cher
Differential Revision: D20680465
fbshipit-source-id: 4476a810dfe7bf268fcd58fd00afb89ba61644cf
Summary:
The current config on `master` yields the following errors when build from source on Windows with CMake and Visual Studio 2019.
```
Severity Code Description Project File Line Suppression State
Error LNK2001 unresolved external symbol "?warp_size@cuda@at@YAHXZ" torch D:\AI\pytorch\build_libtorch\caffe2\LINK 1
Error LNK1120 1 unresolved externals torch D:\AI\pytorch\build_libtorch\bin\Release\torch.dll 1
Error LNK2001 unresolved external symbol "?warp_size@cuda@at@YAHXZ" caffe2_observers D:\AI\pytorch\build_libtorch\modules\observers\LINK 1
Error LNK1120 1 unresolved externals caffe2_observers D:\AI\pytorch\build_libtorch\bin\Release\caffe2_observers.dll 1
Error LNK2001 unresolved external symbol "?warp_size@cuda@at@YAHXZ" caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\modules\detectron\LINK 1
Error LNK1120 1 unresolved externals caffe2_detectron_ops_gpu D:\AI\pytorch\build_libtorch\bin\Release\caffe2_detectron_ops_gpu.dll 1
```
This change at least fixes the above errors in that specific setting. Do you think it makes sense to get this merged or will it break other settings?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35659
Differential Revision: D20735907
Pulled By: ezyang
fbshipit-source-id: eb8fa1e69aaaa5af2da3a76963ddc910bb716479
Summary:
This causes ambiguity and can sometimes be triggered (e.g., by https://github.com/pytorch/pytorch/issues/35217). Explicitly convert them to float. The compiler error is:
error: conditional expression is ambiguous; 'const hip_impl::Scalar_accessor<float, Native_vec_, 0>' can be converted to 'float' and vice versa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35593
Differential Revision: D20735663
Pulled By: ezyang
fbshipit-source-id: ae6a38a08e59821bae13eb0b9f9bdf21a008d5c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35555
Att. So that we can lower the SparseLengthsSum* part of SparseLengthsSum*Sparse. We update the tying policy between Gather and SparseLengthsWeightSum* so that we don't bother lowering a single Gather into the backend, which is inefficient to execute on the card and creates bubbles between continuous lowering graphs.
Test Plan:
```
buck test glow/fb/test:test_onnxifinnpi
```
Reviewed By: ipiszy
Differential Revision: D20688525
fbshipit-source-id: cb8e38239057ff13a8d385ed09d0d019421de78b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35522
We will need to apply this transform pass onto the net before lowering to Glow.
Test Plan:
```
buck test caffe2/caffe2/opt/custom:split_slss_test
```
Reviewed By: ipiszy
Differential Revision: D20688451
fbshipit-source-id: 22c0f5d0dcf97cc51cdc86bfc0abd90328ad5f2c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35507
We want to split up the SparseLengthsSumSparse op into an indirection op and the SparseLengthsSum op so that we can lower the later part. The indirection part is a plain impl now.
Test Plan:
```
for i in `seq 10`; do buck test caffe2/caffe2/python/operator_test:lengths_reducer_fused_nbit_rowwise_ops_test -- test_sparse_lengths_sum_rowwise_sparse; done
```
Reviewed By: jspark1105
Differential Revision: D20683478
fbshipit-source-id: 509effe88719d20aa0c4783bbe0ce1f183ee473c
Summary:
Looks like there is a bug in the CUDA device linker: kernels that use `thrust::sort_by_key` cannot be linked with other kernels.
Solve the problem by splitting 5 thrust-heavy .cu files into a `__torch_cuda_sp` library which is statically linked into `torch_cuda`.
For the default compilation workflow it should not make any difference.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35581
Test Plan: Compile with `-DCUDA_SEPARABLE_COMPILATION=YES` and observe library size difference: 310Mb before, 173Mb after if compiled for sm_75
Differential Revision: D20741379
Pulled By: malfet
fbshipit-source-id: e9083968324c113e44a39df0de356d79af8e7057
Summary:
Setting the device can be expensive, especially when a debugger is present. We should check that the target device differs from the current one before setting it.
cc: ptrblck
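The actual change lives in the C++ device-setting path; the sketch below (with a hypothetical helper name) only illustrates the check-before-set idea in Python terms:
```
import torch

def maybe_set_device(device_index: int) -> None:
    # Only switch devices when the target actually differs from the
    # current one, so we avoid a potentially expensive (re)set.
    if torch.cuda.current_device() != device_index:
        torch.cuda.set_device(device_index)
```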
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35438
Differential Revision: D20664084
Pulled By: ngimel
fbshipit-source-id: 2440b4c9d96c41b4a19d5b1e8e1756fa40f090f0
Summary:
Add `--gtest_output=xml:/path/to/artifact-metadata-folder` to scripts invoking unit tests
Add artifacts metadata to windows test jobs
Install `unittest-xml-reporting` and add the IN_CIRCLECI environment variable to report python test results on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35590
Test Plan: Results should eventually be published to: https://circleci.com/build-insights/gh/pytorch/pytorch/master
Differential Revision: D20742687
Pulled By: malfet
fbshipit-source-id: baae60bdb0a4fb8d4f0d2baa77c65402fa2b99ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35599
We didn't check whether the ready queue was empty before
https://github.com/pytorch/pytorch/pull/33157 because the CPU worker's
queue might not be empty, but after #33157 we check whether the owner
thread's ready_queue is empty after inline execution.
This does not always hold true. Imagine the following case:
a CPU thread calls backward() and there is a GPU device thread, and the graph is:
GraphRoot(CPU) -> ComputeNode(GPU)
In their thread_main calls, both threads decrement `--local_graph_task->outstanding_tasks_` to zero together, and then both enter `if (graph_task_completed(local_graph_task))`. The CPU thread breaks out, finishes, and checks whether local_ready_queue is empty, while the GPU thread sends a dummy task to the CPU thread's ready queue because it thinks the graph_task finished on its own thread (it actually finished on both threads together). So there are cases where a dummy task remains in the queue.
This happens very rarely and non-deterministically, but it might get triggered when we run many jobs in CI. Remove the check to fix the flakiness.
Test Plan: Imported from OSS
Differential Revision: D20739778
Pulled By: wanchaol
fbshipit-source-id: 75a671762650a188f44720625d53f0873617c684
Summary:
Define `store_test_results` attribute in CircleCI yamls
Install `unittest-xml-reporting` and define `IN_CIRCLECI` environment variable to trigger test runners to save results to XML
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35687
Differential Revision: D20739831
Pulled By: malfet
fbshipit-source-id: 6a7bbf19f93c32766963f5edad191ad8ca316ff8
Summary:
test_python_all_except_nn
+ /usr/bin/python3.6 test/run_test.py --exclude test_nn test_jit_simple
test_jit_legacy test_jit_fuser_legacy --verbose --bring-to-front
test_quantization test_quantized test_quantized_tensor
test_quantized_nn_mods --determine-from=
test_nn continues to be run as part of the test1 target.
This will allow us to run run_test.py and correctly disable these test sets for ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35230
Differential Revision: D20735851
Pulled By: ezyang
fbshipit-source-id: 255d21374c9605c8f8b6ffa1b08f58fb10d8e543
Summary:
Does the same things as D19658565 but for Caffe2 models.
From the investigation at https://fb.quip.com/PbgsAEmoJVuf, the model id that predictor uses and the model id saved inside the model don't match. A common reason is recurring fluent2 jobs, but there are others.
Since the model_id from predictor is what the rest of the datasets use, it's way more useful imho. I've considered adding both ids, but it'd require additional piping and I don't think it's that useful.
Test Plan: unittests added
Reviewed By: houseroad
Differential Revision: D20630599
fbshipit-source-id: 3e6d0cb0b6f8c8b6ae5935138f55ae7a2ff60653
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35647
Since we have enabled the unit test for FBNet on iOS, it'll block people from landing due to the missing support for selective build. This PR adds the missing ops in PyTorchPlayground to support FBNet.
ghstack-source-id: 101098537
allow-large-files
Test Plan: - `buck test PyTorchPlayground`
Reviewed By: iseeyuan
Differential Revision: D20723020
fbshipit-source-id: dc4443f50bb39166dbf45ca159bb32d5b45d2eea
Summary:
Reland of https://github.com/pytorch/pytorch/pull/35061; removed
the get-qualified-type-name magic from debug strings to work around
an MSVC 2017 bug.
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName function unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. Tests use this to
distinguish between multiple distinct registrations; regular
invocations get only very basic information.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35629
Differential Revision: D20724551
Pulled By: ezyang
fbshipit-source-id: befa46a1affb4ec4ae1fb39e3564a63695a6ca41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34015
Remove warning
```
caffe2/aten/src/ATen/native/cuda/BatchLinearAlgebra.cu(1400): warning: variable "info" was set but never used
```
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20181160
fbshipit-source-id: 31d44522a558fe7c2661a84dd6c35eb9d05b757a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34016
Remove warning
```
caffe2/aten/src/ATen/native/cuda/Reduce.cuh(654): warning: integer conversion resulted in a change of sign
```
When acc_ptr_ != nullptr, numerator_ and denominator_ must have been initialized.
Other minor changes:
* Make member variables of AccumulationBuffer private
* size_factor_ is not used anywhere
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D20181169
fbshipit-source-id: e4d023f7fa0692e62be21cfbd971cad8dfb69ea4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34836
Once the SigridHashOp argument is supplied, I realized the shape inference is still wrong because the argument is not supplied in the debug_ssa. Thanks to yinghai for catching this; I didn't fix the converter before, so I'm fixing it in this diff.
Test Plan:
Ran the binary and checked the exported op:
op {
  input: "sequential_250/parallel/normalization/dper_feature_normalization/sparse_features_processor/sparse_feature_transform/gather_ranges_GSF_IDLIST_COOCCUR_APP_ID_NEKO_ORGANIC_1D_7D_INSTALL_V1/gathered_values_0"
  output: "sequential_250/parallel/normalization/dper_feature_normalization/sparse_features_processor/sparse_feature_transform/sequential_1/hash_feature_ids/SigridHash:0_0"
  type: "SigridHash"
  arg {
    name: "salt"
    i: 0
  }
  arg {
    name: "maxValue"
    i: 100000
  }
  arg {
    name: "hashIntoInt32"
    i: 1
  }
  arg {
    name: "net_pos"
    i: 3
  }
}
It now has the hashIntoInt32 argument.
Reviewed By: yinghai
Differential Revision: D20457057
fbshipit-source-id: 023ade5e66df82037a8f2da3174383dda8aff230
Summary:
The current implementations of torch.real and torch.imag are not NumPy compatible. In particular:
- torch.real on a real tensor does not return the real tensor, like contiguous
- torch.real on a complex tensor does not return a real-valued view of the real part
- torch.imag on a complex tensor does not return a real-valued view of the imaginary part
- torch.Tensor.real and torch.Tensor.imag exist as methods, but in NumPy they are writable attributes
This PR makes the functions NumPy compatible by removing the method variants and out kwarg, restricting them to work on only real tensors, and updating the behavior of torch.real to return its input. New tests are added to test_torch.py to verify the behavior, a couple existing complex tests are skipped, and the documentation is updated to reflect the change.
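A minimal sketch of the updated `torch.real` behavior on a real tensor (the expected result is noted in the comment):
```
import torch

x = torch.tensor([1.0, 2.0, 3.0])
r = torch.real(x)
# torch.real on a real tensor now returns its input, so the result
# aliases the same storage rather than being a copy.
print(r.data_ptr() == x.data_ptr())  # expected: True
```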
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35560
Differential Revision: D20714568
Pulled By: mruberry
fbshipit-source-id: 5dd092f45757b620c8426c829dd15ee997246a26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35420
This PR makes `aten::relu` a general op that doesn't require observation
This means we also need to change the logic to support skipping intermediate values because
this breaks `conv - relu` pattern if it is not followed by something that is quantizable
since `conv` is quantizable, but we decide to skip observing between conv and relu.
We changed the old `skip_values` to a new `delay_observation_map_` which records information that
allow us to delay the observation of certain values until later points. In the case of `conv - relu`
pattern, we delayed the observation of output of `conv` and observe the output of `relu` instead.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20655309
fbshipit-source-id: 37dbe8a5e2f4cd7582ed67c405f9cf437dd00dbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35061
Main points of the new API:
- You can register implementations (impl) without having to specify a schema.
- Registrations are commutative, so no matter what order your static
initializers run, you end up with the same end result.
op_registration_test.cpp contains a reasonably comprehensive accounting
for the available API surface
How does this implementation proceed? The basic concept is to relax the
internal invariants of Dispatcher data structures to allow the
possibility that a FunctionSchema is not specified in an Operator.
- DispatchKeyExtractor has an uninitialized state where it doesn't look
for dispatch keys in any arguments of the stack. It can have a
schema (de)registered to itself post facto with
registerSchema/unregisterSchema.
- DispatchTable has a new constructor taking only an OperatorName for
the uninitialized state. It can have a schema (de)registered to itself
post facto with registerSchema/unregisterSchema
- OperatorDef maintains counts of both defs as well as defs_and_impls.
defs_and_impls keeps track of the outstanding impl registrations; you
may have impl registrations but no defs. If there are no defs (no
schema), the operator is not returned by findSchema. A new
findOperatorByName function unconditionally returns the OperatorHandle
even if there's no schema. OperatorHandle::hasSchema can be used
to check if the operator has schema.
- Replaced 'registerKernel' with 'registerImpl', which is the new
interface for directly registering kernels without implementations.
- Because 'registerImpl' no longer requires an OperatorHandle, change
'registerDef' to only return a RegistrationHandleRAII. This is marginally
less efficient (since we're doing two hash table lookups on a registration
now), but this won't matter in the long term, and probably doesn't
matter now either.
- Rename registerBackendFallbackKernel to registerFallback (this exposed
a bunch of places where we're improperly directly interfacing with Dispatcher;
we need to add this capability to the true public API)
- All code generated internal registrations are switched to use the new
API. This includes VariableType registrations (which previously
weren't converted) and the mobile autograd stuff
- Switch the new-style def()/impl() APIs to interact directly with Dispatcher,
rather than indirecting through the old API
- We deleted alias analysis kind merging entirely. As a nod to BC, it's
possible to define a full schema with alias analysis kind, and then
later do another full schema def with missing alias analysis kind, but
the opposite direction is not allowed. We can remove this entirely
following the plan at https://github.com/pytorch/pytorch/issues/35040
- Schema matching is moved inside the dispatcher, because we might not
be able to immediately schema match at the point of an impl() (because
we don't have the schema yet). To do this, we store the inferred
function schema inside a KernelEntry, so we can check it when we get
the real schema.
- Registered kernel functions now store a debug string which
can be used to more easily identify them. There's some best
effort stuff based on __FUNCSIG__ but this is only really
capable of reporting types and not function symbols. Tests
use this to distinguish between multiple distinct registrations.
Because we need our static initializers to work no matter what order
they're run, the testing strategy on this PR is quite involved.
The general concept:
- Bind a (very gimped) version of the dispatcher API from Python,
so that we can easily write a more complex testing harness
using expect tests.
- For series of registrations we want to test, exhaustively
test every possible permutation of registrations (and
deregistrations), and show that the intermediate states
agree no matter what path is taken.
- Intermediate states are rendered using a new dumpState()
debugging method that prints the internal state of the
dispatcher. This method may be generally useful for people
who want to see what's in the dispatcher.
- Simultaneously, add a new invariant testing function which
checks that the internal invariants of the dispatcher are
upheld (so we don't have to print internal implementation
details of the dispatcher)
The testing framework found a few bugs in development. For example,
here is a case where we registered schema too early, before checking
if it was valid:
```
Traceback (most recent call last):
File "test/test_dispatch.py", line 164, in test_def_impl_schema_mismatch
], raises=True)
File "test/test_dispatch.py", line 135, in commute
results=results, raises=raises)
File "test/test_dispatch.py", line 83, in run_permutation
.format(ctor_order[:i], op_ix))
File "test/test_dispatch.py", line 59, in check_invariants
.format(expected_provenance, actual_provenance)
AssertionError: 'name[16 chars]ema: (none)\ncatchall: boxed unboxed :: (Tenso[18 chars]0)\n' != 'name[16 chars]ema: test::foo(Tensor x, Tensor y) -> (Tensor)[53 chars]0)\n'
name: test::foo
- schema: (none)
+ schema: test::foo(Tensor x, Tensor y) -> (Tensor)
catchall: boxed unboxed :: (Tensor _0) -> (Tensor _0)
: expected from running ctors (1,); actual from running ctors (1,) and then failing to run ctor 0 (did this failure leave the dispatcher in a wedged state? it shouldn't!)
```
There are also C++ smoketests for the API. These tests comprehensively
cover the C++ API surface of the new operator registration API, but
don't check very hard if the API does the right thing (that's what
test_dispatch.py is for)
Some miscellaneous changes which could have been split into other
PRs, but I was too lazy to do so:
- Add torch::jit::parseName (mirroring parseSchema/parseSchemaOrName)
- Add cloneWithName functionality to FunctionSchema
- Unconditionally generate schema registration, even when type_method_dispatch
is a dict. The one exception is for manual registrations....
- Add fallback, CppFunction::makeFallthrough and
CppFunction::makeFromBoxedFunction to public API of op_registration, so we can
stop calling internal registerImpl directly
- Add new syntax sugar dispatch_autograd for registering autograd kernels
- Minor OperatorName cleanup, storing OperatorName in DispatchTable
and defining operator<< on OperatorName
- Refactored the op registration API to take FunctionSchema directly.
We now do namespacing by post facto fixing up the OperatorName
embedded in FunctionSchema. This also means that you can
now do torch::import("ns1").def("ns2::blah") and have the ns2
override ns1 (although maybe this is not the correct behavior.)
- New torch::schema public API, for attaching alias analysis kind
annotation kinds. This meant we had to template up some function
signatures which previously took const char*. There's now a nice
comment explaining this strategy.
- torch::import now takes std::string which means we can use
the namespacing from Python
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20680520
Pulled By: ezyang
fbshipit-source-id: 5d39a28e4ec7c73fe4b1fb2222e865ab65e188f5
Summary:
https://github.com/pytorch/pytorch/pull/35127 was landed and reverted because I missed a test failure (oops). I have found and fixed the issue, which was due to zero terms being introduced after the point where they are filtered out (this usually requires NAN/INF, e.g. x / INF => 0).
See https://github.com/pytorch/pytorch/pull/35127 for more info.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35415
Reviewed By: ZolotukhinM
Differential Revision: D20702957
Pulled By: nickgg
fbshipit-source-id: 119eb41e9fa676bd78e3d1df99297a47ae312185
Summary:
The Torch algorithms for linspace and logspace conceptually compute each of their values using:
`start_value + step_value * idx`
[And NumPy does the same,](cef4dc9d91/numpy/core/function_base.py (L24)) except NumPy then [sets the last value in its array directly.](cef4dc9d91/numpy/core/function_base.py (L162)) This is because the above computation is unstable when using floats, and NumPy's contract, like PyTorch's, is that the last element in the array is the stop value.
In PyTorch there can be a divergence between the computed last value and the actual value. One user reported case was:
`torch.linspace(-0.031608279794, 0.031531572342, 257, dtype=torch.float32)`
Which causes a difference of 3.7253e-09 between the last value as set by NumPy and computed by PyTorch. After this PR the difference is zero.
Instead of simply setting the last element of the tensor, this PR updates the kernels with a "symmetric" algorithm that sets the first and last array elements without requiring an additional kernel launch on CUDA. The performance impact of this change seems small. I tested with step sizes of 2^8 and 2^22, and all timing differences were imperceptible except for 2^22 on CPU, which appears to have suffered a ~5% slowdown. I think that's an acceptable performance hit for the improved precision when we consider the context of linspace.
An alternative would be to simply set the last element, as NumPy does, on CPU. But I think it's preferable to keep the CPU and CUDA algorithms aligned and keep the algorithm symmetric. In current PyTorch, for example, torch.linspace starts generating values very similar to NumPy, but as the index increases so do the errors, giving our current implementation a "left bias."
Two tests are added to test_torch.py for this behavior. The linspace test will fail on current PyTorch, but the logspace test will succeed since its more complex computation needs wider error bars.
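A minimal check based on the user-reported example above; after this change the comparison below is expected to hold:
```
import torch

start, stop, steps = -0.031608279794, 0.031531572342, 257
t = torch.linspace(start, stop, steps, dtype=torch.float32)
# The last element should equal the stop value (compared in float32),
# matching NumPy's endpoint contract.
print(torch.equal(t[-1], torch.tensor(stop, dtype=torch.float32)))  # expected: True
```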
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35461
Differential Revision: D20712539
Pulled By: mruberry
fbshipit-source-id: 2c1257c8706f4cdf080ff0331bbf2f7041ab9adf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35142
supporting swap dequant for prim::If nodes, this includes detecting
all blocks of prim::If ends with dequantize, deleting these dequantize
and inserting new dequantize for the output of prim::If
Test Plan:
see next PR that enables swap dequant for interpolate: https://github.com/pytorch/pytorch/pull/35130
Imported from OSS
Differential Revision: D20655307
fbshipit-source-id: 4fd53fbde8e169b7d98251e72ca37a29acdeb295
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35141
This is preparing for the support of prim::If in SwapDeQuant
Test Plan:
.
Imported from OSS
Differential Revision: D20655300
fbshipit-source-id: 0c66cab37f3f46dd34217a7b99a4d25a159c8487
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35135
This in preperation for the support of prim::If in SwapDeQuant
Test Plan:
.
Imported from OSS
Differential Revision: D20655296
fbshipit-source-id: d8507e0020096940e14bc0fb7bde6a22ce706b72
Summary:
Introduce DISABLED_ON_WINDOWS macro, that adds `DISABLED_` prefix to string if compiled for Win32
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35549
Test Plan: CI
Differential Revision: D20700915
Pulled By: malfet
fbshipit-source-id: adddfe2db89b7139093ceef6899862bce0adcf2d
Summary:
One-line fix to lara-hdr's PR https://github.com/pytorch/pytorch/pull/30169.
Default `dtype` value should be set when `dtype is None` rather than when `dtype is not None`.
I didn't make an issue for this since it's such a small change, but I have been using it locally in order to export a model with opset 11 (opset 10 still works).
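A minimal sketch of the corrected pattern; the helper name below is illustrative, not the actual symbolic_opset11 code:
```
def resolve_dtype(dtype, default_dtype):
    # Before the fix the condition read `if dtype is not None`, which
    # overwrote a user-provided dtype and left None untouched.
    if dtype is None:
        dtype = default_dtype
    return dtype
```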
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35467
Differential Revision: D20686048
Pulled By: mruberry
fbshipit-source-id: 726a5f9c0711c7a79b171fe98b602cdef27f9b31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35395
as title
ghstack-source-id: 101035263
Test Plan: CI
Differential Revision: D20632634
fbshipit-source-id: 737e353982b325e73da3825b130aae6b11dbcfe7
Summary:
This one doesn't actually do anything so we don't need an op for it.
It is used inside `torch.nn.functional.unfold` which is already tested
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34509
Pulled By: driazati
Differential Revision: D20676445
fbshipit-source-id: b72d1308bdec593367ec4e14bf9a901d0b62e1cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27510
We can delete the typing polyfill because requirements.txt requires users to
install typing as a dependency whether on py2 or py3, so the polyfill is
not actually used either way.
Test Plan: Imported from OSS
Differential Revision: D20673393
fbshipit-source-id: ea5276824c6e275c1f991f8c12329040b0058d2b
Summary:
Fixes #29035
Previously we were missing a case for namedtuples in our Python value resolution logic, so they were just getting resolved as regular Python values, hence the `OSError`s in the linked issue
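A minimal sketch of the kind of code that exercises this resolution path, assuming a small user-defined NamedTuple:
```
from typing import NamedTuple
import torch

class Point(NamedTuple):
    x: float
    y: float

@torch.jit.script
def squared_norm(p: Point) -> float:
    # The Point type referenced here must be resolved as a namedtuple,
    # not as an opaque Python value.
    return p.x * p.x + p.y * p.y

print(squared_norm(Point(3.0, 4.0)))  # 25.0
```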
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35409
Pulled By: driazati
Differential Revision: D20653496
fbshipit-source-id: b5db1a11e918175aa02fda92993d233695417c56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35218
We should express the ownership semantics directly here. Using
`shared_ptr` makes it too easy to leak ownership by inadvertently
storing a copy.
Test Plan: Imported from OSS
Differential Revision: D20682673
Pulled By: suo
fbshipit-source-id: 32002ee515eb8bb7b37e6d0aac3c0695df4eec79
Summary:
Ignore mixed upper-case/lower-case style for now.
Fix violations of the space-between-function-and-its-arguments rule.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35574
Test Plan: CI
Differential Revision: D20712969
Pulled By: malfet
fbshipit-source-id: 0012d430aed916b4518599a0b535e82d15721f78
Summary:
As a followup to https://github.com/pytorch/pytorch/pull/35042 this removes python2 from setup.py and adds Python 3.8 to the list of supported versions. We're already testing this in CircleCI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35539
Differential Revision: D20709060
Pulled By: orionr
fbshipit-source-id: 5d40bc14cb885374fec370fc7c5d3cde8769039a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35073
We want to do constant propagation for quantize_per_tensor/quantize_per_channel
which will produce results that's consumed by these ops, and since we need to
make sure the output of the node has no writer before constant prop through the node,
the consumer needs to be pure as well.
Test Plan:
see next PR
Imported from OSS
Differential Revision: D20655310
fbshipit-source-id: 3e33662224c21b889c8121b823f8ce0b7da75eed
Summary:
So that packages are correctly marked when looking through the html
pages.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35309
Differential Revision: D20626737
Pulled By: seemethere
fbshipit-source-id: 0fad3d99f0b0086898939fde94ddbbc9861d257e
Summary:
Let see if it makes both test branches a bit more balanced
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35540
Test Plan: CI
Differential Revision: D20704642
Pulled By: malfet
fbshipit-source-id: 4e2ab5a80adfe78620206d4eaea30207194379cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34248
This argument will no longer exist in positional form when MemoryFormat
is moved into TensorOptions by codegen, so we must stop using it when
we make calls from C++. This diff eliminates all direct positional
calls, making them be passed in using TensorOptions.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20683398
Pulled By: bhosmer
fbshipit-source-id: 6928cfca67abb22fbc667ecc2af8453d93489bd6
Summary:
Since we've done the branch cut for 1.5.0 we should bump nightlies to 1.6.0
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35495
Differential Revision: D20697043
Pulled By: seemethere
fbshipit-source-id: 3646187a5e729994138bf2c68625f25f11430b3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35519
Fix include of THHalf.h to be TH/THHalf.h. Makes the include consistent with the rest of caffe2.
Test Plan: CI
Differential Revision: D20685997
fbshipit-source-id: 893b6e96e4f1a1e7306ba2e40e4e8ee738f0344f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35545
Looks like we have never printed a quantized Tensor in cpp before
(Note: this ignores all push blocking failures!)
Test Plan:
.
Imported from OSS
Differential Revision: D20699748
fbshipit-source-id: 9d029815c6e75f626afabf92194154efc83f5545
Summary:
Skip tests that normally finish in under a second but take 20+ minutes under ASAN
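A hedged sketch of the skip pattern; the environment-variable name is illustrative, and the real tests use the repo's own ASAN detection helper:
```
import os
import unittest

RUNNING_UNDER_ASAN = os.environ.get("PYTORCH_TEST_WITH_ASAN") == "1"

class ExampleTest(unittest.TestCase):
    @unittest.skipIf(RUNNING_UNDER_ASAN, "finishes in <1s normally, 20+ min under ASAN")
    def test_heavy_under_asan(self):
        self.assertTrue(True)
```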
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35533
Test Plan: CI
Differential Revision: D20700245
Pulled By: malfet
fbshipit-source-id: 7620b12d3aba1bafb2baa9073fa27c4a0b3dd9eb
Summary:
Fixes incorrect usages of symbol annotations including:
1. Exporting or importing a function/class in an anonymous namespace.
2. Exporting or importing a function/class implementation in a header file. However, by removing the symbol annotations, they are now local symbols. If they need to be remain global, I can move the implementations to the source file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35364
Differential Revision: D20670031
Pulled By: ezyang
fbshipit-source-id: cd8018dee703e2424482c27fe9608e040d8105b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34555
This is sometimes necessary, such as when T=int and the step size is of
type double.
Test Plan: Imported from OSS
Differential Revision: D20687063
Pulled By: ezyang
fbshipit-source-id: 33086d4252d06e7539733a9b1b3d6774e177b6da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35244
add roi_align_rotated op to lite interpreter for detectron2go model
(Note: this ignores all push blocking failures!)
Test Plan: try to run model in https://home.fburl.com/~stzpz/text_det/fbnet_300_20/
Reviewed By: iseeyuan
Differential Revision: D20560485
fbshipit-source-id: a81f3a590b9cc5a02d4da676b3cfa52b0e0a68c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35247
add a leading "_" to register quantized ops for lite interpreter. They are needed by d2go model
(Note: this ignores all push blocking failures!)
Test Plan:
(whole stack)
buck build -c user.ndk_cxxflags='-g1' -c caffe2.expose_op_to_c10=1 //xplat/caffe2/fb/pytorch_predictor:maskrcnnAndroid#android-armv7
Reviewed By: iseeyuan
Differential Revision: D20528760
fbshipit-source-id: 5b26d075456641b02d82f15a2d19f2266001f23b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34674
Two changes to make sure the op_names dumped in export_opnames() are consistent with what is actually used in the bytecode.
* Inline the graph before dumping the operator names.
* Use the code of the graph (which is used in the bytecode) instead of the nodes of the graph.
Test Plan: Imported from OSS
Differential Revision: D20610715
Pulled By: iseeyuan
fbshipit-source-id: 53fa9c3b36f4f242b7f2b99b421f4adf20d4b1f6
Summary:
My PR https://github.com/pytorch/pytorch/pull/33020 made subgraph_utils non-deterministic by using a set instead of a vector for closed-over values. This broke a downstream glow test. We're in the process of working with glow to not rely on the subgraph input order, but in the interim make it ordered again to fix the test.
An alternative is to use a `set` instead of a vector, but I don't particularly like committing to fixed ordering for the subgraph, especially for things like if nodes and while loops where an order doesn't really have any meaning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35508
Differential Revision: D20683959
Pulled By: eellison
fbshipit-source-id: bb39b29fef2904e52b9dc42be194bb57cbea59c4
Summary:
## Motivation
This PR upgrades MKL-DNN from v0.20 to DNNL v1.2 and resolves https://github.com/pytorch/pytorch/issues/30300.
DNNL (Deep Neural Network Library) is the new brand of MKL-DNN, which improves performance, quality, and usability over the old version.
This PR focuses on the migration of all existing functionalities, including minor fixes, performance improvement and code clean up. It serves as the cornerstone of our future efforts to accommodate new features like OpenCL support, BF16 training, INT8 inference, etc. and to let the Pytorch community derive more benefits from the Intel Architecture.
<br>
## What's included?
Even DNNL has many breaking changes to the API, we managed to absorb most of them in ideep. This PR contains minimalist changes to the integration code in pytorch. Below is a summary of the changes:
<br>
**General:**
1. Replace op-level allocator with global-registered allocator
```
// before
ideep::sum::compute<AllocForMKLDNN>(scales, {x, y}, z);
// after
ideep::sum::compute(scales, {x, y}, z);
```
The allocator is now being registered at `aten/src/ATen/native/mkldnn/IDeepRegistration.cpp`. Thereafter all tensors derived from the `cpu_engine` (by default) will use the c10 allocator.
```
RegisterEngineAllocator cpu_alloc(
ideep::engine::cpu_engine(),
[](size_t size) {
return c10::GetAllocator(c10::DeviceType::CPU)->raw_allocate(size);
},
[](void* p) {
c10::GetAllocator(c10::DeviceType::CPU)->raw_deallocate(p);
}
);
```
------
2. Simplify group convolution
We had a scenario in convolution where the ideep tensor shape mismatched the aten tensor shape: when `groups > 1`, DNNL expects weight tensors to be 5-d with an extra group dimension, e.g. `goihw` instead of `oihw` in the 2d conv case.
As shown below, a lot of extra checks came with this difference in shape before. Now we've completely hidden this difference in ideep and all tensors are going to align with pytorch's definition. So we could safely remove these checks from both aten and c2 integration code.
```
// aten/src/ATen/native/mkldnn/Conv.cpp
if (w.ndims() == x.ndims() + 1) {
AT_ASSERTM(
groups > 1,
"Only group _mkldnn_conv2d weights could have been reordered to 5d");
kernel_size[0] = w.get_dim(0) * w.get_dim(1);
std::copy_n(
w.get_dims().cbegin() + 2, x.ndims() - 1, kernel_size.begin() + 1);
} else {
std::copy_n(w.get_dims().cbegin(), x.ndims(), kernel_size.begin());
}
```
------
3. Enable DNNL built-in cache
Previously, we stored DNNL jitted kernels along with intermediate buffers inside ideep using an LRU cache. Now we are switching to the newly added DNNL built-in cache, and **no longer** caching buffers in order to reduce memory footprint.
This change will be mainly reflected in lower memory usage in memory profiling results. On the code side, we removed a couple of lines of `op_key_` logic that depended on the ideep cache before.
------
4. Use 64-bit integer to denote dimensions
We changed the type of `ideep::dims` from `vector<int32_t>` to `vector<int64_t>`. This renders ideep dims no longer compatible with the 32-bit dims used by caffe2. So we use something like `{stride_.begin(), stride_.end()}` to cast the parameter `stride_` into an int64 vector.
<br>
**Misc changes in each commit:**
**Commit:** change build options
Some build options were slightly changed, mainly to avoid name collisions with other projects that include DNNL as a subproject. In addition, DNNL built-in cache is enabled by option `DNNL_ENABLE_PRIMITIVE_CACHE`.
Old | New
-- | --
WITH_EXAMPLE | MKLDNN_BUILD_EXAMPLES
WITH_TEST | MKLDNN_BUILD_TESTS
MKLDNN_THREADING | MKLDNN_CPU_RUNTIME
MKLDNN_USE_MKL | N/A (not use MKL anymore)
------
**Commit:** aten reintegration
- aten/src/ATen/native/mkldnn/BinaryOps.cpp
Implement binary ops using new operation `binary` provided by DNNL
- aten/src/ATen/native/mkldnn/Conv.cpp
Clean up group convolution checks
Simplify conv backward integration
- aten/src/ATen/native/mkldnn/MKLDNNConversions.cpp
Simplify prepacking convolution weights
- test/test_mkldnn.py
Fixed an issue in the conv2d unit test: it didn't compare conv results between the mkldnn and aten implementations before. Instead, it compared mkldnn with mkldnn, since the default cpu path also goes into mkldnn. Now we use `torch.backends.mkldnn.flags` to fix this issue
- torch/utils/mkldnn.py
Prepack weight tensor on module `__init__` to achieve better performance significantly
------
**Commit:** caffe2 reintegration
- caffe2/ideep/ideep_utils.h
Clean up unused type definitions
- caffe2/ideep/operators/adam_op.cc & caffe2/ideep/operators/momentum_sgd_op.cc
Unify tensor initialization with `ideep::tensor::init`. Obsolete `ideep::tensor::reinit`
- caffe2/ideep/operators/conv_op.cc & caffe2/ideep/operators/quantization/int8_conv_op.cc
Clean up group convolution checks
Revamp convolution API
- caffe2/ideep/operators/conv_transpose_op.cc
Clean up group convolution checks
Clean up deconv workaround code
------
**Commit:** custom allocator
- Register c10 allocator as mentioned above
<br><br>
## Performance
We tested inference on some common models based on user scenarios, and most performance numbers are either better than or on par with DNNL 0.20.
ratio: new / old | Latency (batch=1 4T) | Throughput (batch=64 56T)
-- | -- | --
pytorch resnet18 | 121.4% | 99.7%
pytorch resnet50 | 123.1% | 106.9%
pytorch resnext101_32x8d | 116.3% | 100.1%
pytorch resnext50_32x4d | 141.9% | 104.4%
pytorch mobilenet_v2 | 163.0% | 105.8%
caffe2 alexnet | 303.0% | 99.2%
caffe2 googlenet-v3 | 101.1% | 99.2%
caffe2 inception-v1 | 102.2% | 101.7%
caffe2 mobilenet-v1 | 356.1% | 253.7%
caffe2 resnet101 | 100.4% | 99.8%
caffe2 resnet152 | 99.8% | 99.8%
caffe2 shufflenet | 141.1% | 69.0% †
caffe2 squeezenet | 98.5% | 99.2%
caffe2 vgg16 | 136.8% | 100.6%
caffe2 googlenet-v3 int8 | 100.0% | 100.7%
caffe2 mobilenet-v1 int8 | 779.2% | 943.0%
caffe2 resnet50 int8 | 99.5% | 95.5%
_Configuration:
Platform: Skylake 8180
Latency Test: 4 threads, warmup 30, iteration 500, batch size 1
Throughput Test: 56 threads, warmup 30, iteration 200, batch size 64_
† Shufflenet is one of the few models that require temp buffers during inference. The performance degradation is an expected issue since we no longer cache any buffers in ideep. As for the solution, we suggest users opt for a caching allocator like **jemalloc** as a drop-in replacement for the system allocator in such heavy workloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32422
Test Plan:
Perf results: https://our.intern.facebook.com/intern/fblearner/details/177790608?tab=Experiment%20Results
10% improvement for ResNext with avx512, neutral on avx2
More results: https://fb.quip.com/ob10AL0bCDXW#NNNACAUoHJP
Reviewed By: yinghai
Differential Revision: D20381325
Pulled By: dzhulgakov
fbshipit-source-id: 803b906fd89ed8b723c5fcab55039efe3e4bcb77
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35042
Removing python2 tests and some compat code in torch.jit. Check if dependent projects and external tests have any issues after these changes.
Test Plan: waitforsandcastle
Reviewed By: suo, seemethere
Differential Revision: D18942633
fbshipit-source-id: d76cc41ff20bee147dd8d44d70563c10d8a95a35
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35393
this was being created inside the lock scope, but we don't need to
hold the lock for this.
ghstack-source-id: 100953426
Test Plan: CI
Differential Revision: D20632225
fbshipit-source-id: dbf6746f638b7df5fefd9bbfceaa6b1a542580e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35491
The goal of this diff is to avoid having to set the AutoNonVariableTypeMode guard
in client code that uses a custom mobile build. The guard was necessary because
a custom mobile build might not include variable kernels, in which case the
AutoNonVariableTypeMode guard usually has to be set. It's hard to enforce this
rule at all call sites, so we make this change to simplify it.
Another goal of the diff is to not break FL where real variable kernels are
registered.
ghstack-source-id: 100944553
Test Plan:
- With stacked diff, tested lite-trainer with MnistModel:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer \
-c pt.disable_gen_tracing=1 \
-- --model=/home/liujiakai/ptmodels/MnistModel.bc
```
- Will test with the papaya sample app.
Differential Revision: D20643627
fbshipit-source-id: 37ea937919259c183809c2b7acab0741eff84d33
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20678162
Pulled By: yf225
fbshipit-source-id: 74e062e42d86dc118f0fbaddd794e438b2eaf35a
Summary:
Desugar prim::shape to aten::size so that passes don't need to reason about both ops. Serialized models still resolve to `prim::shape` so this doesn't break BC.
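A minimal sketch of what a pass author sees after this change, assuming the scripted graph now contains `aten::size` where `x.shape` is used:
```
import torch

@torch.jit.script
def shape_of(x: torch.Tensor):
    return x.shape

# With the desugaring in place, the printed graph should reference
# aten::size rather than prim::shape.
print(shape_of.graph)
```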
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34286
Differential Revision: D20316818
Pulled By: eellison
fbshipit-source-id: d1585687212843f51e9396e07c108f5c08017818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35433
Make RRef TorchScript API the same as RRef Python API.
Differential Revision: D7923050
fbshipit-source-id: 62589a429bcaa834b55db6ae8cfb10c0a2ee01ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35430
This fixes and adds tests for several commonly used operators.
There's some formatting differences due to running clang-format on one of the files.
Test Plan: buck test //caffe2/caffe2/fb/operators:hypothesis_test //caffe2/caffe2/python/operator_test:utility_ops_test //caffe2/caffe2/python/operator_test:concat_split_op_test
Reviewed By: yyetim
Differential Revision: D20657405
fbshipit-source-id: 51d86d0834003b8ac8d6acb5149ae13d7bbfc6ab
Summary:
Looks like there is a bug in the CUDA device linker: kernels that use `thrust::sort_by_key` cannot be linked with other kernels.
Solve the problem by splitting 5 thrust-heavy .cu files into a `__torch_cuda_sp` library which is statically linked into `torch_cuda`.
For the default compilation workflow it should not make any difference.
Test Plan: Compile with `-DCUDA_SEPARABLE_COMPILATION=YES` and observe library size difference: 310Mb before, 173Mb after if compiled for sm_75
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34863
Differential Revision: D20683972
Pulled By: malfet
fbshipit-source-id: bc1492aa9d1d2d21c48e8764a8a7b403feaec5da
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35455
In graph mode we need to observe the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
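A simplified, illustrative version of the scale/zero_point arithmetic being mimicked; the real quant_utils::ChooseQuantizationParams includes additional nudging and edge-case handling:
```
def choose_qparams(min_val: float, max_val: float, qmin: int = 0, qmax: int = 255):
    # The quantized range must always include zero.
    min_val = min(min_val, 0.0)
    max_val = max(max_val, 0.0)
    scale = (max_val - min_val) / float(qmax - qmin)
    if scale == 0.0:
        scale = 0.1  # avoid division by zero for constant tensors
    zero_point = int(round(qmin - min_val / scale))
    zero_point = max(qmin, min(qmax, zero_point))  # clamp into the quantized range
    return scale, zero_point

print(choose_qparams(-1.0, 1.0))
```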
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20664586
fbshipit-source-id: e987ea71fff777c21e00c498504e6586e92568a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35164
As title
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20581853
fbshipit-source-id: 393ddd9487cd965c465eaa49e1509863618a6048
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35302
This is an error in modular builds.
Test Plan: CI
Reviewed By: igorsugak
Differential Revision: D20591224
fbshipit-source-id: 44e8e1be9e54b94f7b54be6bdeb4260a763667ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33157
This PR enables graph level thread parallelism on CPU for the Autograd
Engine. It replace https://github.com/pytorch/pytorch/pull/29574 for the
reason of task level parallelism drawbacks with the existing autograd
system.
Fixes https://github.com/pytorch/pytorch/issues/18333
The graph level parallelism on CPU design:
1. Remove the single CPU thread that was initialized in the Engine itself and allow
the owning thread (which calls Engine::execute) to drive the Engine
execution, so that outer threading can enable thread
parallelism.
2. Maintain a separate ReadyQueue per CPU thread, and stash the
ReadyQueue for different devices/threads into the thread local
shared_ptr, the Engine itself will memorize the shared_ptr of the
ReadyQueue to different devices (other than CPU)
3. The CPU thread local ReadyQueue is initialized per CPU thread
Engine::execute call (or `backward()`, `grad()` call), and memorized
the shared_ptr into the GraphTask since every `backward()` call have
its own GraphTask
4. Cross device NodeTask push is accomplished by 2 and 3. we can refer
to device's ReadyQueue from Engine, and CPU's ReadyQueue from
GraphTask, which means if we can push to a different ReadyQueue
according to the device
5. Termination of the CPU thread: if we mark the graph_task as
completed, we will exit the while loop and terminate the current
backward execution, because it's guaranteed that all other NodeTasks
are finished before we mark a GraphTask as complete
6. re-entrant thread logic stays the same; reentrant thread detection is
similar to before: we set the worker_device to NO_DEVICE initially
and set it to CPU afterward to detect whether this is a reentrant call or not.
7. we still have the reentrant thread pool that creates new threads in the
deep reentrant case, and reuses the ReadyQueue of the parent thread
for performance.
Since we introduce the thread parallelism on CPU, we have to ensure the
thread safety of the GraphTask. This is not a problem if we execute all
forward in different threads since we will build separate GraphTask in
different threads, and each GraphTask is a separate instance that share
nothing, i.e. Hogwild training on CPU should be fine on this case.
But there might be cases where a user would like to do part of the task in
a single thread and the rest of the work in several threads
concurrently, so thread safety is crucial in those cases. The thread
safety strategy for the multithread autograd is as follows:
1. Add a mutex to protect thread safety in Autograd Node/Function, and hold
   the lock for the different data-racing cases.
2. Lock the mutex during Node::apply(); this ensures Nodes that write to
   shared variables are not racing across threads (i.e. AccumulateGrad and
   custom C++ Autograd Nodes that write to shared variables).
3. Lock the mutex during Node::release_variables(); this serves the purpose
   that when we release saved_variables from one thread, no other thread can
   call Node::apply(), which ensures the variable references from other
   threads are not dangling.
4. If we don't release any variables and there is no shared data read/write
   in the Node, i.e. it is purely functional, we don't lock the mutex.
This way we can protect thread safety on the Autograd Node, but we still
cannot protect thread safety on Node pre/post C++ hooks (Python hooks are
automatically thread safe); we rely on the user to write thread-safe C++
hooks if they want the hooks to be correctly applied in a multithreaded
environment.
**User visible changes**:
There are not many user-visible changes. Since we use the owning thread to
drive the autograd execution, users can write their own threading code
without blocking on the Autograd engine. Some behaviors users should be
aware of:
**Non-determinism**:
If we call backward() on multiple threads concurrently but with shared
inputs (i.e. Hogwild CPU training): since parameters are automatically shared
across threads, gradient accumulation might become non-deterministic across
backward calls, because two backward calls might access and try to accumulate
the same .grad attribute. This is technically not safe, and it might result
in race conditions whose results are invalid to use.
This is, however, the expected pattern if the user drives the whole training
process with multithreading but shares parameters; users who use
multithreading should have the threading model in mind and should expect this
to happen. Users can use the functional interface `torch.autograd.grad()` to
calculate the gradients instead of calling `backward()` on the loss, as
sketched below.
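As a minimal sketch (the toy model and worker function below are illustrative, not part of this PR), a Hogwild-style set of Python threads could compute gradients with the functional interface like this:
```python
import threading
import torch

model = torch.nn.Linear(4, 1)   # parameters shared across threads
results = []

def worker(data, target):
    loss = torch.nn.functional.mse_loss(model(data), target)
    # torch.autograd.grad returns the gradients instead of accumulating them
    # into the shared parameters' .grad attributes, sidestepping the race above
    grads = torch.autograd.grad(loss, list(model.parameters()))
    results.append(grads)

threads = [threading.Thread(target=worker,
                            args=(torch.randn(8, 4), torch.randn(8, 1)))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```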
**Graph retaining**:
If part of the autograd graph is shared between threads, i.e. we run the
first part of the forward in a single thread and then run the second part in
multiple threads, the first part of the graph is shared. In this case,
different threads executing grad() or backward() on the same graph might
destroy the graph on the fly in one thread, causing the other thread to
crash. We will error out to the user, similar to calling `backward()` twice
without `retain_graph=True`, and let the user know they should use
`retain_graph=True`.
**TODOs**:
[ ] benchmark the PR with example models and datasets to demonstrate
the performance gain in CPU training
[ ] ensure that we don't regress the single thread autograd performance
**Follow ups**:
[ ] a correct and tight integration with distributed autograd
[ ] try to unify the thread pool between JIT and Autograd, and see if
there's a unifying pattern that we could apply universally
Test Plan: Imported from OSS
Differential Revision: D20236771
Pulled By: wanchaol
fbshipit-source-id: 1e0bd4eec14ffebeffdb60b763b8d6f0e427eb64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35259
This PR added tests as part of https://github.com/pytorch/pytorch/issues/34367
It covers:
Re-entrant -> Test simple re-entrant
Re-entrant -> Test stack overflow escape mechanism
Test Plan: Imported from OSS
Differential Revision: D20611828
Pulled By: wanchaol
fbshipit-source-id: 2c55f2a0e3244f11b7153956b0d844e1992e5c80
Summary:
**Summary**
`asyncio.run` is available only in Python 3.7 and later, and even there only provisionally.
This commit replaces the use of `asyncio.run` in `tools/clang_format.py`
with an approximation that works in both 3.6 and 3.7.
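A rough sketch of the kind of 3.6-compatible replacement described above (the coroutine below is a stand-in, not the script's actual code):
```python
import asyncio

async def run_clang_format():
    # placeholder for the script's actual async work
    await asyncio.sleep(0)

# asyncio.run(run_clang_format()) requires Python 3.7+; this works on 3.6 too
loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(run_clang_format())
finally:
    loop.close()
```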
**Testing**
Ran the script with both `python3.6` and `python3.7`.
```
$ python3.6 tools/clang_format.py --diff
...
Some files not formatted correctly
$
```
```
$ python3.7 tools/clang_format.py --diff
...
Some files not formatted correctly
$
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35501
Differential Revision: D20681947
Pulled By: SplitInfinity
fbshipit-source-id: 43e13aa85f79396bec1f12ee1e80eff90dbed5db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35246
register caffe2 mask-rcnn ops in lite interpreter. It requires a leading "_" in the name.
(Note: this ignores all push blocking failures!)
Test Plan: buck build -c caffe2.expose_op_to_c10=1 //xplat/caffe2:mask_rcnn_opsAndroid
Reviewed By: iseeyuan
Differential Revision: D20528758
fbshipit-source-id: 459668a0c6cdc6aec85cb561d7acce2a5291b421
Summary:
`gather` turns out to be much faster than `index_select` for this function (anywhere from 2-10x faster across my testing). We do have to match the shape of the generated indices; however, this does not affect performance since `.expand` does not copy the underlying buffer.
I experimented with a custom kernel, but the improvement over this implementation didn't justify the approach since it would have added significant complexity and reduced the use of shared infrastructure in the PyTorch codebase.
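As an illustration of the equivalence being exploited (a generic sketch, not the exact function changed by this PR):
```python
import torch

t = torch.randn(1000, 64)
idx = torch.randint(0, 1000, (256,))

# index_select along dim 0
a = t.index_select(0, idx)

# gather needs an index tensor with the same shape as the output;
# .expand only creates a view, so no extra copy of the index buffer is made
b = t.gather(0, idx.unsqueeze(1).expand(-1, t.size(1)))

assert torch.equal(a, b)
```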
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35243
Differential Revision: D20629914
Pulled By: robieta
fbshipit-source-id: 7841b6a40ffd2b32e544f54ef2529904d76864b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35496
This commit modifies the clang-format workflow so that it prints
the output of `tools/clang_format.py` to stdout instead of piping
it to a file. This way, the issues encountered by the script
(e.g. which files are not formatted correctly) will be visible
in the CI window.
Testing:
CI
Test Plan: Imported from OSS
Differential Revision: D20678729
Pulled By: SplitInfinity
fbshipit-source-id: 8b437c2cf2779de0245c1b4301c57b4ee0dcad6d
Summary:
Use std::list instead of std::vector to avoid iterating over list of registered listeners
Also, fix formatting
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35486
Differential Revision: D20677764
Pulled By: malfet
fbshipit-source-id: d2a545454a29a12bbbf4aa62d9f8c4029a109e6c
Summary:
This commit allows one to use an environment variable to enable the fuser in torch/csrc/jit/tensorexpr/
```
PYTORCH_TENSOREXPR=1 python benchmark.py
```
This commit also changes the registration to happen by default, removing the requirement for the python exposed "_jit_register_tensorexpr_fuser"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35341
Reviewed By: ZolotukhinM
Differential Revision: D20676348
Pulled By: bwasti
fbshipit-source-id: 4c997cdc310e7567c03905ebff72b3e8a4c2f464
Summary:
On most Linux distros `python` still points to python-2.x
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35490
Differential Revision: D20676691
Pulled By: malfet
fbshipit-source-id: 0d4519b83cfebb108edc0628bf036a541247584e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35239
This commit adds a new GitHub workflow that checks if a pull request
has any formatting issues using `tools/clang_format.py`.
Testing:
Literally in prod.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20605802
Pulled By: SplitInfinity
fbshipit-source-id: 8dd6517dd907d7b6a3d9e9dd3969b666fbebb709
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35115
This commit runs the newly added tools/clang_format.py on the JIT
codebase and includes all of the formatting changes thus produced.
Testing:
Ran the script, CI.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568523
Pulled By: SplitInfinity
fbshipit-source-id: e09bdb982ccf090eecfb7c7b461b8d0681eef82b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35114
This commit replaces clang_format.py with clang_format_new.py, a
new and improved script that downloads, verifies and runs a platform-appropriate
clang-format binary on files in a predefined set of whitelisted directories.
Testing:
Ran the script.
Test Plan: Imported from OSS
Reviewed By: eellison
Differential Revision: D20568450
Pulled By: SplitInfinity
fbshipit-source-id: 3bd782dfc211a053c5b419fd4318d38616b5fd16
Summary:
This speeds up the inlining pass of FairSeq model from 180s -> 13s.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35424
Differential Revision: D20657271
Pulled By: eellison
fbshipit-source-id: 7a9006858c2f1b157f5a3f36ed2b3774cc186de8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35425
Prior to this commit, dist_optimizer_test.py uses torch.manual_seed(0)
to set RNG state. However, multiple RPC threads from the same
process share the same RNG instance. Therefore, even though we
reset the RNG state before every torch.rand usage, a background RPC thread
could still mess up the draw order in the RNG, leading to non-deterministic
behavior. This commit addresses the problem by avoiding the default RNG.
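A minimal sketch of the idea (the shapes are illustrative): drawing from a dedicated generator keeps the sequence independent of whatever other threads do to the process-global default RNG:
```python
import torch

# a per-test generator; background RPC threads drawing from the default RNG
# can no longer perturb the sequence observed here
gen = torch.Generator().manual_seed(0)
a = torch.rand(3, generator=gen)
b = torch.rand(3, generator=torch.Generator().manual_seed(0))
assert torch.equal(a, b)  # reproducible regardless of other threads
```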
Test Plan: Imported from OSS
Differential Revision: D20657589
Pulled By: mrshenli
fbshipit-source-id: 0f45b11a902317f15f3ee8448bc240f5723075a5
Summary:
Request to update ROCm CI dockers to release 3.1
Changes required to the PyTorch source base attached:
* switch to the fast path for the Caffe2 ReLU operator
* switch to the new hipMemcpyWithStream(stream) API to replace hipMemcpyAsync(stream) && hipStreamSynchronize(stream) paradigm in an optimized fashion
* disable two regressed unit tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33930
Differential Revision: D20589048
Pulled By: ezyang
fbshipit-source-id: 568f40c1b90f311eb2ba57f02a9901114d8364af
Summary:
As described in https://github.com/pytorch/pytorch/issues/33934, the current attribute error in `nn.Module`'s properties are wrong.
```python
from torch import nn
class MyModule(nn.Module):
property
def something(self):
hey = self.unknown_function()
return hey
model = MyModule()
print(model.something)
```
This raises `AttributeError: 'MyModule' object has no attribute 'something'` when what we want is `AttributeError: MyModule instance has no attribute 'unknown_function'`.
This fixes this issue and will make properties much easier to debug !
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34324
Differential Revision: D20645563
Pulled By: ezyang
fbshipit-source-id: 130f861851bdbef43803569a5ce9e24d2b942179
Summary:
This reverts commit d7a7bcb0428273fa54a836b52e750608ebe7e4de.
The previous commit is not useful because torch_global_deps doesn't include any external dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35355
Differential Revision: D20653036
Pulled By: ezyang
fbshipit-source-id: 6d2e2f90952ca865b27b649a6ff9114ada8ea78c
Summary:
Issue https://github.com/pytorch/pytorch/issues/24596
This PR moves `mm` cuda to ATen. The internal `addmmImpl` that was used as the base of the old TH version of `mm` cuda is also ported.
This PR also sets up `addmm` cuda to be fairly easily ported to ATen in a future PR, since TH `mm` and `addmm` used the same `addmmImpl` function at their core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34891
Differential Revision: D20650713
Pulled By: ngimel
fbshipit-source-id: 692aba1bbae65a18d23855b5e101446082d64c66
Summary:
Sometimes a submodule URL may have changed between commits. Let the Dockerfile
also sync submodules before updating.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35423
Differential Revision: D20658464
Pulled By: ngimel
fbshipit-source-id: 9c101338437f9e86432d3502766858fa5156a800
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35261
Uses the RECORD_FUNCTION macro to profile the amount of time in
dist_autograd and ensure that it shows up in the profiler output. Since
dist_autograd.backward() is blocking, we can avoid stuffing the RecordFunction
into a callback. This does not support profiling the RPCs that are created when
gradients are forwarded over to other nodes; this can be added in a follow up
diff.
ghstack-source-id: 100723408
Test Plan: Added a UT.
Differential Revision: D20611653
fbshipit-source-id: f9718cf488398a1c7b63ac3841bd2f4549082c8a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35167
The purpose of this PR is to move `normal`/`normal_`/`normal_out` to `native/DistributionTemplates.h`, `native/cpu/DistributionTemplates.h` and `native/cuda/DistributionTemplates.h` to make it reusable for custom RNG, see cpu_rng_test.cpp as an example of custom RNG.
Test Plan: Imported from OSS
Differential Revision: D20588248
Pulled By: pbelevich
fbshipit-source-id: 7ee60be97f81522cd68894ff1389007c05130a60
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
**TODO**: add BC-breaking notes for this PR
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Differential Revision: D20645945
Pulled By: yf225
fbshipit-source-id: 383588065bf1859b38f0ad0a25d93d41e153c96e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35265
In graph mode we need to observe the activation tensor for dynamic quantization. This observer should behave the same way as the quantization functions called in the dynamic operator.
Currently for qlinear_dynamic we call quant_utils::ChooseQuantizationParams which has its own logic for calculating scale and zero_point.
We mimic those calculations in the new observer.
Test Plan:
python test/test_quantization.py ObserverTest
Imported from OSS
Differential Revision: D20630988
fbshipit-source-id: 7e7aca77590f965dcb423a705e68d030aaf98550
Summary:
Adding ops to the list based on our discussion. :D
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35399
Differential Revision: D20651393
Pulled By: ailzhang
fbshipit-source-id: 8cf9026d10c0d74117953dbb68ebc2f537be956a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35232
Some prim operators, like profile and fusion, are not used in mobile (at least in short term). They are coupled with JIT code. Put them in a separate file (register_prim_ops_fulljit.cpp).
ghstack-source-id: 100807055
Test Plan: buck build //xplat/caffe2:torch
Reviewed By: dreiss
Differential Revision: D20408827
fbshipit-source-id: 9013093357cf75723ef00c34bbfdb6b7ea40a4cf
Summary:
Same for `else`, `endif` and `elseif`.
Also prefer the lowercase forms over the uppercase ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35343
Test Plan: None at all
Differential Revision: D20638789
Pulled By: malfet
fbshipit-source-id: 8058075693185e66f5dda7b825b725e139d0d000
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35066
Closes #24965
Prior to this commit, final_callbacks_ are cleared on exit of ANY
backward. When using reentrant backward, the last backward would
remove all callbacks from the engine. However, this might lead to
unexpected behavior. For example, the application could install
a final callback after the forward, expecting this callback to fire
when all gradients are ready. If there is a reentrant backward on
a subgraph, it would fire the callback and delete it on exit,
meaning that when fired, not all gradients are ready.
**Failed Attempt**
The 1st attempt was trying to move the callback to the GraphTask
in engine::execute(). However, this failed because more callbacks
could be installed during backward pass.
**Current Solution**
Final callbacks are stored as a member variable in the GraphTask.
* Insertion: use the thread_local current_graph_task to find the
target GraphTask, and append final callback.
* Deletion: final callbacks have the same lifetime as a GraphTask
* Execution: Use the GraphTask provided in the argument to find
final callbacks.
Test Plan: Imported from OSS
Differential Revision: D20546474
Pulled By: mrshenli
fbshipit-source-id: d3f3449bb5af9f8703bcae63e6b52056cd535f11
Summary:
I don't know why the reduce_scatter collective operation is not documented, so I added it to the documentation.
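For reference, a minimal usage sketch of the collective being documented (assuming a process group has already been initialized on every rank):
```python
import torch
import torch.distributed as dist

# assumes dist.init_process_group(...) was already called on every rank
world_size = dist.get_world_size()
output = torch.empty(4)
# one input chunk per rank; rank r ends up with the element-wise sum of
# every rank's r-th chunk in `output`
inputs = [torch.full((4,), float(r)) for r in range(world_size)]
dist.reduce_scatter(output, inputs)
```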
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35274
Differential Revision: D20645850
Pulled By: mrshenli
fbshipit-source-id: 0a4458bff1a4e15a4593dd4dcc25e4e0f6e2265d
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34191
`at::native::radixSelect` basically uses integer comparison which creates a defined ordering of non-finite float values. This isn't compatible with IEEE float comparison, so mixing the two leads to unwritten values in the output.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35253
Differential Revision: D20645554
Pulled By: ezyang
fbshipit-source-id: 651bcb1742ed67086ec89cc318d862caae65b981
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35361
If the inputs we are bundling together will be consumed by ops from the same partition, we can assign the Split and Half2Float ops to that partition too. Otherwise, we do nothing.
Reviewed By: bangshengtang
Differential Revision: D20639777
fbshipit-source-id: 4032abb9178f3b44a85e4789ddf5ad5624245e3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35311
This must have snuck in since a couple PRs updated this same area and
the merge conflict was not resolved properly.
ghstack-source-id: 100770387
Test Plan: CI
Differential Revision: D20602683
fbshipit-source-id: 22134069194b4095dd3be920e4e7f4437dac06f0
Summary:
Currently constant folding is only enabled for ONNX opset versions 9 to 11. This PR enables it for the new ONNX opset 12.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34823
Reviewed By: hl475
Differential Revision: D20627629
Pulled By: houseroad
fbshipit-source-id: 7501d8ab8295751c0e9a02752d8908a35d8a0454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35235
For dynamic quantization in graph mode, we need an operator that returns the qparams of the tensor
similar to the linear_dynamic quantized op
Test Plan:
python test/test_quantized_tensor.py TestQuantizedTensor.test_choose_qparams
Imported from OSS
Differential Revision: D20608793
fbshipit-source-id: b923b2620421b32d05f4097db0d6153d53198221
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35353
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35324
When the OMP_NUM_THREADS is set to 1, we don't need to launch the parallel_for function on an OpenMP thread since there is no intra-op parallelism. By avoiding that, we can reduce the unnecessary context switches.
Test Plan: internal
Reviewed By: ilia-cher
Differential Revision: D20638734
fbshipit-source-id: 0d5a6537aa2fc35d8d0904c3b9e734e52585eee7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35346
weight scale op doesn't have GPU impl. This is breaking OSS CI from D20506032. Making it cpu only
Test Plan: OSS CI
Reviewed By: ustctf
Differential Revision: D20637440
fbshipit-source-id: 9aa6cce63ce637ab7856788e5d02f527decb2a26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34070
The first step to make all operators available for lite interpreter. The original code used manual registration for lite interpreter ops with a "_" prefix, for two reasons:
1. To minimize the build size.
2. To avoid duplicate registration in OSS (majorly feature testing and unit tests).
Now since we have more and more models to support, the manual registration way is not practical. To make this process automatic while keeping the binary size under control, we plan to:
1. Make all necessary ops callable from lite interpreter.
2. The binary size would be increased because of step 1. Use ljk53 's custom build to selectively build the binary with ops used in specific models. The ops will be automatically collected using get_opnames.
3. The temporary "register_mobile_ops.cpp" can be removed.
Test Plan: Imported from OSS
Differential Revision: D20291596
Pulled By: iseeyuan
fbshipit-source-id: 553b4699619cd71fea20658f3bc8c2d48852ef5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33199
Remove list appends when we can match them with a list construction. This helps create a larger functional graph
Test Plan: Imported from OSS
Differential Revision: D20603187
Pulled By: eellison
fbshipit-source-id: a60e933b457479d40960994d8ffdf39ef49eaf6e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33186
This helps create larger functional graphs. It has the potential to increase memory use, so in order to land this on by default we would probably also do a reuse of buffers pass.
This is currently O(n * | Removed Nodes | ) because we have to rebuild the alias Db each time we make a change. This pass is critical to creating functional graphs, so this might be a compelling use case to build incremental updates to alias Db.
Test Plan: Imported from OSS
Differential Revision: D20603189
Pulled By: eellison
fbshipit-source-id: 105db52bf38e02188ca6df6d36294466d3309a0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33020
This is a pass to create functional blocks. The other PRs in the stack help avoid some of the limitations that are often found in graphs. It's possible that this would work well with a graph that is frozen. Follow-up work items that will help this pass:
- We don't currently have any capacity in alias analysis to tell whether a Value that came from the wildcard set "re-escapes" back into the wildcard set.
- More comments on the semantics of the graph and correctness conditions
- We could consider using dynamic dag if the perf of this is a limitation.
- potentially make Functional Graphs Functional Blocks instead, so that we do not repeatedly copy constants, and also to make the IR easier to read.
Test Plan: Imported from OSS
Differential Revision: D20603188
Pulled By: eellison
fbshipit-source-id: 6822a6e65f4cc2676f8f6445fe8aa1cb858ebeeb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35356
Fix a few typos in dataset_ops
(Note: this ignores all push blocking failures!)
Test Plan: .
Reviewed By: yinghai
Differential Revision: D20554176
fbshipit-source-id: 8565f4b34f5d304696adb1c06d4596921938de8f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35324
When the OMP_NUM_THREADS is set to 1, we don't need to launch the parallel_for function on an OpenMP thread since there is no intra-op parallelism. By avoiding that, we can reduce the unnecessary context switches.
Test Plan: internal
Reviewed By: ilia-cher
Differential Revision: D20630949
fbshipit-source-id: 0b6f1ba5b535dafedb16742145a70cc4bb4872a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34820
Adds quantized version of hardswish, for common quantized operator coverage.
Note:
* we carry over scale and zero_point from the input to the output, because the
range of the output is unbounded if x > 0
* we also skip the .out function to not allow the user to specify a custom
scale+zp (flexible on this).
Test Plan:
```
python test/test_quantized.py
https://gist.github.com/vkuzo/f9b579315ed7f5fdb24839e3218d8465
```
Imported from OSS
Differential Revision: D20472905
fbshipit-source-id: 0f2a83e9f5f7b43485fa46caf30e756dc5d492a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34747
Adds the hardswish FP operator from MobileNetV3 to PyTorch. This is for
common operator coverage, since this is widely used. A future PR will
add the quantized version. CUDA is saved for a future PR as well.
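For context, a reference formulation of the op (a sketch of the MobileNetV3 definition, not the kernel added by this PR):
```python
import torch
import torch.nn.functional as F

def hardswish_reference(x):
    # MobileNetV3 definition: x * relu6(x + 3) / 6
    return x * F.relu6(x + 3.0) / 6.0

x = torch.randn(10)
print(hardswish_reference(x))
```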
Test Plan:
tests pass:
```
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardswish_cpu_float32
```
microbenchmark:
https://gist.github.com/vkuzo/b10d3b238f24e58c585314e8b5385aca
(batch_size == 1: 11.5GiB/s, batch_size == 4: 11.9GiB/s)
Imported from OSS
Differential Revision: D20451404
fbshipit-source-id: c7e13c9ab1a83e27a1ba18182947c82c896efae2
Summary:
A new version of the IR simplifier used by the jit/tensorexpr fuser. This is capable of simplifying expressions containing (shock) multiple variables, eg:
```(m * (1 * n_1) + (n + 1)) - (m * (1 * n_1) + n) => 1```
Similar to the previous IR Simplifier it uses a two stage approach:
1. Traverse the tree, combining subtrees of commutable operations into a flat structure. In this implementation we have two intermediate Exprs: Term (expressing products of sub-expressions) and Polynomial (expressing sums of sub-expressions).
2. Traverse the tree expanding Term's and Polynomials into their component operators.
Using the example above we execute with a process like this to simplify:
```
(m * (1 * n_1) + (n + 1)) - (m * (1 * n_1) + n)
# Using PolynomialTransformer:
=> Sub(Add(Mul(m, Mul(1, n_1)), Add(n, 1)), Add(Mul(m, Mul(1, n_1)), n))
=> Sub(Polynomial(Term(m, n_1), n, 1), Polynomial(Term(m, n_1), n))
=> Polynomial(Term(m, n_1), Term(-1, m, n_1), n, -n, 1)
=> Polynomial(1)
# Using TermExpander
=> 1
```
The IRSimplifier supports arithmetic simplifications of operators Add, Sub and Mul and constant folding of all binary Exprs and Intrinsics, but does not attempt expansion of multiplication of Polynomials to the canonical form since that generally leads to less efficient representations. It will do scalar factorization if it results in removal of operators, and will merge chains of multilane primitives (such as Broadcast and Ramp) down into a single operator. The ir_simplifier unit tests are a short tour of its capabilities.
The existing simplifier has a bug where it will sometimes reorder operations on floating point types which are not associative. This causes (at least) the pyhpc equation_of_state benchmark to produce incorrect results. I have fixed that issue in this version and verified that that benchmark produces the same results with and without the simplifier.
Tests: all cpp & py tensorexpr tests, and the pyhpc benchmark:
```
benchmarks.equation_of_state
============================
Running on CPU
size backend calls mean stdev min 25% median 75% max Δ
------------------------------------------------------------------------------------------------------------------
4,194,304 pytorch 10 0.246 0.002 0.243 0.245 0.246 0.248 0.250 1.000
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35127
Differential Revision: D20624571
Pulled By: nickgg
fbshipit-source-id: e49049377beee69e02dcf26eb922bef1447ae776
Summary:
Stacked PRs
* #34938 - [jit] Remove stray `script`
* **#34935 - [jit] Add lazy script decorator**
Some users maintain libraries of code that is largely trace-able but not
script-able. However, some functions may need to be `torch.jit.script`ed if
they contain control flow so the tracer will use the compiler version.
This however impacts library start up time as in #33418, so this PR adds
a workaround in the form of a `torch.jit._lazy_script_while_tracing`
that will only initialize the compiler if the function is called while
actually tracing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34935
Pulled By: driazati
Differential Revision: D20569778
fbshipit-source-id: d87c88c02b1abc86b283729ab8db94285d7d4853
Summary:
I will need it for https://github.com/pytorch/pytorch/pull/34004
The `mutable` qualifier allows a lambda to capture some values and modify its own copy. This would be useful for random kernels: we capture an RNG `state`, initialize it when the lambda first runs, and the initialized state will be used later:
```C++
gpu_kernel(iter, [state, initialized](scalar_t arg) mutable -> scalar_t {
if (!initialized) {
curand_init(..., state);
initialized = true;
}
return some_math(curand_uniform(state), arg);
});
```
The `operator()` of a `mutable` lambda is not `const`, so we cannot pass it as a constant reference. It cannot be called inside a non-`mutable` lambda either.
Example usage:
```C++
auto t = at::empty({4096}, kCUDA);
float thread_work_index_ = 0;
auto iter = TensorIterator::nullary_op(t);
gpu_kernel(iter, [thread_work_index_]GPU_LAMBDA() mutable -> float {
return thread_work_index_++;
});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35015
Differential Revision: D20624698
Pulled By: ngimel
fbshipit-source-id: 06e3987793451cd514181d20252510297e2d28a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35296
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20624843
Pulled By: ezyang
fbshipit-source-id: 9028f1dd62d0c25e916eb4927fd8dd6acbd88886
Summary:
Otherwise, VC++ will emit a warning for every exposed C++ symbol, for example:
```
include\c10/core/impl/LocalDispatchKeySet.h(53): warning C4251: 'c10::impl::LocalDispatchKeySet::included_': class 'c10::DispatchKeySet' needs to have dll-interface to be used by clients of struct 'c10::impl::LocalDispatchKeySet'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35272
Test Plan: CI
Differential Revision: D20623005
Pulled By: malfet
fbshipit-source-id: b635b674159bb9654e4e1a1af4394c4f36fe35bd
Summary:
Simplifies `cpu_scatter_gather_base_kernel` to accept only binary operations and spares them from doing redundant checks.
CC v0dro
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34690
Differential Revision: D20604814
Pulled By: ngimel
fbshipit-source-id: 5e22c2f39a8e2861dc763454c88796d1aa38d2eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35223
Adding tests as part of
https://github.com/pytorch/pytorch/issues/34367.
This test covers
"Mixed with errors" ->
"Reentrant on same device" ->
"Make child error before parent finishes"
ghstack-source-id: 100725947
Test Plan: waitforbuildbot
Differential Revision: D20603127
fbshipit-source-id: 08484b0a98053491459e076bdd23caf042c47150
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35283
https://github.com/pytorch/pytorch/issues/34260
Deadlock on destructing py::error_already_set.
There are request callback implementations in Python, where Python exceptions could be thrown. To release the Python-exception py::objects, the GIL must be held.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork && \
buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par \
-r test_torchscript_functions_not_supported
```
Differential Revision: D7753253
fbshipit-source-id: 4bfaaaf027e4254f5e3fedaca80228c8b4282e39
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34553
This allows vectorized looping in a serial iteration over
TensorIterator.
Test Plan: Imported from OSS
Differential Revision: D20604238
Pulled By: ezyang
fbshipit-source-id: 61c451dac91d47cde7e1a937b271ab78c79e05d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35240
This makes it so that if we have an old serialized TorchBind class, we don't try to load it in and instead rely on the ClassType that's in memory.
ghstack-source-id: 100703946
Test Plan: buck test mode/dev-nosan //caffe2/torch/fb/predictor/model_repo/tests:ai_infra_representative_model_shard_6_test -- 'RepresentativeModelTest\/ShardedRepresentativeModelTest\.RunModel\/0'
Reviewed By: zdevito
Differential Revision: D20605681
fbshipit-source-id: 5403f68937f822914c701d9c80573f0b4a93e83b
Summary:
Per title. See related https://github.com/pytorch/pytorch/pull/34570.
In PyTorch 1.7 the plan is for torch.div and Python's division operator to perform "true" division, like Python 3, JAX, and NumPy. To facilitate this change, this PR expands true_divide to be a method so it can cover all of torch.div's use cases.
New true_divide tests are added to test_torch.py, test_type_promotion.py, and test_sparse.py.
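A small sketch of what the method form allows:
```python
import torch

a = torch.tensor([1, 2, 3])
b = torch.tensor([2, 2, 2])

# true_divide always performs floating-point ("true") division,
# even for integral inputs, and is now available as a method as well
print(torch.true_divide(a, b))  # tensor([0.5000, 1.0000, 1.5000])
print(a.true_divide(b))         # same result via the method form
```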
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34794
Differential Revision: D20545507
Pulled By: mruberry
fbshipit-source-id: 55286f819716c8823d1930441a69008560ac2bd5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34345
prim::ListConstruct is similar to an op that doesn't require observation
we want to make sure we can propagate observed property through it
Test Plan:
this will be tested when we add support for cat
https://github.com/pytorch/pytorch/pull/34346
Imported from OSS
Differential Revision: D20524455
fbshipit-source-id: b5f8e0c8776d48d588aeba6735de06dcd308560e
Summary:
Clamp input tensor values to [-3, 3] to limit how small the `tanh` gradient can get
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35196
Test Plan: CI + `bin/test_jit --gtest_filter=JitTest.ADFormulas --gtest_repeat=60000 --gtest_break_on_failure`
Differential Revision: D20611256
Pulled By: malfet
fbshipit-source-id: 8640faa5d8567d6c6df8cc5df80c2e65407116eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34845
This PR allows PyNode to persist the error message so that any pure C++
thread that runs autograd with a custom Python autograd function can
successfully capture the error message without maintaining an initial
PyThreadState.
Test Plan: Imported from OSS
Differential Revision: D20480685
Pulled By: wanchaol
fbshipit-source-id: 0488ea5a4df9a33b53ac5d0d59000c41ab6cb748
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35090
As a preparation to open source fp16 + stochastic rounding SparseAdagrad and fused SparseAdagrad
Other minor changes:
* Removed template parameters T that are not actually used
* Removed unnecessary anonymous namespaces used in header files
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20552770
fbshipit-source-id: 224fdca15ea786620ce88e33cbcbf97661423538
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34804
We want to replicate the quantize node for return values in blocks of prim::If
in order to create the quantization patterns.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20524453
fbshipit-source-id: 2268ac555f646158f4e1ffc98ccc8101d7504194
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35252
The torch::from_blob() overload without a deleter is relatively dangerous
and explicitly assumes that the caller will correctly persist the tensor
bits for as long as necessary.
We were at one point correctly persisting the send tensor bits in
process_group_agent, but with the early-return codepaths we are no longer
doing so.
This change switches to a more robust approach where we instead just use
the torch::from_blob-with-deleter syntax, and use std::move to avoid
a copy. There's an extra malloc, but that's effectively free compared with
the rest of the work involved here. And it means we don't have to worry
about the Tensor memory vanishing from underneath the send anymore.
The initial motivation here was dist_autograd_node_failure flakiness.
While the motivating case is handleSend(), we also fix handlePendingMessage().
ghstack-source-id: 100704883
Test Plan:
existing test coverage, e.g.
buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/test:ProcessGroupAgentTest
Differential Revision: D20607028
fbshipit-source-id: cf9966c5aa9472830cfefaf7fc2f92af9b52630d
Summary:
Found an issue where the git describe wasn't properly executed since the
binary_populate_env.sh script was being executed from a different
directory.
'git -C' forces the describe to run in the script's directory, which should
contain the correct git information.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35065
Differential Revision: D20603172
Pulled By: seemethere
fbshipit-source-id: b19112ce4cb2dc45fbb3f84dedc4f1d3f2259748
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34943
Follow up to address Jeremy's and Shen's comments on
https://github.com/pytorch/pytorch/pull/34413:
1) Continue trying even if one `agent->send()` fails when cleaning up dist
autograd ctx
2) Use RAII for lock in process group agent `handleSend`
3) Return bool instead of int in `ProcessGroupAgent::handleRecv` to determine
if the count should be incremented
4) Move recvCounts increment in timed out future processing to be within the
block that ensures the future already doesn't have an error.
ghstack-source-id: 100681746
Test Plan: CI
Differential Revision: D20506065
fbshipit-source-id: 14a2820b3ae7a65edd103f0b333c4bc21e821235
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34786
1) Rename 'HashIValue' to 'HashAliasedIValue'
2) Added Object case in getSubValues function
3) Hashes tensors to their storage
4) Added Dict case in overrideGradient
5) nit clean up
Test Plan: Imported from OSS
Differential Revision: D20585270
Pulled By: bzinodev
fbshipit-source-id: f580f3cb80dd5623088a014efd5f0f5ccc1659c0
Summary:
Add an `id` function to give users a way of keeping a `seen` set of nn modules.
In practice, this is only used between values of `T` and `T` or `T` and `Optional[T]`, so in this implementation I made it so that None is the only value that can be zero. Python also only guarantees `id()` gives semantically meaningful results for pointer types.
EDIT: now only allowing id on class types
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34975
Reviewed By: driazati
Differential Revision: D20599564
Pulled By: eellison
fbshipit-source-id: 3c6666a9b9b0258198adc70969dd6332e3375e4f
Summary:
`FilterDescriptor` is missing a `TORCH_CUDA_API`, so this symbol is not exported from `torch_cuda.so`, and users could have trouble building cpp_extension when using cudnn.
cc: ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35131
Differential Revision: D20604439
Pulled By: ezyang
fbshipit-source-id: c57414fc8a9df9cb1e910e2ec0a48cfdbe7d1779
Summary:
When `unittest.main()` is invoked with a custom testRunner, verbosity settings for the runner must be set manually.
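A minimal sketch of the pattern this refers to (the test case is illustrative): when a runner instance is passed, verbosity has to be configured on the runner itself:
```python
import unittest

class SmokeTest(unittest.TestCase):
    def test_trivial(self):
        self.assertTrue(True)

if __name__ == "__main__":
    # the `verbosity` argument of unittest.main is ignored when a
    # pre-constructed runner instance is supplied, so set it on the runner
    unittest.main(testRunner=unittest.TextTestRunner(verbosity=2))
```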
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35224
Test Plan: CI
Differential Revision: D20605896
Pulled By: malfet
fbshipit-source-id: 79fc6f55911189b6d8a4bc83bd2390c94bd69e5e
Summary:
iotamudelta Test passed three iterations on the CI, no flakiness detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35124
Differential Revision: D20604748
Pulled By: ezyang
fbshipit-source-id: ed013ca27f38a3610108421932245b494fac28c0
Summary:
Sometimes it is important to run code with thread sanitizer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35197
Test Plan: CI
Differential Revision: D20605005
Pulled By: malfet
fbshipit-source-id: bcd1a5191b5f859e12b6df6737c980099b1edc36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35221
When weight is reused, we only need to insert one observer for the weight
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20602492
fbshipit-source-id: e003e6316f6615f3526f0d00fb7b722148b4749b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35231
Fixes #35213
(Note: this ignores all push blocking failures!)
Test Plan: `mypy -c "import torch; ten = torch.tensor([1.0, 2.0, 3.0]); print(7 + ten)"` should not produce any warnings
Differential Revision: D20604924
Pulled By: pbelevich
fbshipit-source-id: 53a293a99b3f2ab6ca5516b31f3a92f67eb67a39
Summary:
The following code
```python
a = torch.randn(42,)
b = a.cuda(non_blocking=True)
```
will be **blocked** in the current master, but was **not blocked** in the PyTorch 1.4 release. This can be verified by profiling with `nvprof --print-api-trace python script.py`. It is causing a performance issue.
I isolated the problem, and jjsjann123 & ptrblck pointed out the fix. Thanks!
cc csarofeen ptrblck jjsjann123 VitalyFedyunin ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35144
Differential Revision: D20601163
Pulled By: ngimel
fbshipit-source-id: edd2b1dabd8e615c106188f30ddb3e763bde7471
Summary:
- replace the old build variables NO_CUDA and NO_DISTRIBUTED in CONTRIBUTING.md with the new USE_CUDA and USE_DISTRIBUTED versions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34831
Differential Revision: D20512659
Pulled By: colesbury
fbshipit-source-id: 2d6cb6fd35886eec0b4b1c94f568b5137407c551
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34571
Previously we added wrong values to observed_values_, and it was also not
used to check whether a value is observed or not.
Test Plan:
.
Imported from OSS
Differential Revision: D20519605
fbshipit-source-id: 6038b2539bcf7d679b7fe5c5a284b81a979934ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34414
Previously we inserted observers for Graph (a Graph is a wrapper around a Block);
this PR adds insertObservers for Block, so that the code can work for nodes that have sub-blocks.
Test Plan:
.
Imported from OSS
Differential Revision: D20519604
fbshipit-source-id: 1908913ea7f0898cd7b4f2edd1f81cdfedf8a211
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34411
Make sure dequantize and the node that uses the dequantized value reside in the
same block, so that we can do quant fusion.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20519603
fbshipit-source-id: 3e4c68d0a73142716e19ea6a64ae3a5d6d51fa41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34349
Set the output type of the dequantize node to the type of the original value;
this is to fix swapping dequantize for tensor lists.
Test Plan:
.
Imported from OSS
Differential Revision: D20504456
fbshipit-source-id: 9064d7d598a4310e27e2914a072097526448a02c
Summary:
In C++, casting a floating point value to an integer dtype is undefined when the value is outside the dtype's dynamic range. For example, casting 300.5 to Int8 is undefined behavior because the maximum representable Int8 value is 127, and 300.5 > 127.
PyTorch, like NumPy, deliberately allows and makes these casts, however, and when we do this we trigger undefined behavior that causes our sanitizers to (correctly) complain. I propose skipping this sanitization on our cast function.
The history of this PR demonstrates the issue, showing a single CI failure in the ASAN build when a test is added that converts a large float value to an integral value. The current PR shows a green CI after the sanitization is skipped.
There are alternatives to skipping this sanitization:
- Clamping or otherwise converting floats to the dynamic range of integral types they're cast to
- Throwing a runtime error if a float value is outside the dynamic range of the integral type it's cast to (this would not be NumPy compatible)
- Declaring programs in error if they perform these casts (this is technically true)
- Preventing this happening in PyTorch proper so the ASAN build doesn't fail
None of these alternatives seems particularly appealing, and I think it's appropriate to skip the sanitization because our behavior is deliberate.
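A small illustration of the behavior being kept (the printed value is platform-dependent, which is exactly the undefined behavior the sanitizer flags):
```python
import torch

x = torch.tensor([300.5])
# 300.5 is outside int8's representable range [-128, 127]; PyTorch, like NumPy,
# still performs the cast, and the resulting value is implementation-defined
y = x.to(torch.int8)
print(y)
```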
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35086
Differential Revision: D20591163
Pulled By: mruberry
fbshipit-source-id: fa7a90609c73c4c627bd39726a7dcbaeeffa1d1b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35179
Transitive dependencies are calculated in python script for both OSS custom build and BUCK selective build, so change the c++ analyzer to take -closure=false by default and remove the param from callsites.
ghstack-source-id: 100637068
Test Plan: CI
Differential Revision: D20586462
fbshipit-source-id: 195849b71cda6228a49ecd2215d3fb8b4da7f708
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34774
This PR provides pybind11's `type_caster<at::Generator>` that allows mapping `at::Generator` instance returned from user-defined method to python `torch::Generator`, defined as `THPGenerator ` c++ class.
This allows 1) defining custom RNG in c++ extension 2) using custom RNG in python code.
`TestRNGExtension.test_rng` shows how to use custom RNG defined in `rng_extension.cpp`
Test Plan: Imported from OSS
Differential Revision: D20549451
Pulled By: pbelevich
fbshipit-source-id: 312a6deccf8228f7f60695bbf95834620d52f5eb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35010
semantics.
This PR moves all the xnnpack-specific interfaces to a generic interface.
Accordingly, it removes xnnpack-specific references from the API and some
variable names.
What has not yet changed:
TODO:
USE_XNNPACK is still used. This can be removed where no XNNPACK
specific things are done. e.g., RegisterOpContext.cpp and
xnnpack_rewrite.cpp.
Also, the filenames and structure remain. Some of the generic class
definitions can be moved to a non-XNNPACK-specific folder.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20526416
fbshipit-source-id: 2e1725345c44bbb26bdc448097a7384eca121387
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35163
This PR is BC-breaking in the following way:
Renaming:
- `torch::nn::functional::MultiLabelMarginLossFuncOptions` -> `torch::nn::functional::MultilabelMarginLossFuncOptions`
- `torch::nn::functional::MultiLabelSoftMarginLossFuncOptions` -> `torch::nn::functional::MultilabelSoftMarginLossFuncOptions`
Reason for renaming: to be consistent with the corresponding functional name after camel case to snake case conversion (e.g. the `multilabel_margin_loss` functional should use `MultilabelMarginLossFuncOptions` as options)
Test Plan: Imported from OSS
Differential Revision: D20582598
Pulled By: yf225
fbshipit-source-id: 0f5bdb8249d901b310875a14320449a2fdfa8ecd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34468
This PR prepares `at::Generator` for pybind11's `type_caster<at::Generator>` which is required to implement custom RNG in python. The following changes are done:
1. `at::Generator` was moved to `c10::GeneratorImpl` (similar to `c10::TensorImpl`)
2. `at::Generator` was recreated as a holder of `std::shared_ptr<c10::GeneratorImpl>` (similar to `at::Tensor` that holds `c10::intrusive_ptr<c10::TensorImpl>`)
3. Most of `at::Generator*` usages were replaced with `at::Generator`
TBD: replacing `Generator generator = nullptr` with `{}` requires JIT changes(adding Generator to IValue?)
Differential Revision: D20549420
Pulled By: pbelevich
fbshipit-source-id: 4c92a40eab8f033b359bb6c93f4cd84b07ee8d4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35148
PR #34275 (commit 064c47845380715e290eb335919a18fe3821ee83) causes a size
regression for the BUCK build before BUCK selective build is enabled.
This PR partially reverts it (adding back #ifndef USE_STATIC_DISPATCH) to
fix the size regression. We will wait for the BUCK selective build change to
land and soak, then revert this revert.
Test Plan: Imported from OSS
Differential Revision: D20578316
Pulled By: ljk53
fbshipit-source-id: 694f01ec7a69fe3758a389e22e9de20ecd867962
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34348
We need this function to do swap dequantize for prim::ListConstruct since
the output of prim::ListConstruct is a list of Tensors
Test Plan:
.
Imported from OSS
Differential Revision: D20504454
fbshipit-source-id: e6155e37da98e2219a6f79737cd46fe32a509c9f
Summary: Att
Test Plan: Updated C2 importer test in stack.
Reviewed By: yinghai, bangshengtang
Differential Revision: D20527162
fbshipit-source-id: cf3d59089b651565db74f2a52af01f26fdfcbca6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35025
This PR fixes `F::interpolate` and `torch::nn::Upsample` implementation to match the Python API implementation.
**This PR is BC-breaking in the following way:**
There are changes to `UpsampleOptions` and `InterpolateFuncOptions`:
- `size` is changed from `std::vector<int64_t>` to `c10::optional<std::vector<int64_t>>`. If you want to pass a list of `int64_t` to this argument, you must pass it as `std::vector<int64_t>`.
- `scale_factor` is changed from `std::vector<double>` to `c10::optional<std::vector<double>>`. If you want to pass a list of `double` to this argument, you must pass it as `std::vector<double>`.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559892
Pulled By: yf225
fbshipit-source-id: ac18609e351a9f2931eaeced8966b9491b2995f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35023
This PR fixes Conv and ConvTranspose implementation to match the Python API implementation.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559889
Pulled By: yf225
fbshipit-source-id: 53783a7398ef968ec6d25b6f568fde44907417c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35022
This PR fixes `AdaptiveAvgPool{2,3}d` and `AdaptiveMaxPool{2,3}d` implementation to match the Python API implementation. Particularly, `output_size` is changed to accept `c10::nullopt` in its elements, matching the Python API behavior.
**TODO**: cherry-pick this PR into v1.5 release branch.
Test Plan: Imported from OSS
Differential Revision: D20559890
Pulled By: yf225
fbshipit-source-id: ccddbd278dd39165cf1dda11fc0e49387c76dbef
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35039
This is the initial step towards merging ivalue future and rpc future
Test Plan: Imported from OSS
Differential Revision: D20537164
Pulled By: wanchaol
fbshipit-source-id: d4f148c88e49ed6b0881ca4b4dd945ea24166183
Summary:
The protobuf bazel definitions are incompatible with recent bazel
versions, so as a prerequisite for any bazel build of pytorch, a more
recent protobuf must be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34662
Differential Revision: D20570425
Pulled By: malfet
fbshipit-source-id: ed4de3eb3fe05f076df93db7175954e797791300
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35028
removes these methods that are not used anywhere in the codebase. With this we can also remove public declaration of TORCH_API popRange and TORCH_API pushRange since those were the only use cases.
ghstack-source-id: 100560207
Test Plan: CI
Differential Revision: D20531148
fbshipit-source-id: 8ceaf64449c77259a582a38b1137827ff1ab07f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33636
Fixes https://github.com/pytorch/pytorch/issues/32119, https://github.com/pytorch/pytorch/issues/26116,
https://github.com/pytorch/pytorch/issues/33072
Makes RRef control messages idempotent and enables sending with retries for distributed autograd cleanup and RRef internal messages.
In order to effectively test whether these RRef and distributed autograd cleanup work with network failures/retries, I implemented an RPC Agent with a faulty send function, and enabled running tests using this as a third backend (in addition to Thrift and PGA). The tests using this backend are in a separate class (the test cases are similar but with minor changes to ensure short-running tests wait for retried RPCs to finish).
This faulty RPC Agent is pretty configurable. The tests can configure which message types to fail and how many messages to fail, and going forward, other RPC functionality can be overridden with faulty methods to test with injected failures.
Differential Revision: D20019236
fbshipit-source-id: 540a977e96b2e29aa0393ff12621fa293fe92b48
Summary:
PR #32521 has several issues with mobile builds:
1. It didn't work with static dispatch (which OSS mobile build currently uses);
2. PR #34275 fixed 1) but it doesn't fix custom build for #32521;
3. manuallyBoxedKernel has a bug with ops which only have catchAllKernel: 2d7ede5f71
Both 1) and 2) have similar root cause - some JIT side code expects certain schemas to be registered in JIT registry.
For example: considering this code snippet: https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/frontend/builtin_functions.cpp#L10
```
auto scalar_operators_source = CodeTemplate(
R"SCRIPT(
def mul(a : ${Scalar}, b : Tensor) -> Tensor:
return b * a
...
```
It expects "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" to be registered in JIT - it doesn't necessarily need to call the implementation, though; otherwise it will fail some type check: https://github.com/pytorch/pytorch/pull/34013#issuecomment-592982889
Before #32521, all JIT registrations happen in register_aten_ops_*.cpp generated by gen_jit_dispatch.py.
After #32521, for ops with full c10 templated boxing/unboxing support, JIT registrations happen in TypeDefault.cpp/CPUType.cpp/... generated by aten/gen.py, with c10 register API via RegistrationListener in register_c10_ops.cpp. However, c10 registration in TypeDefault.cpp/CPUType.cpp/... are gated by `#ifndef USE_STATIC_DISPATCH`, thus these schemas won't be registered in JIT registry when USE_STATIC_DISPATCH is enabled.
PR #34275 fixes the problem by moving c10 registration out of `#ifndef USE_STATIC_DISPATCH` in TypeDefault.cpp/CPUType.cpp/..., so that all schemas can still be registered in JIT. But it doesn't fix custom build, where we only keep c10 registrations for ops used by specific model directly (for static dispatch custom build) and indirectly (for dynamic dispatch custom build). Currently there is no way for custom build script to know things like "aten::mul.Scalar(Tensor self, Scalar other) -> Tensor" needs to be kept, and in fact the implementation is not needed, only schema needs to be registered in JIT.
Before #32521, the problem was solved by keeping a DUMMY placeholder for unused ops in register_aten_ops_*.cpp: https://github.com/pytorch/pytorch/blob/master/tools/jit/gen_jit_dispatch.py#L326
After #32521, we can do a similar thing by forcing aten/gen.py to register ALL schema strings for selective build - which is what this PR is doing.
Measured impact on custom build size (for MobileNetV2):
```
SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
```
Before: 3,404,978
After: 3,432,569
~28K compressed size increase due to including more schema strings.
The table below summarizes the relationship between codegen flags and 5 build configurations that are related to mobile:
```
+--------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------+
| | Open Source | FB BUCK |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| | Default Build | Custom Build w/ Stat-Disp | Custom Build w/ Dyna-Disp | Full-JIT | Lite-JIT |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| Dispatch Type | Static | Static | Dynamic | Dynamic (WIP) | Dynamic (WIP) |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| ATen/gen.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --op_registration_whitelist | unset | used root ops | closure(used root ops) | unset | closure(possibly used ops) |
| --backend_whitelist | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU | CPU Q-CPU |
| --per_op_registration | false | false | false | false | true |
| --force_schema_registration | false | true | true | false | false |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| tools/setup_helpers/generate_code.py | | | | | |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
| --disable-autograd | true | true | true | false | WIP |
| --selected-op-list-path | file(used root ops) | file(used root ops) | file(used root ops) | unset | WIP |
| --disable_gen_tracing | false | false | false | false | WIP |
+--------------------------------------+---------------------+---------------------------+---------------------------+---------------+----------------------------+
```
Differential Revision: D20397421
Test Plan: Imported from OSS
Pulled By: ljk53
fbshipit-source-id: 906750949ecacf68ac1e810fd22ee99f2e968d0b
Summary:
PR #32521 broke static dispatch because some ops are no longer
registered in register_aten_ops_*.cpp - it expects the c10 registers in
TypeDefault.cpp / CPUType.cpp / etc to register these ops. However, all
c10 registers are inside `#ifndef USE_STATIC_DISPATCH` section.
To measure the OSS mobile build size impact of this PR:
```
# default build: SELECTED_OP_LIST=MobileNetV2.yaml scripts/build_pytorch_android.sh armeabi-v7a
# mobilenetv2 custom build: scripts/build_pytorch_android.sh armeabi-v7a
```
- Before this PR, Android AAR size for arm-v7:
* default build: 5.5M;
* mobilenetv2 custom build: 3.2M;
- After this PR:
* default build: 6.4M;
* mobilenetv2 custom build: 3.3M;
It regressed default build size by ~1M because more root ops are
registered by c10 registers, e.g. backward ops which are filtered out by
gen_jit_dispatch.py for inference-only mobile build.
mobilenetv2 custom build size regressed by ~100k presumably because
the op whitelist is not yet applied to things like BackendSelectRegister.
Differential Revision: D20266240
Test Plan: Imported from OSS
Pulled By: ljk53
fbshipit-source-id: 97a9a06779f8c62fe3ff5cce089aa7fa9dee3c4a
Summary:
This tries to parallelize the index_put accumulate path for float type on CPU. cpu_atomic_add_float is implemented using the atomic_compare_exchange_strong function.
For the [DLRM](https://github.com/facebookresearch/dlrm) benchmark, the _index_put_impl_ function time can be reduced from 827.741ms to 116.646ms for 1000 batches.
Adds a "grain_size" parameter to TensorIterator::for_each to fine-tune index_put performance.
The default value of grain_size is internal::GRAIN_SIZE. The index_put grain size is tuned to 3000 and the cpu_kernel_vec grain size is tuned to 1024. The following is the grain size impact on the DLRM ops
(_index_put_impl_ based on index_put parallelized with cpu_atomic_add_float):
| Op Name | without small grain_size | with 1024 as grain_size in cpu_kernel_vec and 3000 in cpu_index_kernel |
|-----------------|----------:|----------:|
| add_ | 11.985s | 11.601s |
| mm | 9.706s | 9.518s |
| addmm | 5.380s | 5.247s |
| _embedding_bag | 2.992s | 2.663s |
| _embedding_bag_backward | 1.330s | 1.354s |
| threshold_backward | 686.920ms | 659.169ms |
| _index_put_impl_ | 489.411ms | 116.646ms |
| bmm | 413.129ms | 362.967ms |
| zero_ | 379.659ms | 310.623ms |
| add | 205.904ms | 171.111ms |
| cat | 187.101ms | 175.621ms |
| Self CPU time total (s) | 36.544 | 34.742 |
| Average ms per iteration | 38.25 | 36.44 |
For more on the reasons behind grain size tuning, please see [PR#30803](https://github.com/pytorch/pytorch/issues/30803).
To reproduce the DLRM performance reported here, please also have a look at
[PR#23057](https://github.com/pytorch/pytorch/pull/23057), [PR#24385](https://github.com/pytorch/pytorch/pull/24385) and [PR#27804](https://github.com/pytorch/pytorch/pull/27804),
and set the env vars as below:
```
export LD_PRELOAD=$HOME/anaconda3/lib/libjemalloc.so (conda install jemalloc)
export KMP_BLOCKTIME=1
export KMP_AFFINITY="granularity=fine,compact,1,0"
```
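From the Python side, the accumulate path being parallelized here is just index_put_ with accumulate=True. A rough illustration (not code from this PR; the grain-size tuning happens inside TensorIterator, not in user code):
```
import torch

x = torch.zeros(5)
idx = torch.tensor([0, 1, 1, 4])
vals = torch.tensor([1.0, 2.0, 3.0, 4.0])

# Repeated indices are summed, which is why the parallel float path
# needs an atomic add under the hood.
x.index_put_((idx,), vals, accumulate=True)
# x -> tensor([1., 5., 0., 0., 4.])
```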
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29705
Differential Revision: D19777742
Pulled By: VitalyFedyunin
fbshipit-source-id: a8222fe6089b6bf56b674e35f790508ad05385c0
Summary:
This PR adds a preprocessing step in Conv2dBatchNorm folding.
It traverses the module to check if the bias of Conv2d module is set to
None. If it is, it assumes that this is a traced module and inserts an
Optional[Tensor] type bias.
Furthermore, it inserts a getAttr for bias in the forward graph and fixes
the _convolution op to take values from the getAttr.
It also fixes parameter extraction from the BN module, which may not
have weight and bias attributes if affine was set to False. In scripted
mode such a BN module will have its weight and bias attributes set to None.
For the case of eps, it gets const-propagated in tracing. This is also
fixed.
A few test cases are added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34932
Test Plan:
python test/test_jit.py TestJit.test_foldbn_trivial
python test/test_jit.py TestJit.test_foldbn_trivial_nobias
python test/test_jit.py TestJit.test_foldbn_in_submodule
python test/test_jit.py TestJit.test_foldbn_shared_classtype
python test/test_jit.py TestJit.test_foldbn_complex_cases
python test/test_jit.py TestJit.test_nofoldbn_complex_cases
Differential Revision: D20536478
Pulled By: kimishpatel
fbshipit-source-id: 4e842976a380d0575a71001bb4481390c08c259e
Summary:
We should recommend DDP instead of DP. Hope we can also cherry-pick this for 1.5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35063
Differential Revision: D20549621
Pulled By: ngimel
fbshipit-source-id: 86b1b2134664065cc6070ea4212895f993eaf543
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32313
`torch::autograd::profiler::pushCallback()` and `torch::jit::setPrintHandler` should be called only once, not before every load.
`JITCallGuard guard;` is not needed before loading a module and has no effect.
Test Plan: Imported from OSS
Differential Revision: D20559676
Pulled By: IvanKobzarev
fbshipit-source-id: 70cce5d2dda20a00b378639725294cb3c440bad2
Summary:
Adds an `id` function to give users a way of keeping a `seen` set of nn modules.
In practice, this is only used between values of `T` and `T` or `T` and `Optional[T]`, so in this implementation I made it so that None is the only value that can be zero. Python also only guarantees `id()` gives semantically meaningful results for pointer types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34975
Differential Revision: D20549677
Pulled By: eellison
fbshipit-source-id: cca5ed4ef013f7540f93abf49f91f9830dfdca14
Summary:
This adds the `trunc_normal_` function to `torch.nn.init` which allows for modifying tensors in-place to values drawn from a truncated normal distribution. I chose to use the inverse CDF method to implement this. I have included the appropriate code in `test_nn.py` for verifying that the values are from the correct distribution.
Reasons I chose this method:
1. Easily implemented to operate on memory in place, as the other initializers are.
1. No resampling delays
1. This method's main weakness is unlikely to be an issue. While the inverse CDF method can fail to generate the correct distribution when `b < mean` or `mean < a`, I expect users will choose `a` and `b` so that `a < mean < b`. This method is extremely effective in this case.
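A minimal usage sketch of the new initializer (illustrative shapes and bounds, not taken from the PR's tests):
```
import torch
import torch.nn as nn

# Fill a weight tensor in place with samples from a normal distribution
# truncated to [a, b]; values outside the bounds are never produced.
w = torch.empty(3, 5)
nn.init.trunc_normal_(w, mean=0.0, std=1.0, a=-2.0, b=2.0)
assert w.min() >= -2.0 and w.max() <= 2.0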
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32397
Differential Revision: D20550996
Pulled By: ezyang
fbshipit-source-id: 298a325043a3fd7d1e24d266e3b9b6cc14f81829
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34394
# SWA operator
In this diff, we added a new operator `SWA` which will be used in `AdaGradOptimizer`.
The algorithm looks like:
{F230902995}
# Background
In our testings, we found that this operator could improve our models' reproducibility a lot. (KT: 0.86 -> .92)
So we hope to land this operator and in future, enable this by default in our Models.
Test Plan:
Local build `aml.dper3:30f068668cfb408fbb40141fb17129f2` and bento kernel.
- Local test: n215857
- f174600345
Reviewed By: chocjy
Differential Revision: D20165239
fbshipit-source-id: c03cdd048cb10b091e5f06323f4c0f3999f95d8a
Summary:
Because `this` must be valid while `Engine::main_thread` is running, at least for non-reentrant worker threads
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34529
Test Plan: Run `test_api --gtest-filter=ModulesTest.InstanceNorm1d` in a loop
Differential Revision: D20552717
Pulled By: malfet
fbshipit-source-id: a0197671db1b7b1499dda675e43e0826f368bf0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34755
This diff disallows using the Python pickler to pickle RRef. RRef can only be pickled in the scope of an RPC call using _InternalRPCPickler.
ghstack-source-id: 100481337
Test Plan: unit tests
Differential Revision: D20453806
fbshipit-source-id: ebd4115ee01457ba6958cde805afd0a87c686612
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34976
Previously, we were dropping the original device option info when overriding the operator conversion function.
Test Plan:
```
buck test caffe2/caffe2/opt:converter_nomigraph_test
```
Reviewed By: ipiszy
Differential Revision: D20507277
fbshipit-source-id: 66b5eab07d18651eff27dab2a809cd04872ac224
Summary:
resubmit D20464855 and also Fix the broken test due to D20464855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35080
Differential Revision: D20551174
Pulled By: lly-zero-one
fbshipit-source-id: 5a0547a64365c556c3a677a9512423047497cc85
Summary:
torch.mm is exported as the Gemm operator in ONNX, and both have an optional input: out.
out is considered broadcastable in Gemm, and during graph optimization the optional input (out) would get selected. Since out is optional, when it is not defined in torch.mm this results in the following exception:
IndexError: vector::_M_range_check: __n (which is 2) >= this->size() (which is 2)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34661
Reviewed By: hl475
Differential Revision: D20496398
Pulled By: houseroad
fbshipit-source-id: e677aef0a6aefb1f83a54033153aaabe5c23bc0f
Summary:
Given that complex types have also been vectorized, there is no need to
handle complex types differently in fill.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34973
Differential Revision: D20551014
Pulled By: ezyang
fbshipit-source-id: e0cb519aa17f90b7a2d70700b32b80acb0d41b14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35037
Closes #34960
Cannot reproduce the test failure in dev server, local machine, and
the CI env that captured the failure. However, the failed test takes
very long (~10sec) in MacOS, so reducing the number of iterations to
make it lighter.
Re-enable the test and will monitor if the error occurs again.
Test Plan: Imported from OSS
Differential Revision: D20536272
Pulled By: mrshenli
fbshipit-source-id: 577822574e5f6271f1cbb14b56c68c644291713e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34689
rref JIT pickling is only allowed inside RPC calls. This is enforced by adding a thread-local variable isInRpcCall and setting it to True when converting RPC requests or responses to messages, before calling JIT::pickle(). Inside JIT::pickle(), pickling an RRef is only allowed when isInRpcCall is true.
ghstack-source-id: 100481001
Test Plan: unit tests
Differential Revision: D20429826
fbshipit-source-id: dbc04612ed15de5d6c7d75a4732041ccd4ef3f8c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34985
IValue is part of the overall runtime system, not just the JIT. So it
should be tested in the ATen tests.
The real motivation though is so that I can use gtest directly, not the
hacked-up version the JIT uses.
Test Plan: Imported from OSS
Differential Revision: D20537902
Pulled By: suo
fbshipit-source-id: 09897e015ecde24aa8996babeaa08d98db90ef0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34856
This PR adds Python-like equality comparisons to our List type.
- `operator==` performs equality by value.
- `is` performs equality by identity.
The overall goal is that I want to define equality on `IValue` to avoid
people implementing their own broken versions. So, we should have
equality reasonably defined on all types that `IValue` can be.
smessmer raises the concern that C++ people expect `operator==` on
reference types to test identity. I think that's a reasonable concern,
but in practice, it seems that people are defining equality functions to
do it by value anyway, just poorly. My claim is that if we just tell
people that TorchScript types behave like Python types, it will not be
super confusing.
Test Plan: Imported from OSS
Differential Revision: D20483462
Pulled By: suo
fbshipit-source-id: ba2909daa6778924293ed6ef456ab9fc84215442
Summary:
Because `past` is used in `caffe2.python.core`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35057
Test Plan: CI
Differential Revision: D20547042
Pulled By: malfet
fbshipit-source-id: cad2123c7b88271fea37f21e616df551075383a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34912
max_pool2d quantized op actually shows up as aten::max_pool2d
Test Plan:
python test/test_pytorch_onnx_caffe2_quantized.py
Imported from OSS
Differential Revision: D20497780
fbshipit-source-id: 5524ae41676c2d6de1ae3544fe36ac24f2a77b19
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34994
Use the fast path for NCHW input tensor
Test Plan: run the pool unit tests
Differential Revision: D20522082
fbshipit-source-id: 6e834425d06fbb1a105d851c2c36ef73df9de08f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34828
Python 3.5 does not ensure ordering of dictionary keys; this was added
in Python 3.6+. Fixing this so the test is no longer flaky in 3.5. Tested with
500 stress tests on Python 3.5.
ghstack-source-id: 100426555
Test Plan: 500 stress tests in python 3.5
Differential Revision: D20474996
fbshipit-source-id: 89b614a32363d1e7f3f7a4f27bec4fd7d507721d
Summary:
As titled, we want to support BN2d_relu and BN3d_relu.
Tests to be added!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34795
Differential Revision: D20464855
Pulled By: lly-zero-one
fbshipit-source-id: 57090d427053c9c94c1b387b33740a7e61261a9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34991
The definition for the partition to be run on CPU is that it will contain an empty device_id list. We chose this over an op with no partitioning info because
1. Backward compatible with models that don't have partitioning info
2. Being explicit can flush out issues in earlier stage.
Test Plan:
```
LD_LIBRARY_PATH=third-party-buck/platform007/build/fb-nnpi/lib ./sigrid/predictor/tests/scripts/launch_ads_test_predictor.sh -g --nnpi --force_models=175742819_0 --sigrid_force_model_dir=$HOME/models/ --smc_server_port=7447 --glow-num-devices=1 --glow_interpreter_memory=$((256<<20)) --caffe2_fbgemm_fake_fp16_clamp --glow_global_fp16 --glow_clip_fp16 --glow_global_fused_scale_offset_fp16 --fbgemm_deserialize_to_original_format --caffe2_dump_input_of_type=Onnxifi --caffe2_logging_print_tensor --caffe2_predictor_use_memonger=no --onnxifi_debug_mode=true --caffe2_dump_input_with_recordio --caffe2_predictor_onnxifi_max_batch_size=32 --caffe2_predictor_onnxifi_max_seq_size=9600 --glow_onnxifi_backend=Interpreter --onnxifi_blacklist_ops=SparseLengthsSum,SparseLengthsWeightedSum --glow_dump_graph
```
Now it hits a new error.
Reviewed By: ipiszy
Differential Revision: D20503167
fbshipit-source-id: 5a609760130bd1131e299ce85b7824cbcbdf1f09
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34360
The distributed autograd context sets up a thread local context id
which is used to perform appropriate bookkeeping and autograd recording of RPC
functions in the forward pass.
However, if we use torch.jit._fork within the distributed autograd context, the
code executed within torch.jit._fork will lose this context since it is run in
a separate JIT thread and the thread local is not set in that thread.
To fix this problem, we pass in the distributed autograd context to
torch.jit._fork similar to what we did in
https://github.com/pytorch/pytorch/pull/16101.
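A rough sketch of the pattern this enables (the function name is hypothetical; it assumes RPC has already been initialized via rpc.init_rpc and uses the current dist_autograd API):
```
import torch
import torch.distributed.autograd as dist_autograd

@torch.jit.script
def forked_fn(t: torch.Tensor) -> torch.Tensor:
    return t * 2

with dist_autograd.context() as context_id:
    x = torch.ones(2, 2, requires_grad=True)
    # The forked work now runs under the same distributed autograd context id.
    fut = torch.jit._fork(forked_fn, x)
    y = torch.jit._wait(fut)
    dist_autograd.backward(context_id, [y.sum()])
```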
ghstack-source-id: 100445465
Test Plan: waitforbuildbot
Differential Revision: D20301352
fbshipit-source-id: aa3fffe69c2b40722c66213351a4e0d77484a621
Summary:
Adds a new promotion pipeline for both our wheel packages hosted on S3
as well as our conda packages hosted on anaconda.
Promotion is only run on tags that match the following regex:
/v[0-9]+(\.[0-9]+)*/
Example:
v1.5.0
The promotion pipeline is also only run after a manual approval from
someone within the CircleCI security context "org-member"
> NOTE: This promotion pipeline does not cover promotion of packages that
> are published to PyPI, this is an intentional choice as those
> packages cannot be reverted after they have been published.
TODO: Write a proper testing pipeline for this
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34993
Differential Revision: D20539497
Pulled By: seemethere
fbshipit-source-id: 104772d3c3898d77a24ef9bf25f7dbd2496613df
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34959
Adds quantized implementation of hardsigmoid.
Original PR was https://github.com/pytorch/pytorch/pull/34607 and had to
be reverted for a test breakage, trying again.
Test Plan:
tests
benchmarks
Imported from OSS
Differential Revision: D20514212
fbshipit-source-id: cc7ae3b67757e2dde5c313c05ce60a0f2625d961
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34499
RangeEventList::allocBlock currently iterates through `blocks`, which
we serialize access to and accumulates them into `result`. Instead of doing
this, we can swap with an empty `forward_list` in constant time, and then
unlock, and use this local list in order to populate `result`.
ghstack-source-id: 100426115
Test Plan: existing profiler tests pass
Differential Revision: D20346423
fbshipit-source-id: 0e567b56049daa371051ccec6c5d1630a92db15f
Summary: So that Glow can use this info to do actual function partitioning.
Reviewed By: jfix71
Differential Revision: D20502439
fbshipit-source-id: 0ade94b49b49172dc9370d1fc96454ade52ff269
Summary:
CircleCI, by default, chooses to run 0 jobs on tags, meaning that when we
tag a build no job is run if a dependent job does not contain the
correct filters.
This adds an explicit configuration to run the setup job on every branch
and every tag that CircleCI can run on.
For more information on CircleCI filters and what they do (and more
importantly what they do not do) visit:
https://circleci.com/docs/2.0/configuration-reference/#filters-1
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35013
Differential Revision: D20535560
Pulled By: seemethere
fbshipit-source-id: 7ee5dddbc0a9416fd76ed198e5447318c53e1873
Summary:
Per title.
In the future we want to make div(), the division operator, and addcdiv perform true division as in Python 3, NumPy, and JAX. To do this without silently breaking users we plan to:
- Warn (once) in 1.5 when a user performs integer division using div or addcdiv
- RuntimeError in 1.6 when a user attempts to perform integer division using div or addcdiv
- Always perform true division in 1.7 using div, /, and addcdiv
Users can use true_divide or floor_divide today to explicitly specify the type of division they like.
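For reference, the explicit alternatives look like this (a small sketch, not taken from the PR's tests):
```
import torch

a = torch.tensor([5, 7])
b = torch.tensor([2, 2])

torch.true_divide(a, b)    # tensor([2.5000, 3.5000]) - explicit true division
torch.floor_divide(a, b)   # tensor([2, 3])           - explicit integer (floor) division
```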
A test for this behavior is added to test_type_promotion. Unfortunately, because we are only warning once (to avoid a deluge) the test only uses maybeWarnsRegex.
The XLA failure is real but will be solved by https://github.com/pytorch/pytorch/pull/34552. I'll be sure to land that PR first to avoid temporarily breaking the XLA build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34570
Differential Revision: D20529211
Pulled By: mruberry
fbshipit-source-id: 65af5a9641c5825175d029e8413c9e1730c661d0
Summary:
And fixes a few typos
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34791
Test Plan: CI
Differential Revision: D20524879
Pulled By: malfet
fbshipit-source-id: 58fa03bd6356979e77cd1bffb6370d41a177c409
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34980
We were passing sample inputs to `torch.jit.script` (as if it was
`torch.jit.trace`), but this parameter was treated as an optional
`optimize` parameter. That parameter is deprecated and that caused a
warning.
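For reference, a minimal sketch of the distinction (a hypothetical function, not the actual test code from this PR):
```
import torch

def fn(x):
    return x * 2

# torch.jit.script takes only the callable; example inputs belong to torch.jit.trace.
scripted = torch.jit.script(fn)
traced = torch.jit.trace(fn, (torch.randn(2),))
```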
Differential Revision: D20520369
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 87b40a5e35bfc4a3d7a5d95494632bfe117e40b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34638
Fixes: https://github.com/pytorch/pytorch/issues/27643
This PR manages notifying workers in the event of a failure during distributed autograd. Gracefully handles propagating errors across all nodes in the backward pass and sets state in the local autograd engines accordingly.
(Note: this ignores all push blocking failures!)
Test Plan: Added 2 new tests checking errors when they are thrown in an intermediate node during distributed autograd. Ensured that all existing distributed autograd tests pass.
Differential Revision: D20164420
fbshipit-source-id: 3d4ed74230969ac70bb763f1b5b1c16d979f66a2
Summary:
The `GetEmptyStringAlreadyInited` invocation pattern in protobuf-generated header files changed to
`::PROTOBUF_NAMESPACE_ID::internal::GetEmptyStringAlreadyInited`, where `PROTOBUF_NAMESPACE_ID` is defined in `protobuf/port_def.inc` as `google::protobuf`.
This likely changed around protobuf-3.8.x, but I've only tested it using protobuf-3.11.4.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/35008
Test Plan: Update `third-party/protobuf` submodule to 3.11.4, compile and run `pattern_net_transform_test`
Differential Revision: D20526949
Pulled By: malfet
fbshipit-source-id: fddaa3622c48ad883612c73c40a20d306d88d66b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34988
In https://github.com/pytorch/pytorch/pull/31893, we introduced a confirmedUsers_ map in RRefContext.
For the case where the fork is shared from the owner, there is no `pendingUsers_` intermediate phase for this fork, so we should put this fork into `confirmedUsers_` immediately.
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
```
Differential Revision: D7735909
fbshipit-source-id: 14c36a16486f0cc9618dcfb111fe5223781b647d
Summary:
1. Removed LossClosureOptimizer, and merged Optimizer into OptimizerBase (and renamed the merged class to Optimizer)
2. Merged the LBFGS-specific serialize test function and the generic test_serialize_optimizer function.
3. BC-compatibility serialization test for LBFGS
4. Removed mentions of parameters_ in optimizer.cpp, de-virtualize all functions
5. Made defaults_ optional argument in all optimizers except SGD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34957
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20518647
Pulled By: anjali411
fbshipit-source-id: 4760d1d29df1784e2d01e2a476d2a08e9df4ea1c
Summary:
**Summary**
This commit parallelizes the invocation of `clang-format` on all files
in `tools/clang_format_new.py` using `asyncio`.
**Testing**
Ran and timed the script.
*Before*
```
$ time ./tools/clang_format_new.py --diff
...
real 0m7.615s
user 0m6.012s
sys 0m1.634s
```
*After*
```
$ time ./tools/clang_format_new.py --diff
...
Some files not formatted correctly
real 0m2.156s
user 0m8.488s
sys 0m3.201s
```
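A minimal sketch of the asyncio pattern used (simplified and illustrative, not the actual tools/clang_format_new.py code):
```
import asyncio

async def run_clang_format(path: str) -> str:
    # Launch clang-format as a subprocess and capture its reformatted output.
    proc = await asyncio.create_subprocess_exec(
        "clang-format", path,
        stdout=asyncio.subprocess.PIPE,
        stderr=asyncio.subprocess.PIPE,
    )
    stdout, _ = await proc.communicate()
    return stdout.decode()

async def format_all(files):
    # gather() schedules all invocations concurrently instead of one at a time.
    results = await asyncio.gather(*(run_clang_format(f) for f in files))
    return dict(zip(files, results))

# asyncio.run(format_all(["a.cpp", "b.cpp"]))
```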
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34750
Differential Revision: D20523133
Pulled By: SplitInfinity
fbshipit-source-id: 509741a0b4fcfcdcd7c5a45654e3453b4874d256
Summary:
There are three guards related to mobile build:
* AutoGradMode
* AutoNonVariableTypeMode
* GraphOptimizerEnabledGuard
Today we need to set some of these guards before calling libtorch APIs because we customized the mobile build to only support inference (for both OSS and most FB use cases) to optimize binary size.
Several changes were made since 1.3 release so there are already inconsistent uses of these guards in the codebase. I did a sweep of all mobile related model loading & forward() call sites, trying to unify the use of these guards:
Full JIT: still set all three guards. More specifically:
* OSS: Fixed a bug of not setting the guard at model load time correctly in Android JNI.
* FB: Not covered by this diff (as we are using mobile interpreter for most internal builds).
Lite JIT (mobile interpreter): only needs AutoNonVariableTypeMode guard. AutoGradMode doesn't seem to be relevant (so removed from a few places) and GraphOptimizerEnabledGuard definitely not relevant (only full JIT has graph optimizer). More specifically:
* OSS: At this point we are not committed to support Lite-JIT. For Android it shares the same code with FB JNI callsites.
* FB:
** JNI callsites: Use the unified LiteJITCallGuard.
** For iOS/C++: manually set AutoNonVariableTypeMode for _load_for_mobile() & forward() callsites.
Ideally we should avoid having to set AutoNonVariableTypeMode for mobile interpreter. It's currently needed for dynamic dispatch + inference-only mobile build (where variable kernels are not registered) - without the guard it will try to run `variable_fallback_kernel` and crash (PR #34038). The proper fix will take some time so using this workaround to unblock selective BUCK build which depends on dynamic dispatch.
PS. The current status (of having to set AutoNonVariableTypeMode) should not block running FL model + mobile interpreter - if all necessary variable kernels are registered then it can call _load_for_mobile()/forward() against the FL model without setting the AutoNonVariableTypeMode guard. It's still inconvenient for JAVA callsites as it's set unconditionally inside JNI methods.
Test Plan: - CI
Reviewed By: xta0
Differential Revision: D20498017
fbshipit-source-id: ba6740f66839a61790873df46e8e66e4e141c728
Summary: Add transfer_learning_blob_name_mappings into layer_model_helper to support layer model transfer learning
Reviewed By: mraway
Differential Revision: D20286298
fbshipit-source-id: de3e029611d843f38d3f42ecd4148358f7e14a2b
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
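A short sketch of the new surface area (illustrative values, not taken from the PR's tests):
```
import torch

x = torch.tensor([5, 7, 9])
y = torch.tensor([2, 2, 2])

torch.floor_divide(x, y)           # function form -> tensor([2, 3, 4])
z = torch.empty(3, dtype=torch.int64)
torch.floor_divide(x, y, out=z)    # out variant
x.floor_divide(y)                  # method form
x.floor_divide_(y)                 # in-place variant
```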
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20509850
Pulled By: mruberry
fbshipit-source-id: 2cd3c828aad67191c77f2ed8470411e246f604f8
Summary:
This is causing failures on my Windows build
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34926
Differential Revision: D20501850
Pulled By: smessmer
fbshipit-source-id: 92c72dd657b27b1786952dbdccfceff99f4ba743
Summary:
This pull request updates the Torchvision commit to use ROCm enabled torchvision in `.jenkins/pytorch/test.sh`.
Pytorch tests:
```
test_SyncBatchNorm_process_group (__main__.TestDistBackend)
test_alexnet (jit.test_models.TestModels)
test_script_module_script_resnet (jit.test_models.TestModels)
test_script_module_trace_resnet18 (jit.test_models.TestModels)
test_torchvision_smoke (__main__.TestTensorBoardPytorchGraph)
```
in `test2` were skipped because torchvision was not installed in `test2`; instead it was installed in `test1`. The PR moves the torchvision test to the correct place, thereby enabling the above-mentioned tests.
cc: ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34909
Differential Revision: D20515333
Pulled By: ezyang
fbshipit-source-id: 69439756a687ba441c1f8107233b4dbc1e108387
Summary:
Per title.
Currently torch.full will always (attempt to) produce a float tensor. This is inconsistent with NumPy in (at least) two cases:
- When integral fill values (including bool) are given
- When complex fill values are given
For example:
```
np.full((1, 2), 1).dtype
: dtype('int64')
np.full((1, 2), (1 + 1j)).dtype
: dtype('complex128')
```
Whereas in PyTorch
```
torch.full((1, 2), 1).dtype
: torch.float32
torch.full((1, 2), (1 + 1j)).dtype
: RuntimeError: value cannot be converted to type float without overflow: (1,1)
```
This PR begins the process of deprecating our current behavior of returning float tensors (by default) when given integer fill values by warning the user that integer fill values will require explicitly specifying the dtype or out kwargs in 1.6, and in 1.7 the behavior will change to return a LongTensor by default (BoolTensor for bool values). The intermediate 1.6 release is to prevent changing the behavior silently and unexpectedly.
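In the meantime, explicitly passing dtype avoids the deprecation warning (a small sketch, assuming the behavior described above):
```
import torch

torch.full((1, 2), 1, dtype=torch.long)     # tensor([[1, 1]])
torch.full((1, 2), True, dtype=torch.bool)  # tensor([[True, True]])
```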
The PR also implements inference for complex types. So that with it:
```
torch.full((1, 2), (1 + 1j)).dtype
: torch.complex64
```
The complex type inference returns a ComplexFloat tensor when given a complex fill value (and no dtype or out kwarg is specified), unless the default dtype is Double, in which case a ComplexDouble tensor is returned.
A test for these behaviors is added to test_torch.py.
Implementation note:
This PR required customizing full's dispatch because currently in eager codegen the TensorOptions object passed to functions improperly sets has_dtype() to true, even if the user did not explicitly provide a dtype. torch.arange already worked around this issue with its own custom implementation. The JIT, however, does pass a properly constructed TensorOptions object.
Future Work:
This PR does not extend torch.full's complex type inference to ONNX. This seems unlikely to come up and will be a clear error if it does. When integer type inference is added to torch.full, however, then porting the behavior to ONNX may be warranted. torch.arange ported its complex type promotion logic to ONNX, for example.
Additionally, this PR mostly leaves existing call sites in PyTorch that would trigger this warning intact. This is to be more minimal (since the PR is BC breaking). I will submit a separate PR fixing PyTorch's call sites.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34709
Differential Revision: D20509387
Pulled By: mruberry
fbshipit-source-id: 129593ba06a1662032bbbf8056975eaa59baf933
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34315
Previously we registered quantization parameter attributes using the debugName of
the observed value, but debugName is not unique. This PR addresses the problem
by making attribute names unique.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20504455
fbshipit-source-id: 6dd83bdfc4e4dc77ad3af3d5b48750fb01b2fce1
Summary:
Initial integration of eager autocasting, supporting out-of-place ops only for easier review.
Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081
In-place ops and ops with user-supplied `out=...` can certainly be supported as well (my initial WIP https://github.com/pytorch/pytorch/pull/29552 handled many) but require substantially more complex special casing in the autocasting backend and tests. Support for these ops (much of which has already been written) will be broken into later PRs.
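A rough usage sketch of the eager autocasting entry point (assuming the torch.cuda.amp.autocast context manager; requires a CUDA device and is not code from this PR):
```
import torch

model = torch.nn.Linear(8, 8).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(4, 8, device="cuda")

with torch.cuda.amp.autocast():
    out = model(x)            # eligible out-of-place ops run in reduced precision
    loss = out.float().sum()
loss.backward()               # backward runs outside the autocast region
opt.step()
```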
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32140
Differential Revision: D20346700
Pulled By: ezyang
fbshipit-source-id: 12d77b3917310186fbddf11c59b2794dc859131f
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/34736. Both code snippets in that issue can now execute normally. More tests are also added.
This PR is a follow-up on https://github.com/pytorch/pytorch/issues/34519, where one variable was mistakenly missed when updating the max_pool2d kernel.
This PR also uses the accumulate type of scalar_t in the backward kernel, which resolves the numerical precision issue when stride < kernel_size on fp16.
cc csarofeen ptrblck jjsjann123 VitalyFedyunin ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34934
Differential Revision: D20512062
Pulled By: VitalyFedyunin
fbshipit-source-id: a461ebbb3e3684aa183ae40e38d8f55bb6f4fee1
Summary:
Throwing from a destructor leads to undefined behaviour (most often a segfault),
so it's better to leak memory than to segfault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34756
Test Plan: Run `test_pytorch_onnx_caffe2`
Differential Revision: D20504228
Pulled By: malfet
fbshipit-source-id: 7a05776fea9036f602e95b8182f8493cb5886dab
Summary:
(Updated per review feedback)
`torch.floor_divide` is currently a function that can operate on two tensors or a tensor and a scalar (scalar x scalar floor division is handled natively by Python and the JIT has a builtin function for it). This PR updates it to:
- have an out variant: `floor_divide(x, y, out=z)`
- be a method on a tensor: `x.floor_divide(y)`
- have an in-place variant: `x.floor_divide_(y)`
- work with sparse tensors
Tests are added to test_sparse.py and test_torch.py for these new behaviors.
In addition, this PR:
- cleans up the existing sparse division and true_division code and improves their error message
- adds testing of sparse true_division to test_sparse.py
- extends existing floor_divide testing in test_torch to run on CUDA, too, not just the CPU
Unfortunately, making floor_divide a method requires breaking backwards compatibility, and floor_divide has been added to the BC whitelist since this is intentional. The BC issue is that the first parameter name to torch.floor_divide is changing from input to self. If you previously called torch.floor_divide with keyword arguments, e.g. torch.floor_divide(input=x, other=y), you will need to update to torch.floor_divide(self=x, other=y), or the more common torch.floor_divide(x, y).
The intent of this PR is to allow floor_divide to be substituted for division (torch.div, /) wherever division was previously used. In 1.6 we expect torch.div to perform true_division, and floor_divide is how users can continue to perform integer division with tensors.
There are two potential follow-up issues suggested by this PR:
- the test framework might benefit from additional tensor construction classes, like one to create dividends and divisors for multiple dtypes
- the test framework might benefit from a universal function test class. while methods have reasonable coverage as part of test_torch.py's TestTensorOp tests, function coverage is spotty. Universal functions are similar enough it should be possible to generate tests for them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34552
Differential Revision: D20497453
Pulled By: mruberry
fbshipit-source-id: ac326f2007d8894f730d1278fef84d63bcb07b5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34903
Reattempt of D20461609
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20495304
fbshipit-source-id: 66a99677583f50fd40e29c514710c7b1a8cdbc29
Summary:
Follow-ups after this PR:
* Remove `LossClosureOptimizer`, and merge `Optimizer` into `OptimizerBase` (and rename the merged class to Optimizer)
* Merge the LBFGS-specific serialize test function and the generic `test_serialize_optimizer` function, possibly by passing a bool `has_only_global_state` flag into the `test_serialize_optimizer` function to denote whether `size()` should be equal to 1 or 2?
* https://github.com/pytorch/pytorch/pull/34564#discussion_r393780303
* It seems that we don't have the equivalent `XORConvergence_LBFGS` test like the other optimizers, and it would be good to add one
* Remove mentions of `parameters_` in optimizer.cpp, de-virtualize all functions, and remove the `OptimizerBase(std::vector<Tensor> parameters)` constructor from `OptimizerBase`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34564
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20495701
Pulled By: anjali411
fbshipit-source-id: 6d35286d2decb6f7dff93d9d3e57515770666622
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34896
Make TorchScript support calling ref.owner() to get owner worker id and calling ref.owner_name() to get owner worker name.
Differential Revision: D7652208
fbshipit-source-id: a60125bb316ac2cf19a993cbd2affc933c0af7c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34413
In this diff we have made various improvements to ProcessGroupAgent in order to accommodate edge and error cases such as a "non-clean" shutdown (shutdowns in which we abort RPC as quickly as possible, and don't wait for all pending work across all RPC agents to be completed):
1. Catch and log exceptions in `enqueueRecv`. This prevents us from calling `std::terminate()` in a different thread and logs an error message indicating the issue. With this we no longer have crashes caused by exceptions in this thread during non-graceful shutdown.
2. Provide cleaner error messages everywhere (and use `c10::str` where possible). One example is in `agent::send()`.
3. Add the ability to abort pending sends that cause blocking waits in `handleSend`. The reason we need to abort this is since during a non-graceful shutdown, we could become blocked waiting for these since there is no guarantee the remote end is still active and this would result in a long wait and eventual timeout. We abort these by adding them to a map, and go through this map during `shutdown()`.
4. Fix flaky tests: `test_handle_send_exceptions` and `test_backward_node_failure` and `test_backward_node_failure_python_udf`. These tests were flaky since they dealt with non-graceful shutdown of workers which has chances for a bunch of edge cases explained above.
We have also refactored `createExceptionResponse`, `enqueueRecv`, and some test functions for the above reasons in this diff.
For testing:
Ensured that the tests are no longer flaky with 500 tests runs. Previously, these tests were flaky and disabled. Also added a unit test in the internal `ProcessGroupAgentTest.cpp`.
ghstack-source-id: 100311598
Test Plan: Ensured that the tests are no longer flaky with 500 tests runs. Previously, these tests were flaky and disabled. Also added a unit test in the internal `ProcessGroupAgentTest.cpp`.
Reviewed By: mrshenli
Differential Revision: D20269074
fbshipit-source-id: de9cad7f7185f9864ffbb6b14cd8ca9f6ff8f465
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34901
init_pg is needed for the dist.barrier call; otherwise the default process group may not be found for some RPC backends.
ghstack-source-id: 100319642
Test Plan: unit test
Differential Revision: D20495321
fbshipit-source-id: a44241bd2ff6e1404eee9b241270a94e9fd114d0
Summary:
Fixes https://github.com/pytorch/pytorch/issues/34714 (using the discussed solution). Thanks to jjabo for flagging and suggesting this.
Instead of expanding `probs` to prepend `sample_shape`, it is better to use the `num_samples` argument to `torch.multinomial`, which is faster and consumes less memory.
Existing tests should cover this. I have profiled this on different inputs and the change results in faster `.sample` (e.g. 100X faster on the example in the issue), or at worst is similar to what we have now with the default `sample_shape` argument.
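A rough illustration of the difference (illustrative only, not the actual distributions code):
```
import torch

probs = torch.tensor([0.1, 0.2, 0.7])

# Before: expand probs to the sample shape and draw one sample per row.
expanded = probs.expand(100, -1)
before = torch.multinomial(expanded, 1).squeeze(-1)

# After: draw all samples in a single call via num_samples, which is faster
# and uses less memory.
after = torch.multinomial(probs, num_samples=100, replacement=True)
```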
cc. fritzo, alicanb, ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34900
Differential Revision: D20499065
Pulled By: ngimel
fbshipit-source-id: e5be225e3e219bd268f5f635aaa9bf7eca39f09c
Summary:
This makes PyTorch compilable (but not linkable) with the `CUDA_SEPARABLE_COMPILATION` option enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34899
Test Plan: CI
Differential Revision: D20501050
Pulled By: malfet
fbshipit-source-id: 02903890a827fcc430a26f397d4d05999cf3a441
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34871
We used to configure the root logger in the RPC module. A stream handler was added to `root.handlers`. This is not desired behavior for PyTorch users. We should instead keep the root logger's handler list untouched.
We can configure the logger local to the rpc module and set its log level, so it doesn't defer to its ancestor, which is usually the root logger that has no stream handlers in most cases.
https://docs.python.org/3/library/logging.html#logging.Logger.setLevel
And add a stream handler to make it output to stdout, even if the root handlers are not configured and the list is empty.
https://docs.python.org/3/library/logging.html#logging.Logger.addHandler
https://docs.python.org/3/library/logging.handlers.html#logging.StreamHandler
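A minimal sketch of the resulting setup (illustrative, not the exact code in the RPC module):
```
import logging
import sys

logger = logging.getLogger(__name__)                   # module-local, root handlers untouched
logger.setLevel(logging.INFO)                          # don't rely on the root logger's level
logger.addHandler(logging.StreamHandler(sys.stdout))   # emit to stdout even if root has no handlers
```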
ghstack-source-id: 100322141
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```
Differential Revision: D7677493
fbshipit-source-id: 88a66079e7348c79a7933e3527701917cbebb7ba
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34607
Adds quantized version of hardsigmoid activation.
Note: not implementing the _ and .out versions is
currently intended, because the implementation changes the scale and
zp and it's nice to not allow the user to specify scale
and zp. Lmk if we should handle this differently.
Test Plan:
tests
benchmarks
Imported from OSS
Differential Revision: D20480546
fbshipit-source-id: 9febcb44afd920125ed2ca4900492f0b712078ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34843
Currently, we use not_ok_to_boxing to filter Dimnames that cannot be
converted/constructed to IValue. The correct way should be SFINAE on the
constructor of IValue.
(Note: this ignores all push blocking failures!)
Test Plan:
PyTorch compiled after the code change.
All unit test passed
Imported from OSS
Differential Revision: D20494886
fbshipit-source-id: 91dfba6a41a3ae2d6ceba9d4124cbf612ea3f080
Summary:
Filing this PR since we are in the process of migrating ROCm CI to ROCm version 3.1. This patch is to ensure the correct functionality of float <-> bfloat16 conversion in rocm3.1. `std::isnan` regresses with rocm3.1.
iotamudelta ezyang
cc: ashishfarmer (original author of this patch)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34664
Differential Revision: D20440972
Pulled By: ezyang
fbshipit-source-id: 1ccb911c88f05566d94e01878df6c70cf7f31242
Summary:
Was originally not a requirement but we should add it back here since
it's required on import and we require it anyways for our conda
packages.
Tested with:
```
❯ pkginfo -f requires_dist *.whl
requires_dist: ['numpy']
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34510
Differential Revision: D20352125
Pulled By: seemethere
fbshipit-source-id: 383e396fe500ed7043d83c3df57d1772d0fff1e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34665
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20493861
Pulled By: ezyang
fbshipit-source-id: 4215e3037a16be460f20cfc2859be5ee074128d3
Summary:
This PR implements channels-last upsampling nearest for 2D/3D.
This is supposed to be faster and, in addition, avoids converting formats going
into and out of the operator.
Will post benchmarking numbers.
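A small usage sketch of the path this targets (the call itself is standard PyTorch; whether the output stays channels-last depends on this kernel):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16).to(memory_format=torch.channels_last)
y = F.interpolate(x, scale_factor=2, mode="nearest")
# With a channels-last kernel the result can stay in channels-last layout,
# avoiding a format conversion on the way into and out of the op.
print(y.is_contiguous(memory_format=torch.channels_last))
```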
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34597
Test Plan: python test/test_nn.py TestNN.test_upsamplingNearest3d_channels_last
Differential Revision: D20390583
Pulled By: kimishpatel
fbshipit-source-id: e0162fb97604a261887f38fc957d3f787c80954e
Summary:
If the arguments of an `ENDIF()` block are non-empty, they should match the corresponding `IF()` block.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34886
Test Plan: CI
Differential Revision: D20494631
Pulled By: malfet
fbshipit-source-id: 5fed86239b4a0cb4b3aedd02c950c1b800199d2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34842
This PR (hopefully the last one of its kind) merges changes from a
side branch where the tensor-expressions-based fuser work has been done so
far. This PR is a squashed version of the changes in the side branch,
which is available here: https://github.com/bertmaher/pytorch
Differential Revision: D20478208
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 21556e009f1fd88099944732edba72ac40e9b9c0
Summary:
For the batch_norm inference contiguous case, we can get better performance by manually vectorizing it.
Test script:
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)
for n in [1, 10, 100]:
    for c in [1, 10, 100]:
        for hw in [1, 10, 200]:
            m = nn.BatchNorm2d(c, affine=False)
            m.eval()
            input = torch.randn(20, c, hw, hw)
            # warm up
            for i in range(200):
                output = m(input)
            fwd_t = 0
            for j in range(1000):
                t1 = time.time()
                output = m(input)
                t2 = time.time()
                fwd_t = fwd_t + (t2 - t1)
            fwd_avg = fwd_t / 1000 * 1000
            print("size = (%d, %d, %d, %d); compute time is %.4f(ms)" % (n, c, hw, hw, fwd_avg))
```
Before:
```
size = (1, 1, 1, 1); compute time is 0.0110(ms)
size = (1, 1, 10, 10); compute time is 0.0123(ms)
size = (1, 1, 200, 200); compute time is 0.8166(ms)
size = (1, 10, 1, 1); compute time is 0.0107(ms)
size = (1, 10, 10, 10); compute time is 0.0257(ms)
size = (1, 10, 200, 200); compute time is 8.7533(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.1619(ms)
size = (1, 100, 200, 200); compute time is 123.5674(ms)
size = (10, 1, 1, 1); compute time is 0.0109(ms)
size = (10, 1, 10, 10); compute time is 0.0123(ms)
size = (10, 1, 200, 200); compute time is 0.5629(ms)
size = (10, 10, 1, 1); compute time is 0.0107(ms)
size = (10, 10, 10, 10); compute time is 0.0253(ms)
size = (10, 10, 200, 200); compute time is 8.7817(ms)
size = (10, 100, 1, 1); compute time is 0.0120(ms)
size = (10, 100, 10, 10); compute time is 0.1655(ms)
size = (10, 100, 200, 200); compute time is 123.2488(ms)
size = (100, 1, 1, 1); compute time is 0.0109(ms)
size = (100, 1, 10, 10); compute time is 0.0123(ms)
size = (100, 1, 200, 200); compute time is 0.5740(ms)
size = (100, 10, 1, 1); compute time is 0.0108(ms)
size = (100, 10, 10, 10); compute time is 0.0257(ms)
size = (100, 10, 200, 200); compute time is 8.7201(ms)
size = (100, 100, 1, 1); compute time is 0.0122(ms)
size = (100, 100, 10, 10); compute time is 0.1628(ms)
size = (100, 100, 200, 200); compute time is 123.1739(ms)
```
After:
```
size = (1, 1, 1, 1); compute time is 0.0105(ms)
size = (1, 1, 10, 10); compute time is 0.0114(ms)
size = (1, 1, 200, 200); compute time is 0.5771(ms)
size = (1, 10, 1, 1); compute time is 0.0105(ms)
size = (1, 10, 10, 10); compute time is 0.0160(ms)
size = (1, 10, 200, 200); compute time is 6.9851(ms)
size = (1, 100, 1, 1); compute time is 0.0122(ms)
size = (1, 100, 10, 10); compute time is 0.0848(ms)
size = (1, 100, 200, 200); compute time is 98.6758(ms)
size = (10, 1, 1, 1); compute time is 0.0105(ms)
size = (10, 1, 10, 10); compute time is 0.0115(ms)
size = (10, 1, 200, 200); compute time is 0.2690(ms)
size = (10, 10, 1, 1); compute time is 0.0105(ms)
size = (10, 10, 10, 10); compute time is 0.0159(ms)
size = (10, 10, 200, 200); compute time is 6.6946(ms)
size = (10, 100, 1, 1); compute time is 0.0123(ms)
size = (10, 100, 10, 10); compute time is 0.0854(ms)
size = (10, 100, 200, 200); compute time is 98.7327(ms)
size = (100, 1, 1, 1); compute time is 0.0107(ms)
size = (100, 1, 10, 10); compute time is 0.0116(ms)
size = (100, 1, 200, 200); compute time is 0.2681(ms)
size = (100, 10, 1, 1); compute time is 0.0104(ms)
size = (100, 10, 10, 10); compute time is 0.0159(ms)
size = (100, 10, 200, 200); compute time is 6.7507(ms)
size = (100, 100, 1, 1); compute time is 0.0124(ms)
size = (100, 100, 10, 10); compute time is 0.0852(ms)
size = (100, 100, 200, 200); compute time is 98.6866(ms)
```
For the real model ResNeXt-101, we can also get **~20%** performance improvement for large batch sizes.
Test script:
```
import torch
import torchvision
import time

torch.manual_seed(0)
#torch.set_num_threads(1)
model = torchvision.models.resnext101_32x8d().eval()
for batch_size in [1, 64]:
    input = torch.randn(batch_size, 3, 224, 224)
    # warm up
    with torch.no_grad():
        for i in range(5):
            output = model(input)
        fwd_t = 0
        for i in range(10):
            t1 = time.time()
            output = model(input)
            t2 = time.time()
            fwd_t = fwd_t + (t2 - t1)
        time_fwd_avg = fwd_t / 10 * 1000
        print("Throughput of resnext101 with batch_size = %d is %10.2f (imgs/s)" % (batch_size, batch_size * 1000 / time_fwd_avg))
```
Before:
```
Throughput of resnext101 with batch_size = 1 is 7.89 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 13.02 (imgs/s)
num_threads =1
Throughput of resnext101 with batch_size = 1 is 2.97 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 2.75 (imgs/s)
```
After:
```
Throughput of resnext101 with batch_size = 1 is 8.95 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 15.52 (imgs/s)
num_threads = 1
Throughput of resnext101 with batch_size = 1 is 3.10 (imgs/s)
Throughput of resnext101 with batch_size = 64 is 2.88 (imgs/s)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34530
Differential Revision: D20479560
Pulled By: ngimel
fbshipit-source-id: 2e788ebcd814556116c90553ec61159eeffb3c16
Summary:
AT_CHECK has been deprecated and provides no more features than
TORCH_CHECK
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34846
Differential Revision: D20481339
Pulled By: mrshenli
fbshipit-source-id: 1777e769a069a78e03118270294e5e273d516ca7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34663
Been bitten by this so many times. Never more.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20425480
Pulled By: ezyang
fbshipit-source-id: c4489efacc4149c9b57d1b8207cc872970c2501f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34783
Moving 2/4-bit SLS and row-wise 2/4-bit conversion operator to open source to be used by DLRM
Test Plan: CI
Reviewed By: yinghai
Differential Revision: D20461609
fbshipit-source-id: b3ef73ff10f2433afe06ffa73fe1145282d9ec4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34792
It is not thread safe to instantiate a script module in multiple threads.
For both test_remote_script_module and test_torchscript_functions_not_supported, it is possible that the client thread is instantiating MyScriptModule while the server thread is instantiating it as well in the same rank process.
Removing the MyScriptModule instantiation in the client thread; it is not actually needed.
ghstack-source-id: 100266609
Test Plan: unit tests
Differential Revision: D20463234
fbshipit-source-id: 6ff70ad90fa50b0b44c78df2495b4bcaabb4487b
Summary:
To speed up compilation time
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34811
Test Plan: CI
Differential Revision: D20476992
Pulled By: malfet
fbshipit-source-id: 922cde93783fbfc04854851d7a05a635d5239792
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34844
QNNPACK max_pool2d operator does not support ceil_mode so this can cause crashes in the kernel when it is set to true.
We default to the server implementation when ceil_mode is set to true
Test Plan:
python test/test_quantized.py
Imported from OSS
Differential Revision: D20478701
fbshipit-source-id: 7962444ac493f5c3c32a9aa1a7be465e8b84ccc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33719
We were seeing a strange error where gathering profiler events (specifically `parse_cpu_trace` in `profiler.py`) would fail with the error:
`IndexError: pop from empty list`.
It turned out that this was because for one particular `Event`, there was a pop recorded but not a push. Instead of the `push` event being completely missing, it was overwritten by a completely different event.
After a bunch of debugging, and trying several hypotheses, it turns out that this was a race condition in `RangeEventList::record`. What happened was that different threads would call into `RangeEventList::record` on the same event list instance, and one record would stomp over the data written by the other one. Somehow the data written was a valid `Event` so the error did not manifest itself until the profiler realized a `pop` was missing a matching `push` in the python code.
I fixed this by adding a lock to serialize writes to `RangeEventList::record`.
This PR also makes a small change to pass in the `RecordFunction` name into `popRange`. It makes the debugging easier when investigating the events recorded.
Differential Revision: D20071125
fbshipit-source-id: 70b51a65bcb833a7c88b7462a978fd3a39265f7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34497
Use a thread_local table to intercept UserRRefs created during user
function args deserialization, and then wait for confirmations of
those UserRRefs before launching the given user function.
Differential Revision: D20347464
Test Plan: Imported from OSS
Pulled By: mrshenli
fbshipit-source-id: 087484a2d2f03fbfb156752ab25653f39b412a07
Summary:
PyTorch expand allows size with -1 dim value. -1 dim value means to infer the dimension from input tensor. This can be exported to ONNX expand with 1 dim value since ONNX expand supports two-way broadcast.
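For reference, a tiny example of the -1 semantics being exported (illustrative only):
```
import torch

x = torch.randn(3, 1)
y = x.expand(-1, 4)   # -1 keeps the existing size of that dimension
print(y.shape)        # torch.Size([3, 4])
```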
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34069
Reviewed By: hl475
Differential Revision: D20195532
Pulled By: houseroad
fbshipit-source-id: c90e7d51b9d7422c09c5ed6e135ca8263105b8c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34545
This is for common operator coverage, since this is widely used. A future PR
will add the quantized version.
Some initial questions for reviewers, since it's my first FP operator
diff:
* do we need a backwards.out method for this?
* do we need CUDA? If yes, should it be this PR or is it ok to split
Test Plan:
```
// test
python test/test_torch.py TestTorchDeviceTypeCPU.test_hardsigmoid_cpu_float32
// benchmark
python -m pt.hardsigmoid_test
...
Forward Execution Time (us) : 40.315
Forward Execution Time (us) : 42.603
```
Imported from OSS
Differential Revision: D20371692
fbshipit-source-id: 95668400da9577fd1002ce3f76b9777c6f96c327
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34625
These templated function calls are not specifying the template args correctly. The first arg is the index type, not the array data type. That means, right now it's using `T` as the index type as well, which will break if we do a template specialization for uint8_t. If we omit both, it will correctly infer that the index type is `int` and the data type is `T`.
Reviewed By: BIT-silence
Differential Revision: D20358728
fbshipit-source-id: 8cbd8eeb14bce602c02eb6fce2cc141f0121fa24
Summary:
This test is flaky on my computer; the error is:
```
AssertionError: tensor(1.3351e-05) not less than or equal to 1e-05
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34764
Differential Revision: D20476006
Pulled By: ezyang
fbshipit-source-id: dad7e702275346070552c8a98765c37e6ca2c197
Summary:
Replacing <ATen/core/Tensor.h> with <ATen/core/TensorBody.h> speeds up compilation of caffe2 operators by 15%.
For example, it reduces pool_op.cu compilation from 18.8s to 16s
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34810
Test Plan: CI
Differential Revision: D20472230
Pulled By: malfet
fbshipit-source-id: e1b261cc24ff577f09e2d5f6428be2063c6d4a8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34230
This PR adds some benchmarks that we used to assess tensor expressions performance.
Differential Revision: D20251830
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: bafd66ce32f63077e3733112d854f5c750d5b1af
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34228
This PR adds LLVM codegen to tensor expressions. LLVM is added as an
optional build dependency specified with `USE_LLVM=<path_to_llvm>`
variable. If this variable is not set or LLVM is not found in the
specified path, the LLVM codegen is completely disabled.
Differential Revision: D20251832
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 77e203ab4421eb03afc64f8da17e0daab277ecc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34227
This PR adds a CUDA support to tensor expressions.
Differential Revision: D20251836
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: ab36a55834cceff30c8371fef6cca1054a32f017
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34226
LLVM and CUDA backends are added in subsequent PRs, so at this point the fuser is pretty useless, but it can still be tested and its logic is not going to change with the addition of the codegens.
Differential Revision: D20251838
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 82b0d221fa89904ed526689d02a6c7676a8ce8de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34224
Our development has been happening on a side branch `pytorch_fusion` in
`bertmaher/pytorch` fork. This PR moves changes to the core classes
representing expressions and transformations on them.
At this moment, the tensor expressions are only used in tests.
Subsequent PRs add LLVM and CUDA codegen for tensor expressions and
implement fuser on top of these.
This PR is huge as it is a squashed version of changes in the side
branch. It is not practical to pull changes one by one from the branch,
so here is the squashed version. If you're interested in seeing the
history of changes, please refer to https://github.com/bertmaher/pytorch
Differential Revision: D20251835
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: 1a871acc09cf3c6f7fb4af40d408cdbb82dc7dab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33981
Okay it turns out that https://github.com/pytorch/pytorch/pull/29342
deletes actually useful things from the resulting Python module. In
particular, people like having `ignore`'d methods attached so that they
can invoke them from python.
Test Plan: Imported from OSS
Differential Revision: D20171650
Pulled By: suo
fbshipit-source-id: 71862e932c6a56cd055d0cff6657887ee0ceb9a8
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.
**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU layers now have `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally hidden state, matching the `forward(PackedSequence, ...)` variant in Python API.
- RNN / LSTM / GRU layers no longer have these fields: `w_ih` / `w_hh` / `b_ih` / `b_hh`. Instead, to access the weights and biases of the gates, users should do e.g. `rnn->named_parameters()["weight_ih_l0"]`, which mirrors the Python API `rnn.weight_ih_l0`.
- In `RNNOptions`
- `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `LSTMOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `GRUOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflect the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322
Differential Revision: D20458302
Pulled By: yf225
fbshipit-source-id: ffff2ae1ddb1c742c966956f6ad4d7fba03dc54d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34280
To have prim ops searchable for lite interpreter, overloaded names need to be added for the operators with the same name but different schema. For example, aten::add in register_prim_ops.cpp. The difference is a combination of args and output type.
`"aten::add(str a, str b) ->str"`
`"aten::add(int a, int b) ->int"`
`"aten::add(float a, float b) ->float"`
`"aten::add(int a, float b) ->float"`
`"aten::add(float a, int b) ->float"`
`"aten::add(Scalar a, Scalar b) ->Scalar"`
Solution:
Use the argument type and/or output type (the same to the existing overloaded names). The overloaded name should be minimum as long as the operators can be differentiated. For other operators please look into the source code change for details.
`"aten::add.str(str a, str b) ->str"`
`"aten::add.int(int a, int b) ->int"`
`"aten::add.float(float a, float b) ->float"`
`"aten::add.int_float(int a, float b) ->float"`
`"aten::add.float_int(float a, int b) ->float"`
`"aten::add.Scalar_Scalar(Scalar a, Scalar b) ->Scalar"`
Test Plan: Imported from OSS
Differential Revision: D20456997
Pulled By: iseeyuan
fbshipit-source-id: 2c3dc324b4a4e045559f62c6cc2a10fbb9a72dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33604
For our current RPC agents, this PR disallows sending CUDA tensors
over RPC and asks users to copy them explicitly to CPU. Currently, this seems
to be the easiest contract to guarantee for our current RPC agents, otherwise
if we do support this transparently it gets a little tricky in terms of whether
a CUDA tensor on the client should be sent to CPU/GPU of the remote end and
also which GPU device on the remote end.
In the future, the TensorPipe RPC agent can have its own specific handling of
CUDA tensors.
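A hedged sketch of the resulting contract, with illustrative helper names: callers copy CUDA tensors to CPU themselves before making an RPC.
```
import torch
import torch.distributed.rpc as rpc

def remote_add(a, b):
    return a + b

def call_remote(dst, t_cuda):
    t_cpu = t_cuda.cpu()  # explicit device transfer before sending over RPC
    return rpc.rpc_sync(dst, remote_add, args=(t_cpu, t_cpu))
```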
Closes https://github.com/pytorch/pytorch/issues/28881
ghstack-source-id: 100166120
Test Plan: waitforbuildbot
Differential Revision: D20020183
fbshipit-source-id: ca4d43d2a24e8fcd3a60b21e654aa0e953e756cb
Summary:
So that in the future we can make policy accept an offset calculator in its constructor for the support of non-contiguous tensors.
The `elementwise_kernel_helper` is now very general and it can handle any cases:
```C++
template<typename func_t, typename policy_t>
__device__ inline void elementwise_kernel_helper(func_t f, policy_t policy) {
  using traits = function_traits<func_t>;
  using return_t = typename traits::result_type;
  using args_t = typename traits::ArgsTuple;

  int idx = blockIdx.x;

  return_t results[thread_work_size];
  cuda9::workaround::enable_default_constructor<args_t> args_[thread_work_size];
  args_t *args = reinterpret_cast<args_t *>(&args_);

  // load
  policy.load(args, idx);

  // compute
  #pragma unroll
  for (int i = 0; i < thread_work_size; i++) {
    if (policy.check_inbounds(i)) {
      results[i] = c10::guts::apply(f, args[i]);
    }
  }

  // store
  policy.store(results, idx);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33720
Differential Revision: D20459652
Pulled By: ngimel
fbshipit-source-id: aa8b122e0e8c6e08ab354785e04753ff778882e2
Summary:
https://github.com/pytorch/pytorch/issues/34563 accidentally introduced a lint error due to an unused import. This PR removes this import.
Jit tests run as expected after this change:
```
> python test/test_jit.py
.....
Ran 2435 tests in 100.077s
OK (skipped=140, expected failures=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34778
Differential Revision: D20459708
Pulled By: tugrulince
fbshipit-source-id: bb742085fafc849ff3d9507d1557556e01fbeb4b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34762
So far it's by luck that we somehow include "caffe2/core/tensor.h" before including "caffe2/caffe2/quantization/server/fbgemm_pack_blob.h". This is not safe and this diff fixes it.
Test Plan: unittest
Reviewed By: jianyuh
Differential Revision: D20455352
fbshipit-source-id: 777dae32a23d0ec75fd7e5e1627426b5a5f81f5a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34547
This enables threading by passing a threadpool to xnnpack ops.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20370553
fbshipit-source-id: 4db08e73f8c69b9e722b0e11a00621c4e229a31a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34319
Removes prepacking ops and install them as attributes of the top level
module. Needs to run freezing as the first pass.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20290726
fbshipit-source-id: 633ceaa867ff7d5c8e69bd814c0362018394cb3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34048
Rewrites the graph to insert xnnpack prepack and packed run ops for
conv2d and linear.
Test Plan:
python test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185658
fbshipit-source-id: c4c073c912ad33e822e7beb4ed86c9f895129d55
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34047
This PR integrates the added xnnpack conv2d and linear op via
custom class registration for packed weights. The packed struct
is serializable.
Test Plan:
python test test/test_xnnpack_integration.py
Imported from OSS
Differential Revision: D20185657
fbshipit-source-id: fc7e692d8f913e493b293b02d92f4e78536d7698
Summary:
This PR refactors RNN / GRU / LSTM layers in C++ API to exactly match the implementation in Python API.
**BC-breaking changes:**
- Instead of returning `RNNOutput`, RNN / GRU forward method now returns `std::tuple<Tensor, Tensor>`, and LSTM forward method now returns `std::tuple<Tensor, std::tuple<Tensor, Tensor>>`, matching Python API.
- RNN / LSTM / GRU forward method now accepts the same inputs (input tensor and optionally hidden state), matching Python API.
- RNN / LSTM / GRU layers now have a `forward_with_packed_input` method which accepts `PackedSequence` as input and optionally hidden state, matching the `forward(PackedSequence, ...)` variant in Python API.
- In `RNNOptions`
- `tanh()` / `relu()` / `activation` are removed. Instead, `nonlinearity` is added which takes either `torch::kTanh` or `torch::kReLU`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `LSTMOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
- In `GRUOptions`
- `layers` -> `num_layers`
- `with_bias` -> `bias`
The majority of the changes in this PR focused on refactoring the implementations in `torch/csrc/api/src/nn/modules/rnn.cpp` to match the Python API. RNN tests are then changed to reflect the revised API design.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34322
Differential Revision: D20311699
Pulled By: yf225
fbshipit-source-id: e2b60fc7bac64367a8434647d74c08568a7b28f7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34629
Add support for sigmoid in the conversion flow through onnx
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_quantized_sigmoid
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_small_model
Imported from OSS
Differential Revision: D20433680
fbshipit-source-id: 95943e14637d294122e4d102c5c19c06d27064c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33945
Add mapping for this operator in symbolics
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_max_pool2d
Imported from OSS
Differential Revision: D20433681
fbshipit-source-id: 88f02ade698262a6f8824671830bc1f7d40bbfa6
Summary:
This PR adds `RNNCell` / `LSTMCell` / `GRUCell` layers to the C++ frontend, with implementations exactly matching the Python API equivalent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34400
Differential Revision: D20316859
Pulled By: yf225
fbshipit-source-id: bb7cee092622334043c0d0fd0fcb4e75e707699c
Summary:
as title, for bringing up the quantized video model. Will add the batch_norm_relu test in another PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34702
Differential Revision: D20436092
Pulled By: lly-zero-one
fbshipit-source-id: 116bd306f7880bfd763d8575654fbd6c92818338
Summary:
Since we've added CUDA 10.2, it is time to retire CUDA 10.0
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34726
Differential Revision: D20453081
Pulled By: seemethere
fbshipit-source-id: fd5bb35325a5f1577d0f0404d16cd7dfe34c86ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34671
Like the python arg parser, this tries to convert the arguments to each schema in order.
It introduces schema_match_exception which gets thrown when the schema doesn't match,
allowing the overload handler to try the next option.
Behavior will not 100% match the schema argument parser but should work for
simple cases using custom binding.
Test Plan: Imported from OSS
Differential Revision: D20432206
Pulled By: zdevito
fbshipit-source-id: 280839a2205ea3497db3a9b5741fccc1e2bff9a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34677
1. Remove remaining uses of `script::` namespace from the codebase,
2. Add one more typedef for `script::ExtraFilesMap` which is part of the
public interface.
Pull Request resolved: #34580
Test Plan: Imported from OSS
Reviewed By: zdevito
Differential Revision: D20431739
Pulled By: suo
fbshipit-source-id: a29d369c755b6506c53447ca1f286b6339222c9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34190
inplace modification of ClassType might affect other tests, so we want to do non-inplace modifications.
Actually the inplace argument will be removed soon.
Test Plan:
ci
Imported from OSS
Differential Revision: D20451765
fbshipit-source-id: e87ad528c4e7f84f5774b94a8e3e85568269682d
Summary:
Per https://github.com/pytorch/pytorch/issues/19161 PyTorch is incompatible with 3.6.0 due to the missing `PySlice_Unpack`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34724
Test Plan: CI + try to load pytorch binary using python-3.6.0
Differential Revision: D20449052
Pulled By: malfet
fbshipit-source-id: 2c787fc64f5d1377c7f935ad2f3c77f46723d7dd
Summary:
This PR is related to [https://github.com/pytorch/pytorch/issues/33953](https://github.com/pytorch/pytorch/issues/33953).
I've created a directory `type_hint_tests` for the example as suggested by zou3519 [here](https://github.com/pytorch/pytorch/issues/33953#issuecomment-597716405). This directory is supposed to contain examples over which mypy will run. I've added the test in `test/test_type_hints.py`.
The test can simply be invoked by
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b'test/type_hint_tests/size.py:7: error: Tuple index out of range\ntest/type_hint_tests/size.py:8: error: Tuple index out of range\n'
.
----------------------------------------------------------------------
Ran 2 tests in 13.660s
OK
```
Note that I've not made the change of fixing the stub, to show that the test works. The issue can be fixed by changing the definition of `Size` to `class Size(Tuple[_int, ...]): ...` in `/torch/__init__.pyi.in`.
After changing the `Size` definition, the test passes.
```
$ python3 test/test_type_hints.py
Fail to import hypothesis in common_utils, tests are not derandomized
.b''
.
----------------------------------------------------------------------
Ran 2 tests in 19.382s
OK
```
I will do that once I get approval from zou3519. This is an initial implementation, please provide your suggestions.
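To illustrate, a hypothetical example of the kind of file that could live under `test/type_hint_tests/` (the actual `size.py` referenced above may differ):
```
import torch

t = torch.zeros(2, 3)
s = t.size()
x: int = s[0]   # indexing a torch.Size should type-check as int
y: int = s[-1]
```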
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34595
Differential Revision: D20441817
Pulled By: zou3519
fbshipit-source-id: 00a434adf5bca813960f4efea38aa6d6953fe85f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34706
as title
Test Plan: test in stacked diff
Reviewed By: csummersea
Differential Revision: D20436618
fbshipit-source-id: e51ef0a22708425cd296c05f4089fe8c98eda90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34511
With https://github.com/pytorch/pytorch/pull/34122/files, issues
with using record_function context manager and profiling RPCs were fixed. This
adds a test case to verify that we can use RPC with the `record_function`
decorator.
ghstack-source-id: 100109932
Test Plan: Unit test change
Differential Revision: D20352242
fbshipit-source-id: d6429e4352ad3b8d874dc0f27b23ecb6202e6b2b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34723
Add min function to cuda math compat
Test Plan: unittest
Reviewed By: houseroad
Differential Revision: D20444517
fbshipit-source-id: 1a93343cc57249ef1101eeb7ef373266f6a2873a
Summary:
This commit adds a reference hash for the linux64 clang-format binary and in
doing so, enables this script to be used on Linux machines.
Test Plan:
Ran the script.
```
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ export http_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ export https_proxy=fwdproxy:8080
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ python3 ./tools/clang_format_new.py --diff
Downloading clang-format to /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /data/users/meghanl/fbsource/fbcode/caffe2/.clang-format-bin/clang-format
meghanl@devvm1517:caffe2 (ff25240c|remote/master)$ echo $?
1
```
A non-zero return code indicates that `clang-format` will make changes.
Reviewed By: suo
Differential Revision: D20434291
fbshipit-source-id: fa13766e9d94720d4b0d8a540d2f1507e788f7a5
Summary:
- Clarify that `torch.distributed.autograd.backward()` does not use the current thread-local autograd context; instead it looks it up based on the `context_id` passed in
- Clarify the same for `torch.distributed.optim.DistributedOptimizer.step()`
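A minimal usage sketch of the documented pattern, assuming `rpc.init_rpc` has already been called; the forward pass below is a stand-in:
```
import torch
import torch.distributed.autograd as dist_autograd

t = torch.rand(3, requires_grad=True)
with dist_autograd.context() as context_id:
    loss = (t * 2).sum()                        # stand-in for a real distributed forward pass
    # backward() looks up the context by context_id, not via thread-local state
    dist_autograd.backward(context_id, [loss])
    grads = dist_autograd.get_gradients(context_id)
```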
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34670
Differential Revision: D20427645
Pulled By: rohan-varma
fbshipit-source-id: a1a88de346cdd4dbe65fb2b7627157f86fd2b6a3
Summary:
With this PR, we can now support left and right shift operators in the JIT engine for <int, int> and <Tensor, int>.
Updated tests pass as expected:
```
> python test/test_jit.py
...
Ran 2427 tests in 84.861s
OK (skipped=139, expected failures=1)
```
Running the following code with Python results in the output below:
```
> cat ~/expressions.py
import torch
torch.jit.script
def fn(a, b):
# type: (int, int)
return (
a << b, # supported
b >> a, # supported
a & b,
a | b,
a ^ b
)
print(fn.graph)
```
```
> python ~/expressions.py
graph(%a.1 : int,
%b.1 : int):
%4 : int = aten::leftshift(%a.1, %b.1) # /home/ince/expressions.py:7:8
%7 : int = aten::rightshift(%b.1, %a.1) # /home/ince/expressions.py:8:8
%10 : int = aten::__and__(%a.1, %b.1) # /home/ince/expressions.py:9:8
%13 : int = aten::__or__(%a.1, %b.1) # /home/ince/expressions.py:10:8
%16 : int = aten::__xor__(%a.1, %b.1) # /home/ince/expressions.py:11:8
%17 : (int, int, int, int, int) = prim::TupleConstruct(%4, %7, %10, %13, %16)
return (%17)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34563
Differential Revision: D20434209
Pulled By: tugrulince
fbshipit-source-id: 886386c59755106e17b84778b8e495b80a6269cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34623
The bandaid of "AT_WARN" keeps introducing new warnings. Let's get rid
of it entirely.
Closes #34502
Test Plan: Imported from OSS
Differential Revision: D20420112
Pulled By: albanD
fbshipit-source-id: 7160c113cb4deb2d2f50a375356f423fe5e86f50
Summary:
How this actually works:
1. Gets a list of URLs from anaconda for pkgs to download, most
likely from pytorch-test
2. Download all of those packages locally in a temp directory
3. Upload all of those packages, with a dry run upload by default
This, along with https://github.com/pytorch/pytorch/issues/34500 basically completes the scripting work for the eventual promotion pipeline.
Currently testing with:
```
TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 PYTORCH_CONDA_FROM=pytorch scripts/release/promote/conda_to_conda.sh
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34659
Differential Revision: D20432687
Pulled By: seemethere
fbshipit-source-id: c2a99f6cbc6a7448e83e666cde11d6875aeb878e
Summary:
…ithout lapack
LAPACK is needed for `at::svd`, which is called from `pinverse()`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34686
Test Plan: CI + local run
Differential Revision: D20442637
Pulled By: malfet
fbshipit-source-id: b3531ecc1197b0745ddcf50febb7fb4a7700d612
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33988 and fix https://github.com/pytorch/pytorch/issues/34083.
Previously, the max_pool2d_nhwc kernels used shared memory with size proportional to the tensor size (c * h * w). When the tensor size is too large, the kernel launch fails.
This PR follows the guidance in AdaptiveAvgPool2d_nhwc by increasing the number of grid_x with a split in the "C" dimension. With that change, the shared memory size is capped (below 48 KB) regardless of tensor size.
A benchmark can be found at [here](0b98146089/max-pool2d/max-pool2d.ipynb). TL;DR barely any performance drop is found.
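A hedged repro sketch of the failing configuration (requires a CUDA device; sizes are illustrative, the point is a large c * h * w in channels_last layout):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 256, 512, 512, device="cuda").to(memory_format=torch.channels_last)
# Before this fix, the NHWC kernel could request more shared memory than available;
# after it, the work is split along C so the shared memory usage stays bounded.
y = F.max_pool2d(x, kernel_size=3, stride=2)
```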
cc csarofeen ptrblck jjsjann123 VitalyFedyunin
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34519
Differential Revision: D20388848
Pulled By: VitalyFedyunin
fbshipit-source-id: 9454f385f9315afaab4a05303305578bbcd80b87
Summary:
- `torch::nn::functional` functions must provide example for how to use the corresponding functional options
- `torch::nn::functional` functions must link to the corresponding functional options
- remove `TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro, and put `torch::nn::functional` options docs inside the functional namespace, right above functional declaration
- `torch::nn::functional` options docs should not link back to torch::nn layers. Instead, they should have links to `torch::nn::functional::xxx`
----
This PR is BC-breaking in the following way:
`TORCH_NN_FUNCTIONAL_USE_MODULE_OPTIONS` macro is removed, and user should explicitly write
```cpp
namespace functional {
using SomeFuncOptions = SomeModuleOptions;
} // namespace functional
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34688
Differential Revision: D20431251
Pulled By: yf225
fbshipit-source-id: 7d4f27dca3aad2a1e523690927d7afb261b9d308
Summary: Last diff enabled operator stats for non-production builds, including AIBench. But the operator latency is off: https://our.intern.facebook.com/intern/aibench/details/414567479798816, as it represents the operator execution end time; because threadLocalDebugInfo was not set, the start time is 0. This diff fixes it by creating a new ThreadLocalDebugInfo object when the op starts to run and storing the model information for logging.
Test Plan:
```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/pytext/pytext_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/922804117425407
```buck run mode/mac aibench:run_bench_macos -- -b aibench/specifications/models/pytorch/fbnet/fbnet_mobile_inference.json --platform android --framework pytorch --remote --devices SM-G960F-8.0.0-26```
https://our.intern.facebook.com/intern/aibench/details/593403202250750
Reviewed By: xta0
Differential Revision: D20436388
fbshipit-source-id: 740bc94c3f51daef6af9b45c1ed7a708f5fc8836
Summary:
- Update API calls `backward` and `optim.step` now that we require `context_id`
- Add notes to clarify purpose of distributed autograd context (this was a source of confusion in some feedback)
- Add note that details why optimizer requires context_id
- Clearly specify that we don't have SMART mode yet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34657
Differential Revision: D20427667
Pulled By: rohan-varma
fbshipit-source-id: 5f8a3539ccf648a78e9e9a0dfdfe389c678b1606
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)
This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520
Reviewed By: iseeyuan
Differential Revision: D20429775
Pulled By: eellison
fbshipit-source-id: ae6519f9b0f731eaa2bf4ac20736317d0a66b8a0
Summary:
**Summary**
This commit adds `tools/clang_format_new.py`, which downloads a platform-appropriate
clang-format binary to a `.gitignored` location, verifies the binary by comparing its
SHA1 hash to a reference hash (also included in this commit), and runs it on all files
matched a specific regex in a list of whitelisted subdirectories of pytorch.
This script will eventually replace `tools/clang_format.py`.
**Testing**
Ran the script.
*No Args*
```
pytorch > ./tools/clang_format.py
Downloading clang-format to /Users/<user>/Desktop/pytorch/.clang-format-bin
0% |################################################################| 100%
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
> echo $?
0
> git status
<bunch of files>
```
`--diff` *mode*
```
> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
Some files are not formatted correctly
> echo $?
1
<format files using the script>
> ./tools/clang_format.py --diff
Using clang-format located at /Users/<user>/Desktop/pytorch/.clang-format-bin/clang-format
All files are formatted correctly
> echo $?
0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34566
Differential Revision: D20431290
Pulled By: SplitInfinity
fbshipit-source-id: 3966f769cfb923e58ead9376d85e97127415bdc6
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/33927
Test Plan:
test will be added in later PRs
Imported from OSS
Differential Revision: D20354879
fbshipit-source-id: 03976f4b86c46dbdc4e45764a1e72f1a3855a404
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34652
Split from D20006007 because it needs to be synced to open source and also for easy testing & landing.
Test Plan:
```
buck test caffe2/caffe2/fb/tvm:test_tvm_transform
```
CI
Reviewed By: yinghai
Differential Revision: D20414037
fbshipit-source-id: 6e17dd9f8cffe87bc59c6e3cc6fd1f8d8def926b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34635
For custom op, it's removed in EliminateDeadCode IR optimization step, causing wrong training result.
EliminateDeadCode decides to remove it because it has no outputs (so no output is used), has no side effects, and is assumed to have no untracked mutations. That assumption is not true: a custom op can have untracked mutations.
The if statement here that only allows aten and prim operators to have untracked mutations should be removed.
ghstack-source-id: 100001319
Test Plan:
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit
buck build mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_jit \
&& buck-out/gen/caffe2/torch/fb/distributed/pytorch/tests/test_jit\#binary.par -r test_use_dense_adagrad_step
```
Reviewed By: wanchaol
Differential Revision: D7440221
fbshipit-source-id: e424417ab397d90075884c7050c59dfc5c84cf77
Summary:
Changelog:
- The magma implementation of small singular square batch matrices had a bug that resulted in nan values in the LU factorization result. This has been fixed in MAGMA 2.5.2. This PR removes the existing patch that was a temporary workaround for this bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34357
Test Plan: - Existing tests for det and lu should pass
Differential Revision: D20422879
Pulled By: seemethere
fbshipit-source-id: 8dd7a30b5c31fc5b844e0a11965efd46067e936a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34626
We need to check has_storage() before looking at it in
cloneSparseTensors(), to avoid gratuitously throwing.
Ideally, we'd add a test for this (I wrote one up but had to disable it),
but it won't work until the JIT Pickler supports sparse tensors.
ghstack-source-id: 100018077
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcAgent/...
Differential Revision: D20399971
fbshipit-source-id: 5debfa8140eb1f949d37336330223962cc320abc
Summary:
This PR enables bfloat16 type for
- Embedding, Index, Sigmoid Ops used in [DLRM](https://github.com/facebookresearch/dlrm)
- Miscellaneous ops like comparison ops, arange op used in unit tests
- Rename types list with the pattern `*_with_bfloat16` in `test_torch.py` to avoid confusion
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34630
Differential Revision: D20405093
Pulled By: ezyang
fbshipit-source-id: aa9538acf81b3a5a9a46ce5014529707fdf25687
Summary:
Now that lists are no longer specialized, we can register only one operator for list ops that are generic to their element type.
This PR reorgs lists into three sets of ops:
- CREATE_GENERIC_LIST_OPS
- CREATE_SPECIALIZED_LIST_OPS
- CREATE_COMPARATOR_LIST_OPS_SPECIALIZED (we didn't bind certain specialized ops to Tensor)
This is important to land quickly because mobile is finalizing its bytecode soon, after which we could not remove these ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34520
Differential Revision: D20368543
Pulled By: eellison
fbshipit-source-id: ad0c6d70d2a6be6ff0e948d6786052167fc43e27
Summary:
This is a redo of https://github.com/pytorch/pytorch/pull/33791, which was reverted because it introduced a flaky test. The test was flaky and only flaky on Python3.5 because of dict order randomization.
I've fixed the issue with tests clobbering each other in b539fec, and in e0d7402 removed the override tests for `torch.nn.functional.tanh` and `torch.nn.functional.sigmoid`, which are deprecated and shouldn't be overridable. I also verified that no more test clobbering is happening.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34240
Differential Revision: D20252442
Pulled By: cpuhrsch
fbshipit-source-id: 069568e342a41c90e1dc76cbf85ba4aed47f24be
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31893
In order to resolve the issue summarized in https://github.com/pytorch/pytorch/issues/31325.
The overall solution is to proactively send out delete-fork messages from user nodes before the user nodes detect RRef leaks.
As the first step, we want to have a weak ref tracker to track all user rrefs.
ghstack-source-id: 100023142
Test Plan:
V22 is the version that makes the User wait on the delete UserRRef message.
# Unit tests
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_nested_rref_stress --stress-runs 100
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_nested_rref_stress
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_rref_forward_chain
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_non_garbage_collected_user_rref_due_to_local_circular_dependency
```
Reviewed By: mrshenli
Differential Revision: D19292254
fbshipit-source-id: 92c3e8d0b00f183c5e22f163bdca482cc25a1ce9
Summary:
This PR is BC-breaking in the following way:
- The deprecated `torch::nn::BatchNorm` is removed in favor of `torch::nn::BatchNorm{1,2,3}d`
- The deprecated `torch::nn::FeatureDropout` is removed in favor of `torch::nn::Dropout{2,3}d`
- The deprecated `torch::nn::modules_ordered_dict` is removed. User should do `Sequential sequential({{"m1", MyModule(1)}, {"m2", MyModule(2)}})` instead.
- The deprecated `torch::nn::init::Nonlinearity` is removed, in favor of the following enums:
- `torch::kLinear`
- `torch::kConv1D`
- `torch::kConv2D`
- `torch::kConv3D`
- `torch::kConvTranspose1D`
- `torch::kConvTranspose2D`
- `torch::kConvTranspose3D`
- `torch::kSigmoid`
- `torch::kTanh`
- `torch::kReLU`
- `torch::kLeakyReLU`
- The deprecated `torch::nn::init::FanMode` is removed, in favor of the following enums:
- `torch::kFanIn`
- `torch::kFanOut`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34508
Differential Revision: D20351601
Pulled By: yf225
fbshipit-source-id: cca0cd112f29a31bb023e348ca8f82780e42bea3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34267
Adds quantized ELU.
Test Plan:
```
python test/test_quantized.py TestQuantizedOps.test_qelu
```
still need to benchmark, saving that for after the review comments
Imported from OSS
Differential Revision: D20370953
fbshipit-source-id: fe941bf966f72dd9eee2c4b2ef45fe7afb50c866
Summary:
`torch.nn.functional.interpolate` was written as a builtin op when we scripted the standard library, because it has four possible overloads. As a result, whenever we make a change to `interpolate`, we need to make changes in two places, and it also makes it impossible to optimize the interpolate op. The builtin is tech debt.
I talked with ailzhang, and the symbolic script changes are good to remove (i guess that makes a third place we needed to re-implement interpolate).
I'm trying to get rid of unnecessary builtin operators because we're standardizing mobile bytecode soon, so we should try to get this landed as soon as possible.
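For context, a minimal sketch of the kind of scripted call that previously routed to the builtin overloads; behavior should be unchanged by this PR:
```
import torch
import torch.nn.functional as F

@torch.jit.script
def upsample(x: torch.Tensor) -> torch.Tensor:
    return F.interpolate(x, scale_factor=2.0, mode="nearest")

print(upsample(torch.randn(1, 3, 8, 8)).shape)  # torch.Size([1, 3, 16, 16])
```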
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34514
Differential Revision: D20391089
Pulled By: eellison
fbshipit-source-id: abc84cdecfac67332bcba6b308fca4db44303121
Summary:
Make sure that there could not be more than one instance of either `torch::autograd::Engine` or `torch::autograd::python::PythonEngine`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34567
Test Plan: CI
Differential Revision: D20390622
Pulled By: malfet
fbshipit-source-id: c90595032afc88f552dee52901361b58b282dc1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34515
Once upon a time we thought this was necessary. In reality it is not, so
removing it.
For backcompat, our public interface (defined in `api/`) still has
typedefs to the old `script::` names.
There was only one collision: `Pass` as a `Stmt` and `Pass` as a graph
transform. I renamed one of them.
Test Plan: Imported from OSS
Differential Revision: D20353503
Pulled By: suo
fbshipit-source-id: 48bb911ce75120a8c9e0c6fb65262ef775dfba93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34588
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Reland of https://github.com/pytorch/pytorch/pull/34160
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20387079
Pulled By: ezyang
fbshipit-source-id: d189f7a6ad8cd186b88b6fbfa3f189994eea14e8
Summary:
TensorIterator already checks for partial overlap, so there is no trivial UB, but TensorIterator allows full overlap, and it is not a bad idea to skip the memcpy in such a case.
fixes: https://github.com/pytorch/pytorch/issues/34525
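A tiny illustration, assuming the issue is about fully self-overlapping copies like the one below:
```
import torch

x = torch.randn(10)
x.copy_(x)  # source and destination fully overlap, so the memcpy can safely be skipped
```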
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34548
Differential Revision: D20371643
Pulled By: ngimel
fbshipit-source-id: ff9e2e872537010afe040204e008b2499af963ad
Summary:
This PR updates C++ API torch::nn layer docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34522
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20380832
Pulled By: yf225
fbshipit-source-id: ee99a838ec05c6ce2a23aa97555707e507d09958
Summary:
**Summary**
This commit modifies the JIT implementation of `Tensor.tolist` so that it
can be called on GPU-resident Tensors as well. If the Tensor is not on the
CPU when the operator is invoked, it is copied to the CPU before doing any
of the rest of the work to convert it into a list.
**Testing**
This commit adds GPU versions of some of the existing CPU tests for this
feature.
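A hedged sketch of the newly supported case; in TorchScript, `tolist` needs the result type annotated, and the CUDA call is left commented out so the snippet runs anywhere:
```
from typing import List

import torch

@torch.jit.script
def gpu_tolist(x: torch.Tensor) -> List[float]:
    result: List[float] = x.tolist()
    return result

# With this change the argument may live on the GPU; it is copied to CPU internally.
# gpu_tolist(torch.arange(4.0, device="cuda"))
```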
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34554
Differential Revision: D20392604
Pulled By: SplitInfinity
fbshipit-source-id: 69c17b98d866428c19d683588046169538aaf1e3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34598
as above
Test Plan:
test.txt
```
what time is it now
could you set a reminder at 7 am
waht is the weather today
```
example json
```
{
"model": {
"category": "CNN",
"description": "Assistant Mobile Inference",
"files": {
"model": {
"filename": "model.pt1",
"location": "//everstore/GICWmAB2Znbi_mAAAB0P51IPW8UrbllgAAAP/model.pt1",
"md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
},
"data": {
"filename": "input.txt",
"location": "/home/pengxia/test/input.txt",
"md5": "c0f4b29c442bbaeb0007fb0ce513ccb3"
}
},
"format": "pytorch",
"framework": "pytorch",
"kind": "deployment",
"name": "Assistant Mobile Inference"
},
"tests": [
{
"command": "{program} --model {files.model} --input_dims \"1\" --input_type NLUType --warmup {warmup} --iter 5 --input_file {files.data} --report_pep true",
"identifier": "{ID}",
"metric": "delay",
"iter": 15,
"warmup": 2,
"log_output": true
}
]
}
```
iter = 5 (`--iter 5`) * 3 (3 lines in test.txt) = 15
arbabu123 I will provide a wrapper to compute the iter in the future.
run following command
```
buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/fbnet/assistant_mobile_inference.json --platform android/full_jit --framework pytorch --remote --devices SM-G960U-8.0.0-26
```
results
https://our.intern.facebook.com/intern/aibench/details/275259559594003
**Note: this is compatible with the existing examples.**
Reviewed By: kimishpatel, ljk53
Differential Revision: D20389285
fbshipit-source-id: 80165ef394439a307ac7986cf540a80fdf3d85d6
Summary:
If SELECTED_OP_LIST is specified as a relative path on the command line, the CMake build will fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33942
Differential Revision: D20392797
Pulled By: ljk53
fbshipit-source-id: dffeebc48050970e286cf263bdde8b26d8fe4bce
Summary:
When a system has the ROCm dev tools installed, `scripts/build_mobile.sh` tried to use them.
This PR fixes the build so it no longer looks up the unused ROCm libraries when building libtorch mobile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34478
Differential Revision: D20388147
Pulled By: ljk53
fbshipit-source-id: b512c38fa2d3cda9ac20fe47bcd67ad87c848857
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34150
In the distributed setting we commonly have tests in which there are errors where one process
exits but the others do not (since they are, for example, waiting for work from
the process that exited). Currently, when this situation happens we do not
handle it well, and wait for process 0 to time out. This results in wasted
time waiting for test errors and a less helpful "Process 0 timed out..." error
message when the error was actually something else.
This diff fixes the issue by checking for exited subprocesses and terminating
the test when we see a subprocess that has exited uncleanly. We still enforce
timeouts and return when all processes have exited cleanly in the happy path.
ghstack-source-id: 99921462
Test Plan:
All distributed tests + tested by writing tests that should trigger
the unclean subprocess detection, and verified that we exit quickly instead of
waiting for the entire timeout.
Differential Revision: D20231032
fbshipit-source-id: 3e0d4a20925b7d1098ec4c40ffcc66845425dd62
Summary:
This PR implements the following linear algebra algorithms for low-rank matrices:
- [x] Approximate `A` as `Q Q^H A` - using Algorithm 4.4 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ exposed as `torch.lowrank.get_approximate_basis(A, q, niter=2, M=None) -> Q`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] SVD - using Algorithm 5.1 from [Halko et al, 2009](http://arxiv.org/abs/0909.4061).
+ uses `torch.lowrank.get_approximate_basis`
+ exposed as `torch.svd_lowrank(A, q=6, niter=2, M=None) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] PCA - using `torch.svd_lowrank`
+ uses `torch.svd_lowrank`
+ exposed as `torch.pca_lowrank(A, center=True, q=None, niter=2) -> (U, S, V)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices, uses non-centered sparse matrix algorithm
+ [x] documentation
- [x] generalized eigenvalue solver using the original LOBPCG algorithm [Knyazev, 2001](https://epubs.siam.org/doi/abs/10.1137/S1064827500366124)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="basic", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using robust LOBPCG with orthogonal basis selection [Stathopoulos, 2002](https://epubs.siam.org/doi/10.1137/S1064827500370883)
+ exposed as `torch.lobpcg(A, B=None, k=1, method="ortho", ...)`
+ [x] dense matrices
+ [x] batches of dense matrices
+ [x] sparse matrices
+ [x] documentation
- [x] generalized eigenvalue solver using the robust and efficient LOBPCG Algorithm 8 from [Duersch et al, 2018](https://epubs.siam.org/doi/abs/10.1137/17M1129830) that switches to orthogonal basis selection automatically
+ the "ortho" method improves iterations so rapidly that in the current test cases it does not make sense to use the basic iterations at all. If users will have matrices for which basic iterations could improve convergence then the `tracker` argument allows breaking the iteration process at user choice so that the user can switch to the orthogonal basis selection if needed. In conclusion, there is no need to implement Algorithm 8 at this point.
- [x] benchmarks
+ [x] `torch.svd` vs `torch.svd_lowrank`, see notebook [Low-rank SVD](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/Low-rank%20SVD.ipynb). In conclusion, the low-rank SVD is going to be useful only for large sparse matrices where the full-rank SVD will fail due to memory limitations.
+ [x] `torch.lobpcg` vs `scipy.sparse.linalg.lobpcg`, see notebook [LOBPCG - pytorch vs scipy](https://github.com/Quansight/pearu-sandbox/blob/master/pytorch/LOBPCG%20-%20pytorch%20vs%20scipy.ipynb). In conclusion, both implementations give the same results (up to numerical errors from different methods); the scipy lobpcg implementation is generally faster.
+ [x] On very small tolerance cases, `torch.lobpcg` is more robust than `scipy.sparse.linalg.lobpcg` (see `test_lobpcg_scipy` results)
Resolves https://github.com/pytorch/pytorch/issues/8049.
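A brief usage sketch of the new entry points (shapes and parameters are illustrative):
```
import torch

A = torch.randn(1000, 200)
U, S, V = torch.svd_lowrank(A, q=6, niter=2)          # approximate rank-6 SVD
U2, S2, V2 = torch.pca_lowrank(A, q=6, center=True)   # PCA via low-rank SVD
# lobpcg expects a symmetric positive (semi-)definite matrix:
vals, vecs = torch.lobpcg(A.t() @ A, k=3, method="ortho")
```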
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29488
Differential Revision: D20193196
Pulled By: vincentqb
fbshipit-source-id: 78a4879912424595e6ea95a95e483a37487a907e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34160
I constructed the patch by deleting OperatorOptions and then rerouting
all queries for AliasAnalysisKind to FunctionSchema. Some of the
behavior is kind of bogus: we really shouldn't be mutating FunctionSchema
after the fact, but that won't get fixed until we actually switch to
true schema merging.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20282846
Pulled By: ezyang
fbshipit-source-id: ba7bca6e8adc3365789639b88e54c4e881b1692e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33838
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20227875
Pulled By: ezyang
fbshipit-source-id: 319855b1f0fa436f9ed5256d2106b07f20e6b833
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34556
According to
https://github.com/pytorch/pytorch/pull/34012#discussion_r388581548,
this `at::globalContext().setQEngine(at::QEngine::QNNPACK);` call isn't
really necessary for mobile.
In Context.cpp it selects the last available QEngine if the engine isn't
set explicitly. For the OSS mobile prebuild it should only include the QNNPACK
engine, so the default behavior should already be the desired behavior.
It makes a difference only when USE_FBGEMM is set - but it should be off
for both OSS mobile build and internal mobile build.
Test Plan: Imported from OSS
Differential Revision: D20374522
Pulled By: ljk53
fbshipit-source-id: d4e437a03c6d4f939edccb5c84f02609633a0698
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34559
We check the use_count for indices and values when we avoid a clone
for sparse tensors. The sparse tensor grad itself might have a higher refcount
due to DDP hooks/dist autograd structures holding refs, but the indices and
values inside the sparse tensor should always have a refcount of 1.
ghstack-source-id: 99900534
Test Plan: waitforbuildbot
Differential Revision: D20375239
fbshipit-source-id: 6a654549d13071ab3451cef94259caf7627b575c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34505
A thread could hold the GIL when calling PythonRpcHandler::getInstance(),
while another thread could be doing static data initialization by calling
`new PythonRpcHandler()`, inside of which the GIL is also required. Static data
initialization is thread-safe, so the thread holding the GIL will wait for the
other thread to finish static data initialization before going forward. Because
the initialization can't proceed without the GIL, there is a deadlock. We ask the
calling thread to release the GIL to avoid this situation.
ghstack-source-id: 99893858
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn -- 'test_backward_simple_script_call \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)' --stress-runs 100
```
Differential Revision: D7490489
fbshipit-source-id: 76f63cc7bedf088d3dbff288f53aa0bd33749255
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Differential Revision: D20346780
fbshipit-source-id: c8534954ef4adb2e3c880401acbee30cd284f3db
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34560
These jobs don't have next phase so we don't really need commit the
docker images.
Should also fix issue #34557.
Test Plan: Imported from OSS
Differential Revision: D20375308
Pulled By: ljk53
fbshipit-source-id: 328cb428fcfb0fbb79b2a233b5f52607158c983c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34376
Vectorized implementation of qmul. qmul is now ~16x faster on my development machine. This implementation works for qint8, quint8 and qint32. Also added some commonly used operations, such as multiply operator, requantize operation etc., to qint vector classes for future use.
```
#!/usr/bin/env python
import time
import torch
import torch.nn as nn
torch.set_num_threads(1)
# print(torch.__config__.parallel_info())
A = torch.rand(1, 54, 54, 256)
B = torch.rand(1, 54, 54, 256)
scale = .05
zero_point = 50
for dtype in [torch.quint8, torch.qint8]:
    qA = torch.quantize_per_tensor(A, scale=scale, zero_point=zero_point,
                                   dtype=dtype)
    qB = torch.quantize_per_tensor(B, scale=scale, zero_point=zero_point,
                                   dtype=dtype)
    NITER = 1000
    s = time.time()
    for i in range(NITER):
        out = torch.ops.quantized.mul(qA, qB, scale=scale, zero_point=zero_point)
    time_per_iter = (time.time() - s) / NITER
    print('dtype: {} time per iter ms: {:.3f}'.format(dtype, time_per_iter * 1000))
```
### Before
dtype: torch.quint8 time per iter ms: 6.714
dtype: torch.qint8 time per iter ms: 6.780
### After
dtype: torch.quint8 time per iter ms: 0.431
dtype: torch.qint8 time per iter ms: 0.417
### Test
Modified qmul tests to include qint8 and qint32 data types.
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_same_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_relu_different_qparams
python test/test_quantized.py TestQuantizedOps.test_qmul_broadcast
ghstack-source-id: 99862681
Differential Revision: D20308515
fbshipit-source-id: 4fa65b2ba433cfd59260fc183a70f53a6fcc36b4
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34334
Differential Revision: D20296437
Pulled By: SplitInfinity
fbshipit-source-id: df4e7b0881ae4913424e5a409bfa171a61c3e568
Summary:
Attempting to build pytorch with ASAN on a system with gcc-8 fails due to mismatched system compilation flags.
Address the issue by using the original compiler to build the `torch._C` extension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34549
Test Plan: Run `.jenkins/pytorch/build-asan.sh` on FC-30
Differential Revision: D20373781
Pulled By: malfet
fbshipit-source-id: 041c8d25f96b4436385a5e0eb6fc46e9b5fdf3f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26125
We already had some optimized implementations using AVX2 to improve the quantized kernel performance. In this diff, we want to enable the runtime dispatch.
Test Plan:
Sandcastle build and test
Also test with a python binary calling into vectorized op.
torch.__config__.show()
PyTorch built with:
- GCC 4.2
- clang 8.0.20181009
- Intel(R) Math Kernel Library Version 2017.0.3 Product Build 20170413 for Intel(R) 64 architecture applications
- Intel(R) MKL-DNN v0.18.1 (Git Hash N/A)
- OpenMP 1
- **CPU capability usage: AVX2**
- Build settings:
Reviewed By: jamesr66a
Differential Revision: D17337251
fbshipit-source-id: 8e22d10011a12a4eaf54cea3485353eb1811d828
Summary:
**This PR is BC-breaking in the following way:**
In RMSpropOptions:
1. learning_rate is renamed to lr.
**Test plan before 1.5 release:**
Test that in 1.5 we can load a C++ RMSprop optimizer that was serialized in 1.4, and their states are the same.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33450
Differential Revision: D20366623
Pulled By: anjali411
fbshipit-source-id: 83250be9b583a766927e0e22a4de8b0765379451
Summary: I'm using this code in an internal Android build, and std::to_string doesn't work in our internal Android builds yet.
Test Plan: Internal build.
Reviewed By: ljk53
Differential Revision: D20234221
fbshipit-source-id: 8fd61235bf9b487e07a1459c452830e732c7afb0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33427
This PR is an attempt to avoid clone for sparse tensors similar to how
we avoid clone for dense tensors currently.
As per my understanding, even if the 'indices' and 'values' of a sparse tensor
are non-contiguous, operations like 'add' are still supported. As a result,
the major change in this PR is to create a shallow copy instead of clone()
for sparse tensors.
ghstack-source-id: 99838375
Test Plan: waitforbuildbot
Differential Revision: D19926698
fbshipit-source-id: b5a3f36c2aa273e17f8b7a9f09c1ea00e7478109
Summary:
We updated the default jobs to run in a different PR but neglected to
update this script as well.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34498
Differential Revision: D20368420
Pulled By: seemethere
fbshipit-source-id: 240171b18f397095e3a8d57de3a29d1d2e891d85
Summary:
In DataParallel, replica parameters are not leaves (because they are computed via broadcast from master parameters), and should be treated as such. Fixes https://github.com/pytorch/pytorch/issues/33552
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33907
Differential Revision: D20150199
Pulled By: ngimel
fbshipit-source-id: 5965d4115b6b3a8433063126ff6269567872fbeb
Summary:
The include list seems to be copied from somewhere else, and some totally unrelated files are included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34528
Differential Revision: D20358622
Pulled By: ngimel
fbshipit-source-id: d8a6260f5f77b0eabdbd68e3728873efd632d9bc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31342
Test Plan: unit test
Differential Revision: D19131704
fbshipit-source-id: 4e91d5933635ee2c7c301caf89a5a7009c5cb7c8
Summary:
Tries to fix https://github.com/pytorch/pytorch/issues/33562 by raising `std::runtime_error` instead of `std::domain_error`.
* The Python tests already expect `RuntimeError` so this shouldn't affect Python users of PyTorch.
* If someone out there is using C10 or ATen from C++ and tries to catch `std::domain_error` specifically, this fix would break their code. Hopefully that's not the case.
Alternative to this PR is someone try to really get to the bottom of why `std::domain_error` isn't being caught.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34301
Differential Revision: D20344579
Pulled By: ezyang
fbshipit-source-id: d5f3045085a2f75b71b864335ebf44991d0cad80
Summary:
cuDNN needs it, MIOpen doesn't. However, since it seems to be the PyTorch preference to not introduce ROCm-specific logic in the python layer, we need to add a C++ function to detect if rnn weight flattening is needed.
This PR will be needed to fix the rnn unit test errors arising for PR https://github.com/pytorch/pytorch/issues/33837.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34265
Differential Revision: D20345105
Pulled By: ezyang
fbshipit-source-id: a2588a6e2ac6f7d1edf2b7872bc6a879a7df96ec
Summary:
This PR enables bfloat16 type for loss criterion ops(and the ops they depend on) and few miscellaneous ops required to train resnet50.
iotamudelta ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34469
Differential Revision: D20348856
Pulled By: ezyang
fbshipit-source-id: 0a8f06c2169cfa3c9cf319120e27150170095f6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33896
Fixes #32625. Previously, we'd receive an error message if we had a
custom function return a view of an input in a no_grad block:
```
class Alias(Function):
    @staticmethod
    def forward(ctx, x):
        return x[:]

    @staticmethod
    def backward(ctx, gx):
        return gx

inp = torch.rand(2, requires_grad=True)
with torch.no_grad():
    # Used to error out
    output = Alias.apply(inp)
```
After this change, the error no longer happens. The behavior changes to
become consistent to if we had implemented an operator that does the
same thing as the custom function:
- the output requires_grad
- we are able to detect (and error out) if the user tries to modify the
output in-place outside of the no_grad block.
Test Plan: - new test
Differential Revision: D20345601
Pulled By: zou3519
fbshipit-source-id: 7f95b4254f52ddbf989d26f449660403bcde1c78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33875
Fixes #33675.
I added a `current_node_name` argument to AnomalyMetadata::print_stack.
This is a mandatory arg because I found only one callsite and making it
a default arg on a virtual function can be confusing.
Test Plan:
- Tested locally:
https://gist.github.com/zou3519/09937387c83efc76e1700374d5c9c9d9
- I don't know how to add a test for this: the message is printed to
stderr but it isn't an exception nor a warning. I considered capturing
the stderr of a subprocess but that seems like asking for flakiness.
Differential Revision: D20349399
Pulled By: zou3519
fbshipit-source-id: 7585ddffe2bf9e1081f4028a9c44de783978a052
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33214
Distributed autograd had some custom logic in terms of how we
accumulated gradients. This was mostly done early on to enable basic
functionality. Although, in the long term we should merge this logic with what
we have in the local autograd engine. A lot of work has gone into ensuring we
accumulate grads correctly and efficiently and we should reuse that as a
starting point.
We can investigate if we need further custom logic for distributed autograd
later on if we need additional optimizations.
In this PR I've merged the gradient accumulation logic and also the gradient
hooks. As a result, now gradient hooks are called in distributed autograd as
well.
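A small illustrative sketch (local autograd shown; per this change, the same `Tensor.register_hook` callbacks also fire when gradients are accumulated by distributed autograd):
```python
import torch

# Hook registered on a leaf tensor; with the merged accumulation logic it is
# invoked during the distributed backward pass as well.
p = torch.randn(3, requires_grad=True)
p.register_hook(lambda grad: print("grad hook fired, shape:", grad.shape))
(p * 2).sum().backward()
```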
ghstack-source-id: 99838019
Test Plan: waitforbuildbot
Differential Revision: D19843284
fbshipit-source-id: 7923d7e871fb6afd3e98dba7de96606264dcb5f3
Summary:
This PR resolves https://github.com/pytorch/pytorch/issues/22534 by adding a converter for the `torch.nn.functional.one_hot` function, and covering it with a test.
Are there other places this should be tested?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34454
Reviewed By: hl475
Differential Revision: D20354255
Pulled By: houseroad
fbshipit-source-id: 84224c1610b2cc7986c91441c65647ddc090750d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33807
afaik this is unused, so removing it from the source tree. RIP :(
Test Plan: Imported from OSS
Differential Revision: D20122118
Pulled By: suo
fbshipit-source-id: cb45943f5b9f969482301a2f9fe540326dbc78f2
Summary:
See NumPy's division documentation here: https://numpy.org/doc/1.18/reference/generated/numpy.divide.html#numpy.divide.
True division is the same as PyTorch's default division except when both inputs are integer or bool tensors. In the latter case the inputs are (conceptually) cast to the default floating type before the division is performed.
The function is implemented for dense and sparse tensors and supports exporting to ONNX from PyTorch's eager mode or JIT traces. The function is inherently incompatible with exporting to ONNX via JIT script, and is another datapoint suggesting we should deprecate exporting scripted graphs to ONNX.
Tests are added for the type promotion, named tensor, and ONNX export behavior.
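A minimal sketch of the promotion behavior described above (integer inputs are conceptually cast to the default float dtype before dividing):
```python
import torch

a = torch.tensor([5, 3], dtype=torch.int64)
b = torch.tensor([2, 2], dtype=torch.int64)

# True division casts both integer inputs to the default floating type first.
print(torch.true_divide(a, b))        # tensor([2.5000, 1.5000])
print(torch.true_divide(a, b).dtype)  # torch.float32 (the default dtype)
```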
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34236
Reviewed By: houseroad
Differential Revision: D20334087
Pulled By: mruberry
fbshipit-source-id: 83d00d886f46f713215d7d9e02ffd043164c57f1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34321
Mostly cosmetic as we can infer the shape anyway. It can remove a lot of the noise in the log though.
Note that weight sharing doesn't work yet. I'll add another diff to address this.
Reviewed By: houseroad
Differential Revision: D20290841
fbshipit-source-id: fe6f9b60d05dbe150af15b5d9d7a69fd902e12cc
Summary:
This allows us to enable some double-based pdist tests that previously ran into accumulated error from casting down to float.
Addresses https://github.com/pytorch/pytorch/issues/33128
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34103
Differential Revision: D20343279
Pulled By: ezyang
fbshipit-source-id: a2da768259fab34ef326976283b7a15bebbbb979
Summary:
I think this was added when we couldn't compile the function itself. Now we can.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34171
Differential Revision: D20269960
Pulled By: eellison
fbshipit-source-id: 0a60458d639995d9448789c249d405343881b304
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33853
Quant fusion relies on inlining, but inlining breaks the CallFunction("linear", ...) into an if block,
which makes it hard to recognize this block and swap it with quantized::linear. In order to
preserve the op, we swap all quantized functional linear calls into aten::linear.
They might produce a different backward graph, but this is called in the step before we get the quantized
model, so it shouldn't affect anything.
We'll integrate this with convert_script later in the new "finalize_quant" API
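For context, a minimal sketch (not this PR's test) of a scripted module containing the functional linear call that the pass normalizes to `aten::linear` before quantization:
```python
import torch
import torch.nn.functional as F

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.weight = torch.nn.Parameter(torch.randn(4, 4))
        self.bias = torch.nn.Parameter(torch.zeros(4))

    def forward(self, x):
        # The scripted graph for this call is what the pass rewrites.
        return F.linear(x, self.weight, self.bias)

print(torch.jit.script(M()).graph)
```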
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20343873
fbshipit-source-id: 423e03bf893b79267d2dc97bc997ee1bfe54ec0f
Summary:
Custom classes via torchbind require runtime type information.
We are trying to enable custom class based graph rewrite for XNNPACK in
this stacked PRs: https://github.com/pytorch/pytorch/pull/34047.
They require RTTI enabled for mobile. Mobile builds are failing
currently without it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34368
Differential Revision: D20306155
Pulled By: kimishpatel
fbshipit-source-id: 52c61ff5467a619e8f51708a05258eee35dd0a56
Summary:
Previously when emitting subscripts we only emitted actual values, but
now they may sometimes emit a `ModuleValue`, so it should stay as a
`SugaredValue`. This allows for the result of the subscript to be
treated as a real module (i.e. you can just do `self.modlist[1](inputs)`
instead of `self.modlist[1].forward(inputs)`)
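A small sketch of the user-visible effect (hypothetical module; assumes a build with this change):
```python
import torch
import torch.nn as nn

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.modlist = nn.ModuleList([nn.Linear(4, 4), nn.Linear(4, 4)])

    def forward(self, x):
        # The subscripted entry is treated as a real module, so it can be
        # called directly instead of going through .forward(...)
        return self.modlist[1](x)

scripted = torch.jit.script(Net())
print(scripted(torch.randn(2, 4)).shape)
```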
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34320
Pulled By: driazati
Differential Revision: D20345642
fbshipit-source-id: 2bedf9a454af747b704422f6bbb8370cbdf4bf61
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34398
As part of PR 34109, it was suggested that we track the number of outstanding
async calls for RPC DebugInfo, particularly if we move towards using
at::launch() threads on occasion for continuations.
This particular aspect of the change was distinct from the main purpose of the
diff, and started getting bigger, so split this functionality out as a separate diff.
For completeness, we track client_active_calls, server_active_calls,
server_active_async_calls, and write some very basic unittest coverage.
ghstack-source-id: 99708836
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...
Differential Revision: D20314994
fbshipit-source-id: 2f7c75d5c511b27ed0c09c7b8a67b6fb49df31a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34410
### Summary
Currently, the iOS jobs are not being run on PRs anymore. This is because all iOS jobs have specified `org-member` as a context, which used to include all pytorch members. But it seems this rule has changed recently. It turns out that only users from the admin group or the builder group have access rights to the context values. https://circleci.com/gh/organizations/pytorch/settings#contexts/2b885fc9-ef3a-4b86-8f5a-2e6e22bd0cfe
This PR will remove `org-member` from the iOS simulator build which doesn't require code signing. For the arm64 builds, they'll only be run on master, not on PRs anymore.
### Test plan
- The iOS simulator job should be able to appear in the PR workflow
Test Plan: Imported from OSS
Differential Revision: D20347270
Pulled By: xta0
fbshipit-source-id: 23f37d40160c237dc280e0e82f879c1d601f72ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33481
We have to propagate the observed property of values through ops like max_pool2d and flatten, and
avoid inserting duplicated observers.
For example:
```
x1 = self.conv(x)
x2 = maxpool(x1)
x3 = self.conv(x2)
```
If x1 is observed, we should propagate this information through maxpool and
we should consider x2 as observed as well.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20261897
fbshipit-source-id: 7de354a3ccb2b6e1708f5c743d4d9f7272691a93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34354
The condition `NOT INTERN_BUILD_MOBILE AND NOT BUILD_CAFFE2_MOBILE` was
added in #27086, but it seems it's always false on current master:
BUILD_CAFFE2_MOBILE is ON by default - the name is a little bit misleading -
it is ON even when it's building non-mobile PyTorch/Caffe2. It is OFF only
when it's building PyTorch mobile, where INTERN_BUILD_MOBILE is ON.
And when it's building PyTorch mobile, it won't build caffe2/operators
at all (by setting BUILD_CAFFE2_OPS OFF: https://github.com/pytorch/pytorch/blob/master/CMakeLists.txt#L345)
So I imagine the real intention is to skip when it's building Caffe2 mobile.
We can simply remove the deprecated BUILD_CAFFE2_MOBILE condition.
Test Plan: Imported from OSS
Differential Revision: D20345298
Pulled By: ljk53
fbshipit-source-id: d2cb4e2248fc209d63b2843e0f12e577e323def4
Summary:
`ConcreteModuleTypeBuilder` used to keep parameters together with all other attributes in an `unordered_map`, often leading to reordering them while building up the type. Parameter order is semantically meaningful, so we need to preserve it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34131
Differential Revision: D20331542
Pulled By: suo
fbshipit-source-id: 5b860025f7902654d6099751d3fb14b12f6f5a67
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34382
The previous implementation was handling both newWithStorage and newWithSize, which doesn't make much sense.
Test Plan: Imported from OSS
Differential Revision: D20311056
Pulled By: gchanan
fbshipit-source-id: 2696a4566e6203c98338c86cbf4c236bd18d7c49
Summary:
One example in the current docs for `torch::nn::ModuleList` doesn't compile, and this PR fixes it.
Fixes https://github.com/pytorch/pytorch/issues/32414.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34463
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20331120
Pulled By: yf225
fbshipit-source-id: 50bb078fe1a900c9114d5434e92dc40ee13b52bf
Summary:
Fixes https://github.com/pytorch/pytorch/issues/25845.
**Test Plan:**
Check `pytorch_cpp_doc_push` CI job, and see if there is `classat_1_1_tensor` generated (similar to `structat_1_1native_1_1_convolution_descriptor`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34467
Differential Revision: D20338190
Pulled By: yf225
fbshipit-source-id: 52dc05af5e0d742e740de5576d0d2b3e17ef28dd
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5442.
Per title (and see issue). A test is added to test_torch.py to verify the behavior.
Update (with new behavior):
NumPy arrays can be non-writeable (read-only). When converting a NumPy array to a Torch tensor the storage is shared, but the tensor is always writable (PyTorch doesn't have a read-only tensor). Thus, when a non-writeable NumPy array is converted to a PyTorch tensor it can be written to.
In the past, PyTorch would silently copy non-writeable NumPy arrays and then convert those copies into tensors. This behavior violates the from_numpy contract, however, which promises that the tensor and the array share memory.
This PR adds a warning message when a non-writeable NumPy array is converted into a Torch tensor. This will not break any networks, but will make end users aware of the behavior. They can work-around the warning message by marking their NumPy arrays as writeable.
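A minimal sketch of the behavior described above:
```python
import numpy as np
import torch

arr = np.zeros(3)
arr.flags.writeable = False          # read-only NumPy array

t = torch.from_numpy(arr)            # shares memory; emits the new warning
t2 = torch.from_numpy(arr.copy())    # workaround: convert a writable copy instead
```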
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33615
Differential Revision: D20289894
Pulled By: mruberry
fbshipit-source-id: b76df0077399eb91038b12a6bf1917ef38c2cafd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34162
This avoids the "worker{}".format(..) in our unit tests to something
cleaner.
ghstack-source-id: 99713074
Test Plan: waitforbuildbot
Differential Revision: D20233533
fbshipit-source-id: 5cff952ca68af5a6d26dc5cc01463cf7756d83d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33921
**NOTE FOR REVIEWERS**: This PR has internal Facebook specific changes or comments, please review them on [Phabricator](https://our.intern.facebook.com/intern/diff/D20153092/)!
Test Plan: Imported from OSS
Differential Revision: D20177227
Pulled By: jamesr66a
fbshipit-source-id: 87f3e484c4f873d60f76f50f6789c1b4a73bdfde
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33900
These functions don't require any libtorch-specific functionality, so move them into the header so they're included in the ATen build
Test Plan: Imported from OSS
Differential Revision: D20175874
Pulled By: jamesr66a
fbshipit-source-id: 1efab1b60e196a635e6c6afadb042b63771170f0
Summary:
This commit fixes overlapping keywords in the CPP Docs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34142
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20319949
Pulled By: yf225
fbshipit-source-id: e7bb2efdc286c85792c6f18a260c3bba33c54008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34393
Clean up the list
Test Plan: CI
Reviewed By: hl475
Differential Revision: D20300530
fbshipit-source-id: 50e7da0a9f8295eff33590982f32f84abee96d9c
Summary:
This PR fixed the documentation for `torch.add` with alpha. It also fixed deprecated Python calls to `torch.add` and `torch.addmm` in tests, which may affect performance in *test/test_sparse.py* and *test/test_nn.py*.
cc csarofeen ptrblck
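A short sketch of the deprecated versus preferred call forms touched by this change:
```python
import torch

a, b = torch.randn(3), torch.randn(3)

# Deprecated positional form used in the old tests: torch.add(a, 2, b)
# Preferred keyword form documented here:
out = torch.add(a, b, alpha=2)   # computes a + 2 * b
out_mm = torch.addmm(torch.zeros(3, 3), torch.randn(3, 3), torch.randn(3, 3), beta=1, alpha=1)
```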
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33935
Differential Revision: D20313320
Pulled By: ngimel
fbshipit-source-id: fb08413d7e244865952e3fc0e1be7f1794ce4e9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33717
Because of the special treatment of operator names for lite interpreter, all the operators used in lite interpreter are still prepended by "_". Add the necessary registrations for MNIST model. All the ops with autograd capability are included in torch_mobile_train. After rebase the selective build from D19649074 can be utilized to strip the unused ops.
Note that this diff is for feasibility test. The training accuracy are not covered in the test.
ghstack-source-id: 97780066
Test Plan:
```
buck run xplat/caffe2/fb/lite_trainer:lite_trainer -c pt.disable_gen_tracing=1 -c pt.static_dispatch=0 -- --model=/path/MnistModel.bc
```
{F227898221}
Reviewed By: dreiss
Differential Revision: D19743201
fbshipit-source-id: cacadd76f3729faa0018d147a69466bbf54312fd
Summary:
Please merge after https://github.com/pytorch/pytorch/pull/33073
With that PR, we are now trying different algorithms when OOM, so hopefully there will be some algo working at low memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34259
Differential Revision: D20310094
Pulled By: ngimel
fbshipit-source-id: bccd8162bd06a0e54ac6f42a7fd9a5b766f92cd7
Summary:
Improves explanation of non-determinism when running on GPUs. Adds info about `torch.nn.BCELoss` operating non-deterministically on GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33795
Differential Revision: D20284880
Pulled By: ngimel
fbshipit-source-id: d543959636d261a80c234150304344b19a37ba5d
Summary:
We don't release binaries for macOS with CUDA support so we should just
remove it from our regular PR pipeline
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34333
Differential Revision: D20312565
Pulled By: seemethere
fbshipit-source-id: 376228680aa0e814d1b37f1ff63b7d1262515e44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34378
This fixes a strange symbol mangling mismatch between `DECLARE_DISPATCH(qbatch_norm_fn, qbatch_norm_stub)` and `REGISTER_DISPATCH(qbatch_norm_stub, &q_batch_norm_kernel<false>);` if the code is built on Windows with clang
Test Plan: CI + build PyTorch on Windows using clang
Reviewed By: EscapeZero
Differential Revision: D20309550
fbshipit-source-id: e97c7c3b6fee2e41ea6b2f8167ce197aec404e3d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34099
This change effectively applies into IValue's future impl a few fixes
we discovered when using the torch::utils::Future<T> impl.
The parallel impls should probably eventually be merged, but until then:
- Don't hold the lock when invoking the callbacks. This makes
it effectively impossible (deadlocks) to call value() to get
the value from inside the callback.
- We discovered that it was slightly cleaner in practice to
notify condition variables prior to invoking callbacks
(best to unblock paused threads ASAP, before spawning new work).
- Fix some var naming inconsistency.
- Add some caffe2 cpp test coverage.
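A generic Python sketch (not the C++ implementation) of the ordering described above: waiters are notified and the lock is released before user callbacks run, so a callback that calls `value()` cannot deadlock.
```python
import threading

class MiniFuture:
    def __init__(self):
        self._cv = threading.Condition()
        self._done = False
        self._value = None
        self._callbacks = []

    def set_value(self, value):
        with self._cv:
            self._value = value
            self._done = True
            callbacks = list(self._callbacks)
            self._cv.notify_all()        # unblock paused threads first
        for cb in callbacks:             # invoke callbacks without holding the lock
            cb(value)

    def value(self):
        with self._cv:
            self._cv.wait_for(lambda: self._done)
            return self._value

    def add_callback(self, cb):
        with self._cv:
            if not self._done:
                self._callbacks.append(cb)
                return
        cb(self._value)                  # already completed: run outside the lock
```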
ghstack-source-id: 99336569
Test Plan:
```
buck test mode/dev //caffe2/test/cpp/jit:jit -- 'JitTest\.IValueFuture'
```
Differential Revision: D20203278
fbshipit-source-id: 6e805ba547899dab9aab458e4b23049db31f930e
Summary:
Currently testing against the older release `1.4.0` with:
```
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/libtorch_to_s3.sh
PYTORCH_S3_FROM=nightly TEST_WITHOUT_GIT_TAG=1 TEST_PYTORCH_PROMOTE_VERSION=1.4.0 scripts/release/promote/wheel_to_s3.sh
```
These scripts can also be used for `torchvision` as well which may make the release process better there as well.
Later on this should be made into a re-usable module that can be downloaded from anywhere and used amongst all pytorch repositories.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34274
Test Plan: sandcastle_will_deliver
Differential Revision: D20294419
Pulled By: seemethere
fbshipit-source-id: c8c31b5c42af5096f09275166ac43d45a459d25c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34159
This fixes `comparison of integers of different sign` warnings
Test Plan: CI
Reviewed By: EscapeZero
Differential Revision: D20232085
fbshipit-source-id: 8f325be54395be54c704335cb7edf2ec7ef75e75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34318
Stop checking whether we have AMD GPU devices on the host, because we may be constructing a net on a machine without a GPU and running the net on another one with a GPU
Reviewed By: ajauhri
Differential Revision: D20269562
fbshipit-source-id: 1f561086cacdcead3ce7c03c2d02c25336c8b11a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34017
Remove warning
```
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(437): warning: statement is unreachable
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m1" was set but never used
caffe2/aten/src/THC/generic/THCTensorMathBlas.cu(271): warning: variable "transpose_m2" was set but never used
```
Test Plan: CI
Reviewed By: ngimel
Differential Revision: D20181179
fbshipit-source-id: 3665912ba55bffbd8b4555f8a6803e57a502c103
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34018
Remove warning
```
caffe2/c10/util/ArrayRef.h(278): warning: attribute does not apply to any entity
```
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D20181191
fbshipit-source-id: 58bd168a87a94fec925c7cde8b8d728a4257446c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34183
https://github.com/pytorch/pytorch/pull/33263 enhanced the RRef Python constructor to infer most types via `jit::tryToInferType(..)`.
But this helper function can't infer the `ScriptModule` type due to `ScriptModule`'s special per-Module type singleton logic, so it's still not possible for a Python-created RRef to know the JIT type of its contained `ScriptModule`.
Instead of inferring the specific type of a Module, which could lead to too many candidate types (due to Module's multiple inheritance possibility), it's more straightforward to set its type as a user-specified `ModuleInterface` type.
We added an optional argument `type_hint` for users to mark an `RRef` with the `ModuleInterface` type it holds.
ghstack-source-id: 99649379
(Note: this ignores all push blocking failures!)
Test Plan:
Aspects that need to be confirmed in the test cases
https://fb.quip.com/aGxRAh2lCg05
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_class_rref
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_create_local_script_module_rref
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_class_rref_in_py_and_use_in_script
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_return_local_script_module_rref_in_py_and_use_in_script
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_function_exception
```
Differential Revision: D7065050
fbshipit-source-id: e10210c0996622969e499e4a35b0659b36787c1c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34217
LegacyNoScalar variants cause 0-dim tensors to behave like 1-dim tensors.
LegacyAll variants cause 0-dim tensors to behave like 1-dim tensors, and numel == 0 tensors to be treated like 0-dimensional tensors.
Since this was done by codemod, these are often unneeded and often translated incorrectly to ATen.
Test Plan: Imported from OSS
Differential Revision: D20249577
Pulled By: gchanan
fbshipit-source-id: 6f2876d3e479562c9323f3629357a73a47869150
Summary:
The init-list form of `at::indexing::Slice` (i.e. `tensor.index({{1, None, 2}, ...})` instead of `tensor.index({Slice(1, None, 2), ...})`) in the C++ API can be easily confused with the list-form indexing in the Python API (e.g. `tensor[[1, 3, 2], ...]`), which is not good from a readability perspective. This PR removes the init-list form of `at::indexing::Slice` to make the API less confusing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34255
Test Plan: Imported from GitHub, without a `Test Plan:` line.
Differential Revision: D20290166
Pulled By: yf225
fbshipit-source-id: abbcbeca0b179219e5e1f196a33ef8aec87ebb76
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34203
Currently cmake and mobile build scripts still build libcaffe2 by
default. To build pytorch mobile users have to set environment variable
BUILD_PYTORCH_MOBILE=1 or set cmake option BUILD_CAFFE2_MOBILE=OFF.
PyTorch mobile has been released for a while. It's about time to change
CMake and build scripts to build libtorch by default.
Changed caffe2 CI job to build libcaffe2 by setting BUILD_CAFFE2_MOBILE=1
environment variable. Only found android CI for libcaffe2 - do we ever
have iOS CI for libcaffe2?
Test Plan: Imported from OSS
Differential Revision: D20267274
Pulled By: ljk53
fbshipit-source-id: 9d997032a599c874d62fbcfc4f5d4fbf8323a12e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34122
Earlier work added support for async rpc cases when RecordFunction's
end callbacks might be called in a different thread; in addition some
extra care was needed to handle pointer to parent function;
This PR makes RecordFunction aware of potentially multiple threads in
use, as well as removes unused parent() call and restricts current()
RecordFunction to scope-based record functions (RECORD_FUNCTION macro)
Test Plan: unit tests
Differential Revision: D20297709
Pulled By: ilia-cher
fbshipit-source-id: 46a59e1b2eea0bbd8a59630385e193b38d30f9d1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33978
We can directly pass a user callable to the rpc_async API in TorchScript. There is no need to have a private API that takes a qualified name.
ghstack-source-id: 99600360
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_torchscript_functions_not_supported
```
Differential Revision: D7420993
fbshipit-source-id: 228c15b21848e67418fab780e3fd6a1c6da5142d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34278
This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.
Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>
Reviewed By: iseeyuan
Differential Revision: D20266341
fbshipit-source-id: 5a6c7a5bc52f910cea82a72045870da8105ccb87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34118
Previously calc_per_channel_qparams used for loops and Python primitives, which called `item` many times, causing a slowdown during training.
These changes use torch primitives on the tensor to speed up the operation over 60x (a rough sketch of the pattern follows the profiler numbers below).
Perf results on MobileNetV2 during training using autograd profiler
FP32 forward call -
Self CPU time total: 47.222ms
CUDA time total: 124.001ms
before change
FakeQuant Model -
Self CPU time total: 19.107s
CUDA time total: 27.177s
after change
FakeQuant Model -
Self CPU time total: 404.667ms
CUDA time total: 446.344ms
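A rough sketch of the vectorized pattern described above (illustrative shapes and scale formula, not the actual observer code):
```python
import torch

x = torch.randn(8, 1024)                 # (channels, elements per channel)

# One reduction kernel per statistic instead of a Python loop calling .item():
min_vals = torch.min(x, dim=1).values
max_vals = torch.max(x, dim=1).values
scales = (max_vals - min_vals).clamp(min=1e-8) / 255.0
```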
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D20287841
fbshipit-source-id: 6b706b8206e0d0da3c3c217b014e8da5b71b870d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34232
By default `torch.zeros` creates the tensor on the CPU. We need to specify the device argument to get it to work correctly on GPU during QAT.
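A one-line sketch of the fix pattern (illustrative tensor; assumes CUDA is available during QAT):
```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
zero_point = torch.zeros(4, device=device)   # pass device explicitly instead of defaulting to CPU
```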
Test Plan:
1. Tested by running QAT on GPU
2. python test/test_quantization.py
Imported from OSS
Differential Revision: D20286351
fbshipit-source-id: 745723c85d902870c56c1c7492f26cb027ae9dc6
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/31336 and https://github.com/pytorch/pytorch/issues/1664
Sometimes cuDNN heuristics return algorithms that can not be used. Instead of just using the first algorithm returned, we should try these algorithms one by one until one of them succeed.
Benchmark:
https://github.com/zasdfgbnm/things/blob/master/2020Q1/conv-benchmark.ipynb
```python
i = torch.randn(256, 3, 256, 256).cuda()
c = torch.nn.Conv2d(3, 3, 3, 3).cuda()
%timeit c(i); torch.cuda.synchronize()
```
before vs after = 498 vs 490 µs
The performance is improved, I guess, because before this PR we always called the heuristics to get the algorithm, but after this PR we only do so the first time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33073
Differential Revision: D20284755
Pulled By: ngimel
fbshipit-source-id: b03af37c75939ca50c2cb401c706ba26914dd10e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33294
1. Serialize bytecode of __setstate__ and run it when loading the model.
2. One use case is quantization. To test this use case a few operators are registered temporarily for lite interpreter. The "_" prefix registration will be removed when the operators are all migrated to mobile.
Test Plan: Imported from OSS
Differential Revision: D20162898
Pulled By: iseeyuan
fbshipit-source-id: 7a3180807bf38fbce594d86993896861f12bb58c
Summary:
Among all ONNX tests, ONNXRuntime tests are taking the most time on CI (almost 60%).
This is because we are testing larger models (mainly torchvision RCNNs) for multiple onnx opsets.
I decided to divide tests between two jobs for older/newer opsets. This is now reducing the test time from 2h to around 1h10mins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33242
Reviewed By: hl475
Differential Revision: D19866498
Pulled By: houseroad
fbshipit-source-id: 446c1fe659e85f5aef30efc5c4549144fcb5778c
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33914
Differential Revision: D20150304
Pulled By: SplitInfinity
fbshipit-source-id: c88f5289055a02dc20b7a5dcdf87469f9816d020
Summary:
Currently, putting `outputs: List[Tensor]` instead of `outputs: List[Tensor] = []` in your JITed code results in:
```
Traceback (most recent call last):
File "custom_lstms.py", line 453, in <module>
test_script_stacked_bidir_rnn(5, 2, 3, 7, 4)
File "custom_lstms.py", line 404, in test_script_stacked_bidir_rnn
rnn = script_lstm(input_size, hidden_size, num_layers, bidirectional=True)
File "custom_lstms.py", line 62, in script_lstm
other_layer_args=[LSTMCell, hidden_size * dirs, hidden_size]))
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1267, in script
return torch.jit._recursive.create_script_module(obj, torch.jit._recursive.infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 305, in create_script_module
return create_script_module_impl(nn_module, concrete_type, stubs_fn)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 348, in create_script_module_impl
script_module = torch.jit.RecursiveScriptModule._construct(cpp_module, init_fn)
File "/home/apaszke/pytorch/torch/jit/__init__.py", line 1612, in _construct
init_fn(script_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 340, in init_fn
scripted = create_script_module_impl(orig_value, sub_concrete_type, infer_methods_to_compile)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 317, in create_script_module_impl
stubs = stubs_fn(nn_module)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 511, in infer_methods_to_compile
stubs.append(make_stub_from_method(nn_module, method))
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 41, in make_stub_from_method
return make_stub(func)
File "/home/apaszke/pytorch/torch/jit/_recursive.py", line 34, in make_stub
ast = torch.jit.get_jit_def(func, self_name="RecursiveScriptModule")
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 173, in get_jit_def
return build_def(ctx, py_ast.body[0], type_line, self_name)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 206, in build_def
build_stmts(ctx, body))
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in build_stmts
stmts = [build_stmt(ctx, s) for s in stmts]
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 129, in <listcomp>
stmts = [build_stmt(ctx, s) for s in stmts]
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 181, in __call__
return method(ctx, node)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 294, in build_AnnAssign
rhs = build_expr(ctx, stmt.value)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 180, in __call__
raise UnsupportedNodeError(ctx, node)
File "/home/apaszke/pytorch/torch/jit/frontend.py", line 116, in __init__
source_range = ctx.make_range(offending_node.lineno,
AttributeError: 'NoneType' object has no attribute 'lineno'
```
This patch makes the error message more reasonable:
```
torch.jit.frontend.UnsupportedNodeError: annotated assignments without assigned value aren't supported:
File "custom_lstms.py", line 221
# type: (Tensor, Tuple[Tensor, Tensor]) -> Tuple[Tensor, Tuple[Tensor, Tensor]]
inputs = reverse(input.unbind(0))
outputs: List[Tensor]
~ <--- HERE
for i in range(len(inputs)):
out, state = self.cell(inputs[i], state)
```
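For reference, a minimal sketch of the form that scripts without error (annotated assignment with an assigned value):
```python
from typing import List

import torch
from torch import Tensor

@torch.jit.script
def collect(xs: Tensor) -> List[Tensor]:
    outputs: List[Tensor] = []           # assigned value present, so this compiles
    for i in range(xs.size(0)):
        outputs.append(xs[i])
    return outputs

print(len(collect(torch.randn(4, 2))))
```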
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34133
Differential Revision: D20249076
Pulled By: ezyang
fbshipit-source-id: 40ec34ad38859f9fe56f379d3f8d08644b00fab9
Summary: I don't know why, but this segfaults on rocm.
Test Plan: Can only be tested on master
Reviewed By: mrshenli
Differential Revision: D20286011
fbshipit-source-id: dde952449bf54ae459d36020f3e3db6fa087b39f
Summary:
This PR enables bfloat16 type for pooling ops on ROCm. Also adds bfloat16 implementation of atomicAdd since pooling ops use it.
Note: Changes in the lambda function blocks is only indentation as it is now wrapped inside `AT_SKIP_BFLOAT16_IF_NOT_ROCM` macro.
iotamudelta ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34166
Differential Revision: D20263421
Pulled By: ezyang
fbshipit-source-id: 3f4199ec57522e638ec29f45e22c6ec919b7816d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34184
Add mobile custom build with static dispatch & dynamic dispatch to CI.
Most of mobile code analysis CI should be covered by the custom build +
dynamic dispatch flow, so changing it to running on master only.
Test Plan: Imported from OSS
Differential Revision: D20241774
Pulled By: ljk53
fbshipit-source-id: f34c5748735c536ab6b42c8eb1429d8bbdaefd62
Summary:
There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
export_chrome_trace generating invalid JSON. This only came up when the
profiler is run with use_cuda=True from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.
ghstack-source-id: 99508836
Test Plan: Added a unit test.
Differential Revision: D20237040
fbshipit-source-id: 510befbdf4ec39632ac56544afcddee6c8cc3aca
Summary:
Separating CUDA fuser from CPU fuser.
1. New node in IR - prim::CudaFusionGroup:
This enables the cuda fuser to co-exist along side the old fuser. Allows us
to incrementally build and expand cuda fuser.
2. copied FuseGraph optimization passes to CudaFuserGraph:
We will re-factor & reuse Chunk/Concat in the old fuser logic, which is
handled in the optimization pass at this moment. Unfortunately much of the code in
the pass is tightly bound to the legacy fuser, which makes code sharing
difficult.
The CudaFusionGraph will support only a subset of operations compared to the
legacy fuser (CUDA only). It is registered as a custom pass post fusion via
```torch._C._jit_register_cuda_fuser()```
To have it in effect, you should also turn off fusion on GPU via
```torch._C._jit_override_can_fuse_on_gpu(False)```
3. We don't have codegen in this PR yet (WIP). Currently we just fall back to
the old fuser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33527
Differential Revision: D20171598
Pulled By: ZolotukhinM
fbshipit-source-id: 9a3c0f06f46da7eaa80ae7551c04869f5b03ef71
Summary:
This check at `torch/csrc/jit/ir/alias_analysis.cpp:772` (commit 019ffdca31) wasn't being triggered for None outputs of tuples, because `mustBeNone` would return false if `num_outputs != 1`. This caused an assertion to fail in alias analysis. It's kind of a convoluted case to repro and I wasn't able to make a succinct one, but I tested internally and it fixed the bug.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34268
Differential Revision: D20261539
Pulled By: eellison
fbshipit-source-id: 95edea10e2971727cfd3f3bc2b6bdf9dbadca6a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34284
Python 3.5 only supports function type hints.
Variable type hints are introduced in Python 3.6.
So these tests with JIT type hints will fail with "Syntax Error" in Python 3.5 environment.
ghstack-source-id: 99542199
Test Plan:
Differential Revision: D7348891
fbshipit-source-id: c4c71ac021f35b5e6f7ce4d3e6af10dd1d2600cc
Test Plan: Can only really be tested in PyTorch master
Reviewed By: mrshenli
Differential Revision: D20260023
fbshipit-source-id: b5444c376894bfccd6524cf04a71cf76eea72275
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33852
This fixes an issue for QAT models. During eval if we call `prepare_qat` and `convert` before calling `load_state_dict` it throws an error because the weight info (num channels) is not updated in the observer module.
It is not an issue for per-tensor case
Fixes issue #33830
Test Plan:
python test/test_quantization.py EagerModePostTrainingQuantTest.test_eval_after_train
python test/test_quantization.py EagerModeQuantizationAwareTrainingTest.test_eval_after_train
Imported from OSS
Differential Revision: D20212996
fbshipit-source-id: a04af8fe4df2e555270ae4d6693f5777d86f8a46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34072
This diff helps check all the ops not supported by lite_interpreter.
Helpful mainly to find all the ops that need to be added instead of adding them
one by one.
Test Plan:
buck run caffe2/binaries:lite_interpreter_model_load --
--model=<bytecode-model-path>
Reviewed By: iseeyuan
Differential Revision: D20194092
fbshipit-source-id: 0d596cd0204308027194af7ed738551d0c32a374
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34187
Noticed that a recent PR broke Android/iOS CI but didn't break mobile
build with host toolchain. Turns out one mobile related flag was not
set on PYTORCH_BUILD_MOBILE code path:
```
"set(INTERN_DISABLE_MOBILE_INTERP ON)"
```
First, move the INTERN_DISABLE_MOBILE_INTERP macro below, to stay with
other "mobile + pytorch" options - it's not relevant to "mobile + caffe2"
so doesn't need to be set as common "mobile" option;
Second, rename PYTORCH_BUILD_MOBILE env-variable to
BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN - it's a bit verbose but
becomes more clear what it does - there is another env-variable
"BUILD_PYTORCH_MOBILE" used in scripts/build_android.sh, build_ios.sh,
which toggles between "mobile + pytorch" v.s. "mobile + caffe2";
Third, combine BUILD_PYTORCH_MOBILE_WITH_HOST_TOOLCHAIN with ANDROID/IOS
to avoid missing common mobile options again in future.
Test Plan: Imported from OSS
Differential Revision: D20251864
Pulled By: ljk53
fbshipit-source-id: dc90cc87ffd4d0bf8a78ae960c4ce33a8bb9e912
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34215
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20251538
Pulled By: ezyang
fbshipit-source-id: c419f0ce869aca4dede7e37ebd274a08632d10bf
Summary:
Effectively backporting c5c00c119f before that PR lands
The bug didn't manifest itself earlier because the MkldnnConv2d constructor didn't reorder the weights, so the issue arose only on the second serialization/deserialization. This also fixes the constructor to deliver better perf right away.
Note that I still serialize a 5d tensor - it was the previous behavior, we have to handle it anyway, and with https://github.com/pytorch/pytorch/issues/32422 the output of `mkldnn_reorder_conv2d_weight` will always be 4d.
cc pinzhenx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34115
Reviewed By: wanchaol
Differential Revision: D20224685
Pulled By: dzhulgakov
fbshipit-source-id: 24ca9227c4eb4c139096a64ae348808d7478d7dc
Summary:
We get a seg fault without this when using XNNPACK.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34087
Differential Revision: D20199787
Pulled By: kimishpatel
fbshipit-source-id: d3d274e7bb197461632b21688820cd4c10dcd819
Summary:
This PR aims at improving `UpSample` performance with `mode='nearest'` on 1D, 2D and 3D inputs; both inference and training are covered. The current implementation in ATen doesn't have parallelization. (A usage sketch follows the numbers below.)
1. single socket inference speedup for 1d, 2d and 3d: **63x, 57x, 46x**.
2. single core inference speedup for 1d, 2d and 3d: **5.9x, 4.6x, 3.4x**.
3. dual sockets training speedup for 1d, 2d and 3d: **38x, 33x, 65x**
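A small usage sketch of the code path this speeds up (2D case shown):
```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 3, 64, 64)
y = F.interpolate(x, scale_factor=2, mode='nearest')   # nearest upsampling, now parallelized on CPU
print(y.shape)                                          # torch.Size([8, 3, 128, 128])
```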
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31452
Differential Revision: D20077828
Pulled By: VitalyFedyunin
fbshipit-source-id: a7815cf2ae344696067d2ec63bd4f4e858eaafff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33849
For integral types, there is no need to manipulate with
`reinterpret_cast` and therefore a cleaner implementation is available.
This might also be helpful on some less optimized compilers or on a less optimized arch (while a
test on gcc 8.3 x64 shows no difference in performance).
Test Plan: Imported from OSS
Differential Revision: D20222675
Pulled By: VitalyFedyunin
fbshipit-source-id: 875890d1479f8abab4c4a19d934fe9807d12dfd2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33817
Then, nullopt denotes catch all, whereas everything else is specific to
a DispatchKey. I can delete the second copy of methods when I do this.
This refactor should be pushed all the way to the frontend but I am doing
it one step at a time.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20125163
Pulled By: ezyang
fbshipit-source-id: 026075a4bab81b0bd88b07f0800f6e6bbeb2166a
Summary:
Remove Int8Relu in quantized model
Suppress log warnings if verbose is false
Test Plan: TBD
Reviewed By: yinghai
Differential Revision: D20202474
fbshipit-source-id: 995ef8e665d8edeee810eedac831440b55271a7b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33715
Tracing code depends on the full JIT, which is not available in the lite interpreter. Use `-c pt.disable_gen_tracing=1` to turn off generating the tracing part.
ghstack-source-id: 99252322
Test Plan:
```
buck build xplat/caffe2:torch -c pt.disable_gen_tracing=1
```
The tracing part of generated/VariableType_?.cpp will not be generated.
Reviewed By: smessmer
Differential Revision: D19684577
fbshipit-source-id: a1e5b80eca5e51c7bf72b5cc8f0e36c2135fabc2
Summary:
When docs are built, conf.py points to a _templates-stable/layout.html that does not exist.
Adding this file here so future stable docs will build with Google Analytics tags and without the unstable banner that is in _templates/layout.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33770
Differential Revision: D20164895
Pulled By: jlin27
fbshipit-source-id: 5fca9f9b825b1484dab52e2b2d91f92ae6372371
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33329
# Use case
```
@torch.jit.script
def send_rpc_async(dst_worker_name, user_callable_qual_name, tensor):
    # type: (str, str, Tensor) -> None
    rpc._rpc_async_torchscript(
        dst_worker_name, user_callable_qual_name, args=(tensor,)
    )
```
# Problem
```
torch.jit.frontend.NotSupportedError: keyword-arg expansion is not supported:
File "/data/users/shihaoxu/fbsource/fbcode/buck-out/dev/gen/caffe2/test/distributed/rpc/rpc_spawn#binary,link-tree/torch/distributed/rpc/api.py", line 722
args = args if args else ()
kwargs = kwargs if kwargs else {}
fut = _invoke_rpc_torchscript(to, qualified_name, *args, **kwargs)
~~~~~~ <--- HERE
return fut
```
# Solution
Register `rpc.rpc_async(..)` as a JIT operator to handle variable-length argument list.
# Plan
This PR is the required changes to make `rpc.rpc_async(..)` a JIT prim operator, which can dynamically handle different number of arguments.
- Register "prim::rpc_async" as a `Symbol` in "interned_string.h"
- Add a if branch in "python_sugared_value.cpp" `toSugarValue(py::object, ..)` entry utility function to set up how JIT frontend convert `torch.distributed.rpc.rpc_async(..)` Python function (Python object) into a `SpecialFormValue` (IR SugaredValue).
- Add a switch case for the "prim::rpc_async" Symbol in "ir_emitter.cpp" and `emitApplySpecialForm(..)` to set up how the JIT compiler provides inputs to the "prim::rpc_async" Operator.
- Register "prim::rpc_async" as a `jit::Operator` and provide implementation in "register_distributed_ops.cpp".
Notice that since the distributed module is an optional part of the PyTorch build, the code added in this PR should be wrapped within a preprocessor macro:
```
#ifdef USE_DISTRIBUTED
new code here
#endif
```
Test Plan:
Items that need to be confirmed in the test cases
https://fb.quip.com/DCvdA9ZLjeO0
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork \
\
&& buck-out/gen/caffe2/test/distributed/rpc/jit/rpc_fork\#binary.par -r test_call_python_function_remotely_from_script_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/caffe2/python/operator_test:layer_norm_op_test-2.7 -- test_layer_norm_op_jit
```
Differential Revision: D5738300
fbshipit-source-id: a4604fe762e00be062dc8232ca9790df31fb2074
Summary:
`unpickler.cpp` depends on the mobile type parser all the time, so include it regardless of whether it's a mobile build or not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34180
Pulled By: driazati
Differential Revision: D20241881
fbshipit-source-id: a998dd2b3f1c7f58e55bb7851dc595c8ddf9eacb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34055
Enable custom mobile build with dynamic dispatch for OSS build.
It calls a python util script to calculate transitive dependencies from
the op dependency graph and the list of used root ops, then pass the
result as the op registration whitelist to aten codegen, so that only
these used ops are registered and kept at link time.
For custom build with dynamic dispatch to work correctly, it's critical
to have the accurate list of used ops. Current assumption is that only
those ops referenced by TorchScript model are used. It works well if
client code doesn't call libtorch API (e.g. tensor methods) directly;
otherwise the extra used ops need to be added to the whitelist manually,
as shown by the HACK in prepare_model.py.
Also, if JIT starts calling extra ops independent of specific model,
then the extra ops need to be added to the whitelist as well.
Verified the correctness of the whole process with MobileNetV2:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```
Test Plan: Imported from OSS
Reviewed By: bhosmer
Differential Revision: D20193327
Pulled By: ljk53
fbshipit-source-id: 9d369b8864856b098342aea79e0ac8eec04149aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32814
We skip quantization for the intermediate values for patterns like `Conv - ReLU`,
but currently we don't skip quantizing the input/output of the graphs of matched modules.
Since we now changed the way we add observers, this also needs to be updated.
Test Plan:
python test/test_jit.py -- 'TestJit.test_insert_observers_skip_values'
Imported from OSS
Differential Revision: D20208785
fbshipit-source-id: ce30f2c4c8ce737500d0b41357c80ec8b33aecf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34168
Redo D19153199. It was reverted because it broke CI, due to the change of `AT_ASSERTM` to `TORCH_INTERNAL_ASSERT_DEBUG_ONLY`. Two problems:
1) bug in `TORCH_INTERNAL_ASSERT_DEBUG_ONLY` about MSVC. I'm sending another diff to fix this bug.
2) BlobTest was expecting `Blob::template Get<T>()` to throw when there is a type mismatch.
For now I'll leave `AT_ASSERTM` as it is.
Test Plan:
```
buck test mode/dev //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
buck test mode/opt //caffe2/caffe2:caffe2_test_cpu -- 'BlobTest' --run-disabled
```
Reviewed By: yinghai
Differential Revision: D20235225
fbshipit-source-id: 594dad97c03c419afaa8f9023408bc5a119b3cfa
Summary:
This PR aims to improve the interoperability with [CuPy](https://github.com/cupy/cupy/pulls).
Instead of having two separate and conflicting memory pools, with this PR CuPy can directly allocate memory from the PyTorch allocator by means of this proposal: https://github.com/cupy/cupy/pull/3126
We would like to gather feedback to know if this approach makes sense for PyTorch, or whether an alternative design would be preferable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33860
Differential Revision: D20212788
Pulled By: ngimel
fbshipit-source-id: bc1e08a66da1992d26021147bf645dc65239581c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34157
`[[noreturn]]` only conflicts with the CUDA `__assert_fail` definition if clang is used as the host compiler
Test Plan: CI
Reviewed By: EscapeZero
Differential Revision: D20232088
fbshipit-source-id: 7182c28a15278e03175865cd0c87410c5de5bf2c
Summary:
Stacked PRs
* #33474 - [jit] Remove list specializations from pickler
* **#33255 - [jit] Add type tags to lists/dicts in pickle**
This adds a global call to `torch.jit._pickle.restore_type_tags` for
lists and dicts so that we can preserve their types after serialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33255
Pulled By: driazati
Reviewed By: xman1979, Tianshu-Bao
Differential Revision: D19868637
fbshipit-source-id: 2f1826e6679a786ca209198690269f399a542c04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34081
Before this commit, applications have to do the following to configure
number of threads in ProcessGroup RPC backend:
```
op = ProcessGroupRpcBackendOptions()
op.rpc_timeout = rpc_timeout
op.init_method = init_method
op.num_send_recv_threads = 32
init_rpc(...., rpc_backend_options=op)
```
After this commit, it can be simplified to:
```
init_rpc(...., rpc_backend_options=ProcessGroupRpcBackendOptions(num_send_recv_threads=32))
```
Fixes #34075
Test Plan: Imported from OSS
Differential Revision: D20227344
Pulled By: mrshenli
fbshipit-source-id: def4318e987179b8c8ecca44d7ff935702c8a6e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34169
Valgrind has no insight into how memory is being initialized by ioctls()
Test Plan: CI
Reviewed By: seemethere
Differential Revision: D20235974
fbshipit-source-id: 46413afa4842e7d42582bbbda903438b1d98691f
Summary:
Related issue: https://github.com/pytorch/pytorch/issues/34079
I don't know how much we care about the difference between `-G` and `-lineinfo` in `DEBUG` vs `REL_WITH_DEB_INFO`, but since `-G` never worked, let's just use `-lineinfo` on both `DEBUG` and `REL_WITH_DEB_INFO`. This would resolve the failure in `DEBUG=1` build. Locally tested to work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34153
Reviewed By: ljk53
Differential Revision: D20232049
Pulled By: ngimel
fbshipit-source-id: 4e48ff818850ba911298b0cc159522f33a305aaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33825
Partially addresses #20376
I do this by overriding assertEqual in classes that opt into
this. This means I have to fix #33821. The fix is a little
unsatisfactory as idiomatic Python 2 super() calls don't work
(since the class is no longer in scope); hopefully this will just
work when we go to Python 3.
General approach taken:
- A lot of dtype mismatches are because we specified tensor constants
that infer to some dtype, but the actual dtype needed is something else.
Those are easy, just annotate the tensor() constructor (often a legacy
Tensor/FloatTensor call) with dtype
- There are a few cases where the promotion rules are nontrivial. Some of them
I just typed out the expected promotion rules manually (based on trial
and error)
- There are some more complex cases; if it gets too hairy I just
set exact_dtype=False and nope the fuck out
I don't have time to do it for all the other classes. But the setup
should work if people just incrementally add the overrides to classes,
and then eventually flip the default.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20125791
Pulled By: ezyang
fbshipit-source-id: 389c2d1efbd93172af02f13e38ac5e92fe730c57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33926
The UnboundBuffer calls here are already protected by a mutex. We only
need to hold the lock while writing the shared structures completed_ and
exception_.
ghstack-source-id: 99315427
Test Plan:
CI
Differential Revision: D20154546
fbshipit-source-id: d1b74508c917b21acdcd0f6a914eb0455437ca0e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33987
There was an error in
https://github.com/pytorch/pytorch/pull/30724/files that resulted in
`export_chrome_trace` generating invalid JSON. This only came up when the
profiler is run with `use_cuda=True` from what it looks like. In the future, we
should have tests that ensure we generate valid JSON because we no longer use
the json library.
Test Plan: Add UT to validate JSON.
Differential Revision: D20171428
fbshipit-source-id: ec135a154ce33f62b78d98468174dce4cf01fedf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33569
Clang reported a few places where a call to `fmaxType` is ambiguous. In all cases one of the arguments is `double` and another is `float`. Fix the error by creating a proper value 0 and remove the unneeded `ZERO_MACRO` code.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20006926
fbshipit-source-id: ca6cfacd57459b1c48eb5080b822d9509b03544d
Summary: make use of springhill's fma on SpatialBatchnorm
Test Plan:
re-enabled the unit test, ran it a couple of times
pending: net runner
Reviewed By: amylittleyang
Differential Revision: D20227767
fbshipit-source-id: 7c601f185940249c0a32bdf95d74a20552cd2625
Summary:
1. randn and normal_ methods will work for complex tensors after this PR
2. added an internal function for viewing complex tensors as float tensors, which enables us to reuse functions defined for float tensors for complex tensors with a change in the arguments passed (like size, or standard deviation in the case of normal_). Currently the resultant float tensor doesn't share storage with the input complex tensor, which means the version counter wouldn't be updated if any function is called on this resultant tensor; once the dtype entry is removed from the storage class, this issue will be resolved.
Side notes:
1. didn't add a separate header for the util functions because of this issue https://github.com/pytorch/pytorch/issues/20686#issuecomment-593002293
2. we should eventually have a public API method view_complex_as_float once (2) mentioned above gets resolved
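A short sketch of the newly enabled calls (assuming a build that includes this change):
```python
import torch

z = torch.randn(3, dtype=torch.complex64)   # complex sampling now supported
z.normal_()                                  # in-place normal_ on a complex tensor
print(z.dtype)                               # torch.complex64
```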
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34037
Differential Revision: D20221793
Pulled By: anjali411
fbshipit-source-id: a78f5e83d6104e2f55e0b250c4ec32e8d29a14eb
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33182
This adds private API functions that developers of types that implement `__torch_function__` can use to ensure full coverage of the subset of the PyTorch API that can be overridden.
I've refactored some of the code in the tests into a new `torch._overrides.get_overridable_functions` function. I've also changed `TENSOR_LIKE_TORCH_OVERRIDES` into `torch._overrides.get_testing_overrides` and `IGNORED_TORCH_FUNCTIONS` into `torch._overrides.get_ignored_functions`. Making these two static global variables in the tests into functions should allow rewriting their implementation to construct their return values instead of just statically defining the return value as is done here. Currently that is blocked on not being able to inspect function signatures of compiled kernels in PyTorch (see https://github.com/pytorch/pytorch/issues/28233). See the docs I've added for usage examples of these new functions. I also refactored the existing override tests to make use of these new functions, which should be a good forcing function to make sure they're kept up-to-date.
Finally, while working on this I discovered that `TestTorchFunctionOverrides.test_mean` and `TestTorchFunctionOverrides.test_mm` weren't ever being run because they were getting clobbered by the other dynamically generated override tests. I fixed that by renaming the tests and then fixing the actual test code. I've verified that all the subclassing semantics is correct and that the updated test answers are correct. I'm happy to put the fixes to the existing tests in as a separate pull request if that would be easier to review.
ping cpuhrsch since the feature request originally came from them.
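A minimal sketch (assuming the private `torch._overrides` module described above) of how a `__torch_function__` implementor might audit coverage with the new helpers:
```python
import torch
from torch._overrides import (
    get_overridable_functions, get_testing_overrides, get_ignored_functions)

overridable = get_overridable_functions()  # namespace -> overridable functions
dummies = get_testing_overrides()          # function -> dummy override with a matching signature
ignored = get_ignored_functions()          # functions that never dispatch to __torch_function__

print(sum(len(fns) for fns in overridable.values()), len(dummies), len(ignored))
```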
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33791
Differential Revision: D20195053
Pulled By: cpuhrsch
fbshipit-source-id: 1585f4e405f5223932b410eae03a288dc8eb627e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33834
This changes how we report Tracebacks to make them more clear when
there are both serialized and non-serialized ranges. It now looks like:
```
Traceback (most recent call last):
File "foo.py", line 25, in <module>
s2(a, b)
File "/scratch/zdevito/pytorch/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/__torch__.py", line 7, in forward
x: Tensor,
y: Tensor) -> Tensor:
return (self).bar(x, y, )
~~~~~~~~~ <--- HERE
def bar(self: __torch__.Moo,
x: Tensor,
File "code/__torch__.py", line 11, in bar
x: Tensor,
y: Tensor) -> Tensor:
_0 = (self).baz(x, y, )
~~~~~~~~~ <--- HERE
_1 = torch.ones([3], dtype=None, layout=None, device=None, pin_memory=None)
return torch.add(_0, _1, alpha=1)
File "code/__torch__.py", line 17, in baz
x: Tensor,
y: Tensor) -> Tensor:
return torch.add(x, y, alpha=1)
~~~~~~~~~ <--- HERE
Traceback of TorchScript, original code (most recent call last):
File "foo.py", line 11, in forward
def forward(self, x, y):
return self.bar(x, y)
~~~~~~~~ <--- HERE
File "foo.py", line 9, in bar
def bar(self, x, y):
return self.baz(x, y) + torch.ones(3)
~~~~~~~~ <--- HERE
File "foo.py", line 7, in baz
def baz(self, x, y):
return x + y
~~~~~ <--- HERE
RuntimeError: The size of tensor a (4) must match the size of tensor b (5) at non-singleton dimension 1
```
It follows the Python convention of having the most important information last
and reading from the bottom up.
Changes:
* Moved the error message to the end, to copy Python
* Report original traceback separate from serialized traceback
* Make sure root functions have names in the interpreter trace.
Test Plan: Imported from OSS
Differential Revision: D20126136
Pulled By: zdevito
fbshipit-source-id: fd01f9985e5d74e04c4d064c02e8bc320f4fac13
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33901
After this change, the pytest profile looks like:
4.83s call test/test_torch.py::TestTorch::test_fft_ifft_rfft_irfft
4.23s call test/test_torch.py::TestTorch::test_var_dim
4.22s call test/test_torch.py::TestTorch::test_std_dim
4.19s call test/test_torch.py::TestTorch::test_max
4.06s call test/test_torch.py::TestTorch::test_min
3.60s call test/test_torch.py::TestTorchDeviceTypeCPU::test_cdist_norm_batch_cpu
2.62s call test/test_torch.py::TestTorchDeviceTypeCPU::test_pow_cpu
2.60s call test/test_torch.py::TestTorch::test_matmul_small_brute_force_1d_Nd
And the entire CPU-only test suite can be run in 88s on my Intel(R) Xeon(R) CPU
E5-2650 v4 @ 2.20GHz
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20222288
Pulled By: ezyang
fbshipit-source-id: 4224a9117f42566e290ae202881d76f1545cebec
Summary:
Add vectorization to dropout kernels for both reads & writes. Moved the `masked_scale_kernel` implementation to `TensorIterator` to pick up recent autovectorization additions by zasdfgbnm , and wrote a vectorized specialization of the dropout training kernel (along with some fairly conservative dispatch logic).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33879
Differential Revision: D20222853
Pulled By: ngimel
fbshipit-source-id: 711f56ca907fbc792a10d4bf069c28adab7d6ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34038
Mobile build doesn't include autograd/VariableType dispatch. As a
result, AutoNonVariableTypeMode needs to be set in the mobile runtime.
With static dispatch this work is done inside the generated jit-dispatch
code - AutoNonVariableTypeMode needs to be set on a per-op basis. Setting
it globally or setting it for the wrong ops might break some `is_variable()`
checks in the codebase.
Thanks to the unification of Variable class and Tensor class, all
is_variable() checks have been removed, so AutoNonVariableTypeMode can
be set globally now.
We never tested inference-only mobile build with dynamic dispatch. It
seems that dynamic dispatch also requires setting AutoNonVariableTypeMode
for our mobile build (where VariableType functions are not registered).
Verified the end-to-end test works with this change:
```
TEST_CUSTOM_BUILD_DYNAMIC=1 test/mobile/custom_build/build.sh
```
Test Plan: Imported from OSS
Differential Revision: D20193329
Pulled By: ljk53
fbshipit-source-id: cc98414d89d12463dc82b0cdde0b6160dafc0349
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34012
Today some mobile simulator tests only run on landed PRs, and they require
setting up a special build environment to repro errors locally.
The goal of the PR is to do end-to-end mobile custom build & integration
tests with host toolchain (using same CMake options as mobile build). This
way, non-mobile engineers can capture & debug mobile related build issues
much more easily.
There are three custom build types that this script supports:
1. `TEST_DEFAULT_BUILD=1 ./build.sh` - it is similar to the prebuilt libtorch
libraries released for Android and iOS (same CMake build options + host
toolchain), which doesn't contain autograd function nor backward ops thus is
smaller than full LibTorch.
2. `TEST_CUSTOM_BUILD_STATIC=1 ./build.sh` - it further optimizes libtorch
size by only including ops used by a specific model.
3. `TEST_CUSTOM_BUILD_DYNAMIC=1 ./build.sh` - similar as 2) except that it
relies on the op dependency graph (instead of static dispatch) to calculate
and keep all transitively dependent ops by the model.
Type 2) will be deprecated by type 3) in the future.
Type 3) custom build has not been fully supported yet so it's expected to fail.
Replacing existing mobile build CI to run Type 1) build & integration test.
Test Plan: Imported from OSS
Differential Revision: D20193328
Pulled By: ljk53
fbshipit-source-id: 48c14cae849fde86e27123f00f9911996c1cf40e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33277
Currently we insert observers in the called graph, which is incorrect since graphs can be shared
and the decision of whether to insert an observer or not might depend on where the graph is called.
For example, for a call sequence `self.conv1(self.conv2(x))`, we can't insert observers correctly
if `self.conv1` and `self.conv2` share the same type in the current implementation, because we insert
the observer in the graph of the forward method of Conv2d right now, and this call sequence requires us to insert
only one observer for the output of self.conv2/input of self.conv1.
We'll need to insert observers for the input/output values of the graph at the call site instead (see the sketch below).
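A minimal sketch (hypothetical module, not from the PR) of the call sequence discussed above: `conv1` and `conv2` share the `Conv2d` type, so an observer placed inside `Conv2d.forward` cannot serve both call sites, while the call site only needs a single observer between them.
```python
import torch
import torch.nn as nn

class M(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 3, 1)
        self.conv2 = nn.Conv2d(3, 3, 1)

    def forward(self, x):
        # a single observer is needed for the output of conv2 / input of conv1
        return self.conv1(self.conv2(x))
```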
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20208787
fbshipit-source-id: 739e1d877639c0d0ed24e573bbd36211defa6836
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34105
make parallel_net_test.cc chronos conforming.
exclude gtest asserts that check thrown exceptions when exceptions are disabled.
Test Plan: CI green
Differential Revision: D20153525
fbshipit-source-id: 7371e559da948f46773fed09e3a23a77411d59e0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33548
Mostly just moved code.
Index dim and number of indices checks are added to make the checks identical to index_add_cpu_
This is a resubmit of #30573, which got reverted.
Test Plan: Imported from OSS
Differential Revision: D20002248
Pulled By: gchanan
fbshipit-source-id: 46df4047cb3fc1dff37a15b83c70b2cbb7a6460b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33819
These conditions are for the specific implementation; the fallback implementation works without these checks, so use that if any of these checks isn't true.
Resubmit of https://github.com/pytorch/pytorch/pull/33419 (which got reverted due to a problem with XLA, but which now has been fixed)
ghstack-source-id: 99333280
Test Plan: Test included
Differential Revision: D20121460
fbshipit-source-id: c1056b8e26751e24078bbe80c7cb4b223bcca7cb
Summary:
The newly added mixture_same_family should support cdf if the component family has cdf implemented.
This is very useful for flow models, where the cdf of a mixture of Gaussians/logistics is used to model the flow.
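A minimal sketch of the flow-model use case above: the cdf of a two-component Gaussian mixture via MixtureSameFamily (this works whenever the component family implements cdf).
```python
import torch
from torch.distributions import Categorical, Normal, MixtureSameFamily

mix = Categorical(probs=torch.tensor([0.3, 0.7]))
comp = Normal(loc=torch.tensor([-1.0, 1.0]), scale=torch.tensor([0.5, 0.5]))
gmm = MixtureSameFamily(mix, comp)

x = torch.linspace(-3.0, 3.0, 5)
print(gmm.cdf(x))  # mixture-weighted sum of the component cdfs
```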
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33408
Differential Revision: D20191552
Pulled By: ezyang
fbshipit-source-id: 0bfd7973aa335c162919398a12ddec7425712297
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33729
ReshapeOp is doing some useless movement of data between CPU and GPU, which results in a crazy amount of kernel calls from this operator. This makes the operator ridiculously slow compared to BatchMatMul for pretty cheap models (for example some versions of GAT).
This diff moves ReshapeOp to leverage CPU storage and reduces the number of kernel calls from num_dims + 3 (for the 3-D
tensor case) to 2.
Test Plan:
Unit-tests are still passing.
TODO: perf testing
Reviewed By: akyrola
Differential Revision: D19659491
fbshipit-source-id: 2341b21e57208b988169f2df5fb598be3dc8acb2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34102
if nvcc is invoked with clang as the host compiler, it will fail with the following error due to a mismatch between the decorators defined in cuda and c10:
```
error: attribute "noreturn" did not appear on original declaration
```
Test Plan: Build pytorch with clang
Reviewed By: EscapeZero
Differential Revision: D20204951
fbshipit-source-id: ff7cef0db43436e50590cb4bbf1ae7302c1440fa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34107
Updates linter to only lint for python3 instead of linting for python2
Test Plan: good_testplan
Reviewed By: orionr
Differential Revision: D20205395
fbshipit-source-id: 1fa34e5fdf15f7aed96a66d2ce824a7337ee6218
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34035
Fixes a bug in the condition check from https://github.com/pytorch/pytorch/pull/24342; we realized we don't have tests in either
python or cpp to catch this, so tests were added for both python and cpp.
Thanks hczhu for catching it!
Test Plan: Imported from OSS
Differential Revision: D20198837
Pulled By: wanchaol
fbshipit-source-id: 33846a14c0a8e7aac2e8328189d10c38a0d7e6ee
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34092
Disable op in transform map until we get bitwise matching to ice-ref
Test Plan: CI
Reviewed By: hyuen
Differential Revision: D20177936
fbshipit-source-id: e316384184cb264852e63e5edce721a8614742d1
Summary:
## What this will do:
When the repository is tagged, the current nightly build pipelines will run and upload to the `test` subdirectory in our S3 bucket for `download.pytorch.org`. It will also upload to the correct organization on anaconda [pytorch-nightly](https://anaconda.org/pytorch-test)
This is only meant for release candidates and will actually not run on any tag that does not match the release candidate regex.
This has been tested on a small scale with: 3ebe0ff2f8
## Related PRs:
* `.circleci: Divert packages to test channel on tag`: https://github.com/pytorch/pytorch/pull/33842
* `.cirlceci: Swap PYTORCH_BUILD_VERSION if on tag`: https://github.com/pytorch/pytorch/pull/33326
## Work to be done later:
- [ ] Figure out how to remove manual step of updating s3 html indices.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34078
Differential Revision: D20204104
Pulled By: seemethere
fbshipit-source-id: 685630e8a04b19fc17374585e9228a13a8c3e407
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33513
These tests require gloo, so like the other tests, they should be
skipped if not building with gloo. Otherwise they currently crash on Mac if not built
with gloo.
Verified that they do not crash anymore with this PR.
ghstack-source-id: 99303707
Test Plan: Built on Mac and verified that the tests do not fail.
Differential Revision: D19976908
fbshipit-source-id: 6a2a70c3eab83efd0e188e86cabe56de4a869f4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33954
fixes caffe2/core/module_test.cc on windows
misc lint fixes.
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D20153512
fbshipit-source-id: aeae84a028e26edd65c7218611e3c49a8d9bb8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33938
Making sure we don't silently ignore exceptions from the tasks in the
thread pool
Test Plan: python setup.py clean && python setup.py develop install
Differential Revision: D20178603
Pulled By: ilia-cher
fbshipit-source-id: 34971032205a1a53fb7419ed84ebb986f9e959ad
Summary:
In the examples of `BCEWithLogitsLoss`, `0.999` is passed as the prediction value. The value `0.999` looks like a probability, but it isn't one. I think it's better to pass a value that is greater than 1, so as not to confuse readers.
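A minimal sketch (not from the docs) of the point above: `BCEWithLogitsLoss` takes raw logits, i.e. unbounded real scores, so an example value like `2.0` signals that more clearly than `0.999`, which reads like a probability.
```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss()
logits = torch.tensor([2.0, -1.5, 0.5])   # unbounded scores, not probabilities
target = torch.tensor([1.0, 0.0, 1.0])
print(criterion(logits, target))
```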
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34053
Differential Revision: D20195456
Pulled By: ezyang
fbshipit-source-id: 3abbda6232ee1ab141d202d0ce1177526ad59c53
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33955
unit tests on windows (clang and cl) were crashing on exit due to racing with static variable destruction.
Test Plan: CI green
Differential Revision: D20153587
fbshipit-source-id: 22e35e591660d49f3a755f93d0c14d7a023ebb2a
Summary:
I think this warning isn't true anymore, and the NCCL backend works without PyTorch needing to be built from source.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34051
Differential Revision: D20195310
Pulled By: ezyang
fbshipit-source-id: 14f879a8c43ea5efdbdf0f638792ea2b90011f4a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33957
lots of small preprocessor warning cleanup for windows
Test Plan: CI green
Reviewed By: malfet, albanD
Differential Revision: D20153582
fbshipit-source-id: 18fd61c466fd1f55ededdae4448b3009a9cedc04
Summary:
Fixes https://github.com/pytorch/pytorch/issues/33899
In the issue, we have
```
TypeError("expected %s (got %s)", dispatch_key, toString(other.key_set()).c_str());
```
which results in `dispatch_key` being interpreted as a c-string by `sprintf`. Adding `__attribute__((format))` to the `TypeError` constructor allows gcc or clang to detect this at compile time. Then `-Werror=format` makes it a hard error at compile time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34019
Differential Revision: D20194842
Pulled By: ezyang
fbshipit-source-id: fa4448916c309d91e3d949fa65bb3aa7cca5c6a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33959
make sure clang on windows uses correct attributes.
add support for cl.exe style pragma attributes
Test Plan: CI green
Differential Revision: D20153548
fbshipit-source-id: bfbfd374e8f5e7d7b8598453c3ca2b6693a425f1
Summary:
1. As RRef has been added as a JIT type in https://github.com/pytorch/pytorch/issues/32992, we no longer need to skip them
2. Nightly now knows about Any
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34071
Reviewed By: houseroad
Differential Revision: D20196963
Pulled By: mrshenli
fbshipit-source-id: 1ea79c5682e8be9087b9cb74104e1b914c3fc456
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33958
look for clang intrinsic headers on windows
Test Plan: CI green
Differential Revision: D20153573
fbshipit-source-id: c87da3b0e9950d3df0bf8350df8ae592064d6613
Summary:
This patch enables folding GetAttr nodes with their corresponding
values. The _jit_pass_freeze_module API returns a new TorchScript module
where all function calls and get attributes are inlined.
Usage:
frozen_model = torch._C._freeze_module(scrited_model._c)
frozen_model.forward(...)
This API currently optimizes the forward method. We will follow up
to preserve and optimize methods and attributes that are annotated as
torch.jit.interface.
Several future improvements to JIT optimizations are required to further
clean up/de-sugar the graph and eliminate redundancies.
Ideally, we want to produce a graph that can easily be lowered to
GLOW and other low-level backends.
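A minimal sketch of the freezing flow described above on a tiny scripted module (assuming the module is put in eval mode first); attribute reads and calls in forward get folded/inlined.
```python
import torch

class M(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.scale = torch.nn.Parameter(torch.tensor(2.0))

    def forward(self, x):
        return x * self.scale

scripted_model = torch.jit.script(M().eval())
frozen_model = torch._C._freeze_module(scripted_model._c)
print(frozen_model.forward(torch.rand(3)))
```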
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32178
Differential Revision: D19419640
Pulled By: bzinodev
fbshipit-source-id: 52baffaba9bca2cd60a8e747baa68d57711ad42b
Summary:
Currently if we run
```bash
DEBUG=1 ONNX_ML=0 MAX_JOBS=8 CMAKE_CXX_COMPILER_LAUNCHER=ccache CMAKE_C_COMPILER_LAUNCHER=ccache CMAKE_CUDA_COMPILER_LAUNCHER=ccache USE_OPENMP=0 USE_DISTRIBUTED=0 USE_MKLDNN=0 USE_NCCL=0 USE_CUDA=1 USE_CUDNN=0 USE_STATIC_CUDNN=0 USE_NNPACK=0 USE_QNNPACK=0 USE_FBGEMM=0 BUILD_TEST=0 TORCH_CUDA_ARCH_LIST="6.1" python setup.py develop --cmake-only
```
then `touch build/CMakeCache.txt` (which adjusting build options will
do), then `python setup.py develop`, the following error message will
show up:
```
CMake Error at build/clog-source/CMakeLists.txt:249 (ADD_SUBDIRECTORY):
ADD_SUBDIRECTORY not given a binary directory but the given source
directory "/home/hong/wsrc/pytorch/build/clog-source" is not a subdirectory
of "/home/hong/wsrc/pytorch/build/clog-source". When specifying an
out-of-tree source a binary directory must be explicitly specified.
```
This is due to a conflict between our cpuinfo submodule and XNNPACK's
external clog dependency. Moving our cpuinfo upward and setting
CLOG_SOURCE_DIR resolves the issue.
---
Also reverted https://github.com/pytorch/pytorch/issues/33947, where `CLOG_SOURCE_DIR` as an option is not quite appropriate (given that cpuinfo uses its included clog subdir) and the setting of this variable should happen a bit later, when the dir of cpuinfo is known.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33922
Differential Revision: D20193572
Pulled By: ezyang
fbshipit-source-id: 7cdbdc947a6c7e0ef10df33feccb5b20e1b3ba43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33977
Removing python2 from operator_test so we can retire python2 support for PyTorch.
Test Plan: waitforsandcastle
Reviewed By: seemethere
Differential Revision: D20129500
fbshipit-source-id: d4c82e4acfc795be9bec6a162c713e37ffb9f5ff
Summary:
This PR would fix https://github.com/pytorch/pytorch/issues/33345.
The original CUDA kernel looks good. I changed most appearances of `int` to `int64_t` to avoid the CUDA memory access issue. Removed the two `TORCH_CHECK`. Added a unit test.
cc csarofeen ngimel ptrblck
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33753
Differential Revision: D20185005
Pulled By: ngimel
fbshipit-source-id: ef0abdc12ea680e10fe6b85266e2773c7a272f0d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33705
The fact that there were two overloads appears to be a historical
artifact that dates back to when goldsborough originally added these
bindings in the first place. If TensorOptions is made optional,
then you only need one overload, not two, as they are exactly redundant
with each other. When MemoryFormat was added, it was made a little
harder to do this, as the C++ syntax at::empty_like(t, memory_format) would
not work if you collapsed the overload; but now it works because TensorOptions
supports MemoryFormat.
The upshot is, I can get rid of all the overloads and just have one overload.
Amazingly, this change is backwards compatible, as the test attests. While
I was at it, I also deleted the overload name from the functions entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073355
Pulled By: bhosmer
fbshipit-source-id: c6a8908213b32ccf6737ea864d135e2cce34f56b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33704
This diff adds MemoryFormat field to TensorOptions, and teaches
all kernels that take TensorOptions to respect it, but doesn't
teach the codegen about it. As such, it is now possible to specify
memory_format using TensorOptions syntax, e.g.,
at::empty_like(tensor, at::memory_format(MemoryFormat::Contiguous))
in the C++ API, but there isn't any other user visible effect.
The intended end state of this diff stack is to eliminate the
explicit MemoryFormat? arguments from native functions, but
as this change has BC implications I'd prefer to do it separately.
So this starts things off with a non-BC breaking addition to the
API. For all internal functions that are not bound by codegen,
I switch them to exclusively using TensorOptions (eliminating
MemoryFormat); there's only a few, mostly quantized and to().
To keep things screwed down in the short term, it is a HARD ERROR
to specify both the explicit MemoryFormat argument as well as
TensorOptions. This caught a few errors in my diff where I needed
to modify memory format settings and then call code later, esp
in empty_like.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20073356
Pulled By: bhosmer
fbshipit-source-id: 18d310d7ee7cf2ee182994104652afcfc9d613e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33960
Test helper functions should be defined outside of the test functions. It is possible that process 2 launches test functions more slowly than process 1, and process 1 sends a request to run a helper function on process 2. Process 2 may not have compiled the helper function yet when it starts to serve process 1's request, and thus may return an error like "attempted to get undefined function"
ghstack-source-id: 99205620
Test Plan: test_remote_script_module was flaky for thrift backend in my local stress test runs, due to error "attempted to get undefined function". With fix in this diff, stress runs passed
Differential Revision: D20167969
fbshipit-source-id: 8a2b9cd7bd62462e24bdbcb69ad32dca745d6956
Summary:
HashNode and CompareNode are useful functions for handling jit::Node. This is to unblock https://github.com/pytorch/glow/pull/4235.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/34045
Reviewed By: houseroad
Differential Revision: D20184733
Pulled By: yinghai
fbshipit-source-id: 6c829f2f111a490fd2d85017475c1731cd97fb20
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33992
resubmit of https://github.com/pytorch/pytorch/pull/33369 with tweaks on when the rref type is created, to ensure ivalue->type() holds the correct RRef type inside of the inner element type.
Test Plan: Imported from OSS
Differential Revision: D20175043
Pulled By: wanchaol
fbshipit-source-id: a08b178e989c995632374e6c868d23c5a85526ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33536
Simple fix, merge the identical string literals that were being inlined into every wrapper for ops that don't support named tensors. E.g.
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
if (self.has_names()) {
AT_ERROR(
"all is not yet supported with named tensors. Please drop names via "
"`tensor = tensor.rename(None)`, call the op with an unnamed tensor, "
"and set names on the result of the operation.");
}
const OptionalDeviceGuard device_guard(device_of(self));
return at::native::all(self, dim, keepdim);
}
```
becomes
```
Tensor all(const Tensor & self, int64_t dim, bool keepdim) {
if (self.has_names()) {
AT_ERROR("all", named_tensors_unsupported_error);
}
const OptionalDeviceGuard device_guard(device_of(self));
return at::native::all(self, dim, keepdim);
}
```
Also updated the generated file comments to include the source template names, e.g.
```
// generated by aten/src/ATen/gen.py from TypeDefault.cpp
```
Test Plan: Imported from OSS
Differential Revision: D19993407
Pulled By: bhosmer
fbshipit-source-id: 88395a649e6ba53191332344123555c217c5eb40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33975
Currently the code analysis script doesn't go beyond the scope of the
registration API call, i.e. calling registration via a wrapper will not
be covered by the analysis - currently the new API is essentially a
wrapper around the old API.
Simply adding the new API signature to the registration API pattern can
solve the problem for now. We might need to change the analyzer code if
things change significantly in the future.
Test Plan:
- update test project to use the new API;
- run analyzer against pytorch codebase;
Differential Revision: D20169549
Pulled By: ljk53
fbshipit-source-id: c7925fa0486eee18f07e791a38c32152fee59004
Summary:
Mainly renaming pthread_create of C2, the only one referenced internally in NNPACK, that
is conflicting, to pthread_create_c2.
Removed 2 other conflicting symbols that are not used internally at all.
Pointing XNNPACK to the original repo instead of the fork.
Copy-pasted the new interface and implementation to
caffe2/utils/threadpool, so that for internal builds we compile against
this.
When threadpool is unified this will be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33869
Differential Revision: D20140580
Pulled By: kimishpatel
fbshipit-source-id: de70df0af9c7d6bc065e85ede0e1c4dd6a9e6be3
Summary:
This bug has been hit a couple times recently. We need to handle all bivariant types, not just optional, when asserting mutability/immutability of pointed-to elements in alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33952
Differential Revision: D20166025
Pulled By: eellison
fbshipit-source-id: cf3df9897a639641ef8303a08ba2b13523d01ef1
Summary:
Fixes #30775
This adds TorchScript implementations (copied from `python_variable.cpp`) for the remaining `Tensor` properties that were missing from the jit, in addition to a test that ensures new properties will trigger a failure so we can decide whether we want to add them as well.
For `some_tensor`, adds (see the sketch after this list):
* `some_tensor.T`
* `some_tensor.ndim`
* `some_tensor.is_leaf`
* `some_tensor.name`
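A minimal sketch exercising some of the newly scripted properties listed above:
```python
import torch

@torch.jit.script
def props(t: torch.Tensor):
    return t.T, t.ndim, t.is_leaf

print(props(torch.rand(2, 3)))
```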
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33906
Pulled By: driazati
Differential Revision: D20153288
fbshipit-source-id: 2ddc48a14267077bc176065267e5ce52181b3d6b
Summary:
This adds some machinery so that we use Python to resolve types to a value and the corresponding resolution logic in `annotations.py` instead of using the string.
This PR also `slowTests` a random test since it was taking > 1 min whereas all the other tests take < 10 seconds.
Fixes #31864, fixes #31950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29623
Pulled By: driazati
Differential Revision: D20144407
fbshipit-source-id: ef3699f6b86039d8b4646ffc42c21bd1132d1681
Summary:
This PR prepares us to allow XLA to use `XLAPreAutograd` to override compound ops.
To do this we'll need to pass all ops, with additional information about whether each is compound or not, for XLA to parse.
Companion PR: https://github.com/pytorch/xla/pull/1698
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33908
Differential Revision: D20149585
Pulled By: ailzhang
fbshipit-source-id: a93140e8a34548fcabcea454386d15df58177c1d
Summary:
With the profiling executor enabled the fuser won't be invoked until the second pass over a script function, so some of these tests weren't correctly comparing the fused output with the interpreter output. I've used the `checkScript` method where applicable, which seems to do the right thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33944
Test Plan: Locally inject obvious errors into the fuser and verify that the updated tests fail when they're supposed to.
Differential Revision: D20162320
Pulled By: bertmaher
fbshipit-source-id: 4a2f3f2d2ff1d81f23db504dc8cd0d5417bdcc50
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33559
For sm_60+ CUDA supports the `atomicAdd(double*, double*)` function, and for lower compute capabilities the CUDA C Programming Guide [1] suggests a user implementation as in this code. On the other hand, Clang's CUDA wrappers unconditionally define this function, regardless of compute capability, and produce an error if it actually gets used.
So the problem is: when Clang is used for < sm_60, CUDA's `atomicAdd(double*, double*)` cannot be used and it cannot be redeclared in standard-compliant C++.
Work around the problem by using Clang's `enable_if` attribute [2], which has a side effect of function redeclaration.
1. https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#atomic-functions
2. https://clang.llvm.org/docs/AttributeReference.html#enable-if
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005113
fbshipit-source-id: d0d4bd6514f201af9cdeba1229bd9b798df0d02e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33556
Fix several places exposed by Clang where the order of the member initializer list doesn't match the actual initialization order. The fix is to simply reorder the member initializer lists.
Also accepted formatting changes suggested by clang-format linter.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20004834
fbshipit-source-id: b61c7c3f1fe8413bbb3512f6b62177a3ddf67682
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33947
XNNPACK was downloading clog because we weren't setting CLOG_SOURCE_DIR.
Actually, it was downloading cpuinfo and pointing to the copy of clog
within that. So let's just point to the copy of clog within the cpuinfo
submodule we already have.
(Note: this ignores all push blocking failures!)
Test Plan:
Ran cmake and didn't see any downloading.
Verified that our clog is the same as the one that was being downloaded
with `diff -Naur`.
Differential Revision: D20169656
Pulled By: suo
fbshipit-source-id: ba0f7d1535f702e504fbc4f0102e567f860db94b
Summary:
This PR comes from a discussion with albanD in https://fb.quip.com/npBHAXaPfnbu. The main goal is to clarify how view ops relate to general outplace/inplace ops and remind users about the difference.
For reference, this information is currently only available in code, which is internal and hard to find. Also, changes to this list actually affect users, so we think it's better to expose it as public information. It's also helpful for new backends like XLA when implementing PyTorch ops. 19bbb4fccb/tools/autograd/gen_autograd.py (L32-L68)
Please feel free to comment!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32560
Differential Revision: D20161069
Pulled By: ailzhang
fbshipit-source-id: b5f1fd4353fe7594a427784db288aeb5a37dc521
Summary:
This PR moves glu to ATen (CPU).
Test script:
```
import torch
import torch.nn.functional as F
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(1000):
        output = F.glu(input)
        output.backward(grad_output)

for n in [10, 100, 1000, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n // 2, device=device)
    for i in range(10000):
        t1 = _time()
        output = F.glu(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test device: **skx-8180.**
Before:
```
input size(128, 10) forward time is 0.04 (ms); backwad avg time is 0.08 (ms).
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.14 (ms).
input size(128, 1000) forward time is 0.11 (ms); backwad avg time is 0.31 (ms).
input size(128, 10000) forward time is 1.52 (ms); backwad avg time is 2.04 (ms).
```
After:
```
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 0.07 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.13 (ms); backwad avg time is 1.03 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24707, https://github.com/pytorch/pytorch/issues/24708.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33179
Differential Revision: D19839835
Pulled By: VitalyFedyunin
fbshipit-source-id: e4d3438556a1068da2c4a7e573d6bbf8d2a6e2b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32521
Not all ops support the templated unboxing wrappers yet. For the ones that don't,
let's use the codegen'ed unboxing wrappers from register_aten_ops.cpp, but register
them with c10 directly instead of JIT.
The `use_c10_dispatcher` setting in `native_functions.yaml` now has a new option 'with_codegenerated_unboxing_wrapper' which means we take the codegened unboxing wrapper from register_aten_ops.cpp and stuff it into c10. This new argument is the default, 'unboxed_only' is not the default anymore. For the (very few) ops that don't support boxed dispatch yet (i.e. ops taking TensorOptions arguments), we set them to 'unboxed_only' and they follow the old behavior of having register_aten_ops.cpp register the jit op.
Next steps here are (1) to make TensorOptions work with boxed dispatch and remove the `unboxed_only` option from `use_c10_dispatcher`, so that all ops go through the new path and (2) make the new path template-only and remove codegen from it (see https://github.com/pytorch/pytorch/issues/32366).
First experiments show that
- For a small JITted model that calls add (i.e. an op with just two arguments that are both tensors) on two tensors in a loop, we see a 2-4% performance improvement (~35-50ns) when compared to the old path. This is a simple op that takes two tensor arguments and no non-tensor arguments, so iterating over it in boxed dispatch is cheap.
- For a small JITted model that calls avgpool1d (i.e. an op that has one tensor arg and 5 non-tensor args) on a tensor in a loop, we see a 3-4% performance regression (~60ns) when compared to the old path. This is an op that takes only one tensor argument and then 6 non-tensor arguments. Unboxed dispatch doesn’t have to look at those but boxed dispatch still needs to iterate over them.
This performance difference is likely due to boxed dispatch iterating over all arguments in a loop and unboxed dispatch not having to look at non-tensor arguments.
ghstack-source-id: 99161484
Test Plan: unit tests that call existing ops through JIT
Differential Revision: D18672405
fbshipit-source-id: bf2a7056082dfad61e7e83e9eeff337060eb6944
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33732
move and forward instead of copy
Benchmarks:
A microbenchmark calling the add operation on two tensors in a tight loop shows a 5% improvement in performance.
No visible change for a model like resnet that does more work in its kernels.
ghstack-source-id: 99161486
Test Plan: benchmarks
Differential Revision: D20082642
fbshipit-source-id: eeac59686f8621dd5eaa85d61e6d219bba48c847
Summary:
Hard to get right locally... I can build the docs but never quite match what they look like live. The bullet point indentation was just an oversight.
Removing `Returns:` formatting tabs because they take up a lot of space when rendered and add no clarity. Some functions in Pytorch [do use them](https://pytorch.org/docs/master/torch.html#torch.eye), but [many don't bother](https://pytorch.org/docs/master/torch.html#torch.is_tensor), so apparently some people shared my feelings (Not using them is in line with existing practice).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33832
Differential Revision: D20135581
Pulled By: ngimel
fbshipit-source-id: bc788a7e57b142f95c4fa5baf3fe01f94c45abd8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33563
When NVCC or Clang are driving CUDA compilation many math functions are declared by default, with a small difference: Clang marks them as `__device__` only, while NVCC uses both `__host__` and `__device__`. This makes every un-elaborated `min` or `max` function call from a `__host__` function generate a syntax error when Clang is used.
Fix the errors by using `std::min` and `std::max` from `<algorithm>`, since C++14 they are `constexpr` and can be used in the `__device__` code [1].
1. https://llvm.org/docs/CompileCudaWithLLVM.html#algorithm
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: ngimel
Differential Revision: D20005795
fbshipit-source-id: 98a3f35e8a96c15d3ad3d2066396591f5cca1696
Summary:
- Modified assertEqual to handle complex tensors
- added a test in test_torch.py to test torch.zeros
- added dispatch for complex for index_kernel, index_put_kernel (see the sketch below)
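A minimal sketch (not the added test itself) of what the new complex dispatch enables: indexed reads and writes on complex tensors.
```python
import torch

z = torch.zeros(4, dtype=torch.complex64)
idx = torch.tensor([1, 3])
z[idx] = torch.tensor([1 + 2j, 3 - 1j], dtype=torch.complex64)  # exercises index_put
print(z[idx])                                                   # exercises index
```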
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33773
Differential Revision: D20135553
Pulled By: anjali411
fbshipit-source-id: f716604535c0447ecffa335b0fc843431397c988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33273
- Move the check for bias to valueNeedsToBeQuantized
- Move TORCH_CHECK inside the functions for checking if a value is bias or weight
Test Plan:
.
Imported from OSS
Differential Revision: D20123595
fbshipit-source-id: 4b805d57dcaf41a6436506d021dd5f6518bc88fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33263
This PR allows PyRRef local creation to inspect the pyobject: if it
finds that we can turn it into an IValue, it turns it into an IValue first,
otherwise it holds it as a PyObjectType.
Test Plan:
Imported from OSS
https://fb.quip.com/aGxRAh2lCg05
Differential Revision: D19871243
Pulled By: wanchaol
fbshipit-source-id: ae5be3c52fb1e6db33c64e64ef64bc8b9ea63a9a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33576
A `throw` statement at the end of a `constexpr` function is ill-formed according to Clang. This happens when Clang is driving CUDA compilation and compiles the affected code for device. Due to its compilation model it requires host code to be well-formed even when compiling for device.
Fix the error by guarding the entire definition of `type_index_impl` with `__CUDA_ARCH__` check.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Execute tests on devgpu:
```
buck test mode/dev-nosan -j 8 //caffe2/caffe2/python/operator_test/... //caffe2/test:cuda
```
Reviewed By: smessmer
Differential Revision: D20008881
fbshipit-source-id: b0dc9abf0dc308b8b8637b54646a0411baf7fef3
Summary:
The way it works on the Anaconda distribution of Python 3.8 is a bit different. Loading DLLs explicitly (e.g. `ctypes.CDLL`) relies on paths appended by `os.add_dll_directory`. But if you try to load DLLs implicitly (e.g. `from torch._C import *`), it will rely on `PATH`.
Fixes https://github.com/pytorch/vision/issues/1916.
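A minimal sketch (Windows-only, with a hypothetical `dll_dir`) of the two loading paths described above on Python 3.8:
```python
import os
import ctypes

dll_dir = r"C:\path\to\torch\lib"                      # hypothetical location
os.add_dll_directory(dll_dir)                          # honored by explicit loads
ctypes.CDLL(os.path.join(dll_dir, "torch_python.dll"))

# implicit loads such as `from torch._C import *` resolve dependent DLLs via PATH,
# so the directory must also be prepended there
os.environ["PATH"] = dll_dir + os.pathsep + os.environ["PATH"]
```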
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33856
Differential Revision: D20150080
Pulled By: soumith
fbshipit-source-id: cdbe76c138ea259ef7414c6634d4f7e0b1871af3
Summary:
**Summary**
This commit adds an implementation of `Tensor.tolist()` to the JIT interpreter.
**Testing**
This commit adds several unit tests that test that this function works correctly for
0D, 1D, 2D and 3D tensors of type `float`, `int` and `bool`.
```
(base) meghanl-mbp:pytorch meghanl$ python test/test_jit.py TestList.test_to_list -v
Fail to import hypothesis in common_utils, tests are not derandomized
test_to_list (jit.test_list_dict.TestList)
Unit tests for Tensor.tolist() function. ... ok
----------------------------------------------------------------------
Ran 1 test in 0.329s
OK
```
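A minimal sketch of the scripted `Tensor.tolist()` described above; in TorchScript the return annotation fixes the nesting depth and element type of the resulting list.
```python
import torch
from typing import List

@torch.jit.script
def as_rows(t: torch.Tensor) -> List[List[float]]:
    return t.tolist()

print(as_rows(torch.rand(2, 3)))
```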
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33472
Differential Revision: D20109738
Pulled By: SplitInfinity
fbshipit-source-id: a6e3fee5e3201d5e1f0c4ca45048488ae2bf5e33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33806
as title
Test Plan: Imported from OSS
Differential Revision: D20122117
Pulled By: suo
fbshipit-source-id: 209d29ed2c873181140c9fb5cdc305c200ce4008
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33885
Fixes: #32835
Fixes: #5834
Cannot combine with CUDA's implementation, as each of them requires an individual `std::once_flag` as well as different `forked_autograd_child` functions. The CUDA version relays the error to the python module; autograd uses TORCH_CHECK to report the error to python and cpp.
Test Plan: Imported from OSS
Differential Revision: D20144024
Pulled By: VitalyFedyunin
fbshipit-source-id: e7cf30568fff5110e9df7fe5b23f18ed992fa17f
Summary:
In *_like functions we call
`globalLegacyTypeDispatch().initForDispatchKeySet(c10::detail::multi_dispatch_key_set(self, options));` -> `dispatchKeyToBackend` and thus this change.
`self` has both `XLAPreAutograd` and `XLATensorId` in its key set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33848
Differential Revision: D20135898
Pulled By: ailzhang
fbshipit-source-id: a8585f39f3fa77b53718f20d3144f4f2f3cb8e53
Summary:
Conda registers a suffixed slash as a new user so it was failing to
upload the anaconda packages.
In the future this should be handled through a single variable that can
be used for both but until then this will have to do.
Bug was introduced in https://github.com/pytorch/pytorch/issues/33842
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33903
Differential Revision: D20148679
Pulled By: seemethere
fbshipit-source-id: 27c95f5d906ce84aa34bf5d76fd6f1ef5df08fb9
Summary:
…/xla this will result in a failure since it is comparing an XLA tensor with a CPU tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33635
Differential Revision: D20043517
Pulled By: ailzhang
fbshipit-source-id: d84038ea675e4d4a9c02e7a8b0924bdb12f40501
Summary:
`.data` calls are unsafe and should not be used.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33874
Differential Revision: D20141059
Pulled By: izdeby
fbshipit-source-id: 8e11afc74f0cb04f5b18b458068fb813a6d51708
Summary:
**Summary**
There is often a need to create a Tensor when writing IR by hand for JIT
optimisation pass unit tests. The only options for this today are real
Tensor creation functions like `aten::ones`. Any test that uses these functions
must also use the same default arguments as the Python/C++ API, which means
that all of the tests have to be updated when the API is updated. This commit
introduces a new primitive, `prim::MakeTestTensor` with schema `() -> Tensor` that
should be used in unit tests instead of real Tensor creation functions. This new
primitive has no public-facing API, so the maintenance burden is much lower.
**Testing**
This commit updates the alias analysis and DCE tests to use `prim::MakeTestTensor` instead of
`aten::rand`, `aten::ones`, and `aten::zeros`.
```
$ ./bin/test_jit
CUDA not available. Disabling CUDA and MultiCUDA tests
Note: Google Test filter = *-*_CUDA:*_MultiCUDA
[==========] Running 75 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 75 tests from JitTest
[ RUN ] JitTest.ADFormulas
[ OK ] JitTest.ADFormulas (82 ms)
[ RUN ] JitTest.Attributes
[ OK ] JitTest.Attributes (0 ms)
...
...
...
[ RUN ] JitTest.LiteInterpreterPrim
[ OK ] JitTest.LiteInterpreterPrim (0 ms)
[ RUN ] JitTest.LiteInterpreterLoadOrigJit
[ OK ] JitTest.LiteInterpreterLoadOrigJit (2 ms)
[----------] 75 tests from JitTest (150 ms total)
[----------] Global test environment tear-down
[==========] 75 tests from 1 test case ran. (150 ms total)
[ PASSED ] 75 tests.
```
**Fixes**
This pull request fixes https://github.com/pytorch/pytorch/issues/33500.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33595
Differential Revision: D20127441
Pulled By: SplitInfinity
fbshipit-source-id: 56da4f23ac46335227254f606c6481718108f378
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33173
How do we deal with ops that are defined for both floating point and quantized Tensors?
Category of ops: the ones that don't require observers, which means the quantization parameters (scale/zero_point) of the output of the op can be inferred from the quantization parameters of the inputs.
For example:
avg_pool, max_pool, flatten, transpose, upsample
A related topic is how we deal with things like adaptive_avg_pool2d that do not need to be observed and work with quantized tensors as well. If we insert quant/dequant for them, even the quant fusion becomes a numerically changing operation, because the scale/zero_point for input and output are different.
Proposal
We can swap the operator with dequantize whenever we see it. For example, take the following pattern,
where aten::general_op is defined for both floating point and quantized:
%r = aten::conv(...)
%q = quantize(%r)
%dq = dequantize(%q)
%f = aten::general_op(%dq)
...
When we detect that all inputs of aten::general_op are produced by dequantize, we first delete all the dequantizes for the inputs and then insert a dequantize for each use of the output of aten::general_op. Note that this should work generally for all the cases we might encounter.
After transformation we’ll have:
%r = aten::conv(...)
%q = quantize(%r)
%x = aten::general_op(%q)
%f = dequantize(%x)
...
1. Multiple inputs
   1. We need to make sure all inputs of the aten::general_op are produced by dequantize before we do this transformation
2. Input used by multiple operators
   1. We already did this by inserting dequantize for each use of the value
3. Output used by multiple operators
   1. We'll reuse the code that inserts dequantize (might need some refactor)
Note that concat does not belong to this category right now, since it does not inherit quantization parameters from inputs.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20123590
fbshipit-source-id: de2febe1f37e4079457a23acaeccbc6d9c9e1f8a
Summary:
I've been using pytorch with type hints, and I found errors that can be easily fixed, so I'm creating this PR to fix the type bugs.
I expected the code below to be type-checked without any errors.
```python
import torch
from torch.nn import Linear
from torch.autograd import Variable
from torch.optim import AdamW
from torch.utils import hooks
# nn.Module should have training attribute
module = Linear(10, 20)
module.training
# torch should have dtype bfloat16
tensor2 = torch.tensor([1,2,3], dtype=torch.bfloat16)
# torch.Tensor.cuda should accept int or str value
torch.randn(5).cuda(1)
torch.tensor(5).cuda('cuda:0')
# optimizer should have default attribute
module = Linear(10, 20)
print(AdamW(module.weight).default)
# torch.Tensor should have these boolean attributes
torch.tensor([1]).is_sparse
torch.tensor([1]).is_quantized
torch.tensor([1]).is_mkldnn
# Size class should tuple of int
a, b = torch.tensor([[1,2,3]]).size()
# check modules can be accessed
torch.nn.parallel
torch.autograd.profiler
torch.multiprocessing
torch.sparse
torch.onnx
torch.jit
torch.hub
torch.random
torch.distributions
torch.quantization
torch.__config__
torch.__future__
torch.ops
torch.classes
# Variable class's constructor should return Tensor
def fn_to_test_variable(t: torch.Tensor):
    return None
v = Variable(torch.tensor(1))
fn_to_test_variable(v)
# check RemovableHandle attributes can be accessed
handle = hooks.RemovableHandle({})
handle.id
handle.next_id
# check torch function hints
torch.is_grad_enabled()
```
But the current master branch raises errors (I checked with pyright).
```
$ pyright test.py
Searching for source files
Found 1 source file
test.py
12:45 - error: 'bfloat16' is not a known member of module
15:21 - error: Argument of type 'Literal[1]' cannot be assigned to parameter 'device' of type 'Optional[device]'
'int' is incompatible with 'device'
Cannot assign to 'None'
16:22 - error: Argument of type 'Literal['cuda:0']' cannot be assigned to parameter 'device' of type 'Optional[device]'
'str' is incompatible with 'device'
Cannot assign to 'None'
23:19 - error: Cannot access member 'is_sparse' for type 'Tensor'
Member 'is_sparse' is unknown
24:19 - error: Cannot access member 'is_quantized' for type 'Tensor'
Member 'is_quantized' is unknown
25:19 - error: Cannot access member 'is_mkldnn' for type 'Tensor'
Member 'is_mkldnn' is unknown
32:7 - error: 'autograd' is not a known member of module
33:7 - error: 'multiprocessing' is not a known member of module
34:7 - error: 'sparse' is not a known member of module
35:7 - error: 'onnx' is not a known member of module
36:7 - error: 'jit' is not a known member of module
37:7 - error: 'hub' is not a known member of module
38:7 - error: 'random' is not a known member of module
39:7 - error: 'distributions' is not a known member of module
40:7 - error: 'quantization' is not a known member of module
41:7 - error: '__config__' is not a known member of module
42:7 - error: '__future__' is not a known member of module
44:7 - error: 'ops' is not a known member of module
45:7 - error: 'classes' is not a known member of module
60:7 - error: 'is_grad_enabled' is not a known member of module
20 errors, 0 warnings
Completed in 1.436sec
```
and the list below is not flagged as errors, but I think these are errors too.
* `nn.Module.training` is not boolean
* return type of `torch.Tensor.size()` is `Tuple[Unknown]`.
---
related issues.
https://github.com/pytorch/pytorch/issues/23731, https://github.com/pytorch/pytorch/issues/32824, https://github.com/pytorch/pytorch/issues/31753
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33762
Differential Revision: D20118884
Pulled By: albanD
fbshipit-source-id: 41557d66674a11b8e7503a48476d4cdd0f278eab
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33603
This function returns a ScalarType based on its value. This helps avoid
having the code generated in aten_op.h return Scalars that depend on the
arg self to determine their type.
Test Plan: Imported from OSS
Differential Revision: D20100218
Pulled By: ezyang
fbshipit-source-id: 337729a7559e6abb3a16b2a563a2b92aa96c7016
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33510
Previously, we would fill in TensorOptions with defaults whenever an
item was missing from both the left and right side of the merge. This
is morally incorrect: if we don't have an item on the left or right,
we should keep the entry empty (so the downstream user can apply
the appropriate defaulting rule).
I don't think this caused any bugs, but I noticed this error when
working on a later patch in my diff stack.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20001775
Pulled By: ezyang
fbshipit-source-id: 88139fc268b488cd1834043584a0d73f46c8ecaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33505
This shouldn't change semantics, but it has the benefit of making
torch::empty_like(x, dtype(kFloat)) actually work (previously, this
would just ignore all of the properties from x).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D20001776
Pulled By: ezyang
fbshipit-source-id: ba81186d3293abc65da6130b2684d42e9e675208
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32289
This has been fixed upstream as of Python 3.8.2. I think the easiest and least invasive way to ameliorate this is to catch the error condition and print a more informative error asking the user to update their Python version. It might be possible to buffer the data into memory and then read from memory, but that would be an invasive change and might cause memory exhaustion for very large models.
Suggestions for alternate fixes or ways to improve the error message wording are very welcome.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33824
Differential Revision: D20131722
Pulled By: ezyang
fbshipit-source-id: a6e3fbf4bf7f9dcce5772b36f7a622cbf14b5ae4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33610
Our pybind definitions for several RPC functions didn't release GIL
once we were processing stuff in C++.
This PR adds asserts that we release GIL appropriately and adds
py::gil_scoped_release and py::gil_scoped_acquire in the appropriate places.
ghstack-source-id: 99066749
Test Plan: waitforbuildbot
Differential Revision: D20025847
fbshipit-source-id: 57a778cba0336cf87352b07c89bbfb9254c4bdd7
Summary:
Stacked PRs
* **#33578 - [jit] Unify augmented assign handling**
* #32993 - [jit] Fix aug assign for non-tensor attributes
We handle augmented assignments to `Select` and `Var` statements differently, but the actual in place update is the same for both, so this PR factors it out into a method so we don't have 2 code paths doing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578
Pulled By: driazati
Differential Revision: D20127647
fbshipit-source-id: 94f37acbd2551498de9d2ca09a514508266f7d31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33711
Fixed #33480
This makes `dist_autograd.backward` and `dist_optimizer.step` functional by making the user explicitly pass in the `context_id` as opposed to relying on the confusing thread_local context_id.
This diff incorporates these API changes and all places where these functions are called.
More concretely, this code:
```
with dist_autograd.context():
# Forward pass.
dist_autograd.backward([loss.sum()])
dist_optim.step()
```
should now be written as follows:
```
with dist_autograd.context() as context_id:
# Forward pass.
dist_autograd.backward(context_id, [loss.sum()])
dist_optim.step(context_id)
```
Test Plan: Ensuring all existing dist_autograd and dist_optimizer tests pass with the new API. Also added a new test case for input checking.
Differential Revision: D20011710
fbshipit-source-id: 216e12207934a2a79c7223332b97c558d89d4d65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33531
We already insert a dequantize for each use of the value, but there might still be cases where we only
see that the value is used multiple times after inlining. This pass adds support for replicating dequantize
after inlining to ensure the output of dequantize is only used by one node, which is necessary to preserve all
quantization patterns like `dequant - conv - quant`.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D20123591
fbshipit-source-id: 6edb10a4566538bcf9379d332233f870372b7a63
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33779
This should eliminate random warnings and print spew from test_jit.
It also fixes a bug where we weren't properly comparing captured outputs
(!)
Test Plan: Imported from OSS
Differential Revision: D20124224
Pulled By: suo
fbshipit-source-id: 9241d21fdf9470531b0437427b28e325cdf08d3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33369
This PR adds an RRef type inference rule for when we try to infer a type from a
pyobject, which allows script module attributes to contain an RRef
(i.e. a List[RRef] as a module attribute).
Test Plan: Imported from OSS
Differential Revision: D19918320
Pulled By: wanchaol
fbshipit-source-id: e5fd99c0ba5693b22ed48f0c0550b5e1dac89990
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33504
Fix resolution of functions that are bound onto torch in torch/functional.py. This does not fix compilation of all of those functions; those will be done in follow-ups. Does torch.stft as a start.
Fixes #21478
Test Plan: Imported from OSS
Differential Revision: D20014591
Pulled By: eellison
fbshipit-source-id: bb362f1b5479adbb890e72a54111ef716679d127
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29495
This PR adds support for `_modules`, making it so we no longer need to special case support for `nn.Sequential`. I was getting internal errors around the previous approach using `self.define()`, so I am adding this PR as part of the stack.
Fix for https://github.com/pytorch/pytorch/issues/28998
Test Plan: Imported from OSS
Differential Revision: D18412561
Pulled By: eellison
fbshipit-source-id: a8b24ebee39638fccf63b2701f65f8bb0de84faa
Summary:
This sets up PIP_UPLOAD_FOLDER to point to the correct channel for
release candidates as opposed to nightlies.
It removes an old safety check that's no longer needed for devtoolset3,
and provides a nice default for PIP_UPLOAD_FOLDER, which should clear up
confusion about where it's initially set.
This is a stepping stone towards the promotable pipeline.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33842
Differential Revision: D20130791
Pulled By: seemethere
fbshipit-source-id: dac94ef46299574c36c08c968dd36faddeae6363
Summary:
Port `masked_fill` from TH to ATen with TensorIterator.
Single-core performance roughly stays the same; single-socket performance gets a **3~16x** boost.
`masked_fill` is missing from https://github.com/pytorch/pytorch/issues/24507
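A minimal micro-benchmark sketch of the kind of measurement involved (the tensor shape, thread count, and fill value are illustrative assumptions, not the setup behind the numbers above):
```python
import time
import torch

torch.set_num_threads(1)  # set to the socket's core count for the multi-threaded run

x = torch.randn(1024, 1024)
mask = torch.rand(1024, 1024) > 0.5  # boolean mask

NITER = 1000
start = time.time()
for _ in range(NITER):
    x.masked_fill_(mask, 0.0)  # fill masked elements in place
print('masked_fill_ time per iter ms', (time.time() - start) / NITER * 1000)
```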
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33330
Differential Revision: D20098812
Pulled By: VitalyFedyunin
fbshipit-source-id: ff20712ffc00cc665550997abcfdfb8916c18e40
Summary:
Print a complete and comprehensive error message with a description of the issue when an op is missing during ONNX export (previously an ambiguous "key not in registry" error was thrown which was not helpful for the user to understand the failure).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33593
Reviewed By: hl475
Differential Revision: D20052213
Pulled By: houseroad
fbshipit-source-id: ae3010a97efdab26effad5e4a418e9cc41f5b04e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33735
This apparently used to create a new storage, but I couldn't find anywhere in the code where this actually happens.
Changing it to an assert to see what happens.
Test Plan: Imported from OSS
Differential Revision: D20084029
Pulled By: gchanan
fbshipit-source-id: e9c4db115a25fc2e17a3b166c1ff5a0e6b56d690
Summary:
Stacked PRs
* **#33578 - [jit] Unify augmented assign handling**
* #32993 - [jit] Fix aug assign for non-tensor attributes
We handle augmented assignments to `Select` and `Var` statements differently, but the actual in place update is the same for both, so this PR factors it out into a method so we don't have 2 code paths doing the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33578
Pulled By: driazati
Differential Revision: D20010383
fbshipit-source-id: 52e559ce907e95e5c169ab9d9690d0d235db36f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30426
This PR adds `assert_tensor_equal` and `assert_tensor_not_equal` to `test/cpp/api/support.h`, as better functions for testing whether two tensors are equal / not equal.
Test Plan: Imported from OSS
Differential Revision: D18695900
Pulled By: yf225
fbshipit-source-id: c19b9bc4c4e84d9f444015023649d27618fcbdf5
Summary:
This might lead to silent undefined behaviour (e.g. with out-of-bound indices). This affects `test_multinomial_invalid_probs_cuda` which is now removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32719
Test Plan:
* Build with VERBOSE=1 and manually inspect `less ndebug.build.log | grep 'c++' | grep -v -- -DNDEBUG` (only with ninja on Linux)
* CI
Fixes https://github.com/pytorch/pytorch/issues/22745
Differential Revision: D20104340
Pulled By: yf225
fbshipit-source-id: 2ebfd7ddae632258a36316999eeb5c968fb7642c
Summary:
Thanks to pjh5 for the continued use of his account to upload binaries, but I
think we can start using a bot account for this now.
Just a draft until we can ensure the env variables get injected correctly and the token can actually upload.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33786
Differential Revision: D20122423
Pulled By: seemethere
fbshipit-source-id: 0444584831a40ae730325d258935f6d1b873961b
Summary:
Fixes https://github.com/pytorch/pytorch/issues/23925
This fixes the incorrect gradients returned by `F.grid_sample` at image borders under `"border"` and `"reflection"` padding modes.
At nondifferentiable points, the choice of which gradient to return among its super- or subgradients is rather arbitrary and generally does not affect training. Before this change, however, a bug in the code meant that the gradient returned at the exact borders was not selected from among the super- or subgradients.
The gradient is now set to zero at the borders, which is a defensible choice for both the `"border"` and `"reflection"` padding modes:
* For `"border"` padding, this effectively means that the exact borders of the image are now considered out of bounds, and therefore receive zero gradient.
* For `"reflection"` padding, this effectively treats the exact borders as extrema.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32829
Differential Revision: D20118564
Pulled By: soumith
fbshipit-source-id: ef8571ff585be35ab1b90a922af299f53ab9c095
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33765
quantize and dequantize methods now make use of multiple threads. This makes use of shz0116's recent parallelization of quantize/dequantize routines in FBGEMM.
Fixes:
https://github.com/pytorch/pytorch/issues/32006
https://github.com/pytorch/FBGEMM/issues/142
Alternative to https://github.com/pytorch/pytorch/pull/30153
```
#!/usr/bin/env python
import time
import torch
import torch.nn as nn
torch.set_num_threads(4)
# print(torch.__config__.parallel_info())
W = torch.rand(1, 54, 54, 256)
NITER = 1000
s = time.time()
for i in range(NITER):
    W_q = torch.quantize_per_tensor(W, scale=1.0, zero_point=0, dtype=torch.quint8)
time_per_iter = (time.time() - s) / NITER
print('quantize time per iter ms', time_per_iter * 1000)
s = time.time()
for i in range(NITER):
    W_deq = W_q.dequantize()
time_per_iter = (time.time() - s) / NITER
print('dequantize time per iter ms', time_per_iter * 1000)
```
### With 1 thread
quantize time per iter ms 0.22633790969848633
dequantize time per iter ms 0.6573665142059326
### With 4 threads
quantize time per iter ms 0.0905618667602539
dequantize time per iter ms 0.19511842727661133
ghstack-source-id: 98935895
Test Plan: python test/test_quantized.py
Reviewed By: jspark1105
Differential Revision: D20098521
fbshipit-source-id: bd8c45761b4651fcd5b20b95759e3868a136c048
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33667
Pass shared_ptr properly according to C++ guidelines. Thanks to kimishpatel for pointing it out.
Test Plan: Imported from OSS
Differential Revision: D20111001
Pulled By: iseeyuan
fbshipit-source-id: 213a0f950a7f3b9199d789dc0155911f6102d77a
Summary:
Also, Windows memory failures responsible for the earlier reversion have been fixed.
This PR (initially) contains 2 commits:
* a revert of the revert
* all changes to implement the original Apex scale update heuristic, squashed into a single commit for easier diff review
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33366
Differential Revision: D20099026
Pulled By: ngimel
fbshipit-source-id: 339b9b6bd5134bf055057492cd1eedb7e4461529
Summary:
Fixes an issue with `cdist` backward calculation for large inputs for the euclidean case.
The grid size when launching the kernel exceeded the 2^16 limit for the second dimension, resulting in `RuntimeError: CUDA error: invalid configuration argument`
Code to reproduce:
```
h, w, d = 800, 1216, 12
n = 133
A = torch.randn(n, d).cuda()
B = torch.randn(h, w, d).cuda()
A.requires_grad = True
B.requires_grad = True
B = B.reshape(-1, d).contiguous()
dist = torch.cdist(A, B)
loss = dist.sum()
loss.backward()
```
Thanks to tkerola for the bug report, reproduction and suggesting a solution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31167
Differential Revision: D20035605
Pulled By: ngimel
fbshipit-source-id: ae28ba4b549ee07a8bd937bb1de2438dc24eaa17
Summary:
Removed padding and dilation from the LPPool2d doc, as the function does not support padding or dilation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33714
Differential Revision: D20097021
Pulled By: ngimel
fbshipit-source-id: fc1c2d918b32f4b45c7e6e6bd93f018e867a628f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33749
Disable printing of the histogram on dump, to make the log cleaner.
Test Plan: CI
Reviewed By: amylittleyang
Differential Revision: D20087735
fbshipit-source-id: 5421cd9d25c340d92f29ce63fed2a58aefef567d
Summary:
Most of the function implementation and test code are translated from the Python version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33652
Differential Revision: D20052211
Pulled By: yf225
fbshipit-source-id: ce6767db54364f91ef4f06674239a12278c2752a
Summary:
The function originally comes from 4279f99847/tensorflow/python/ops/summary_op_util.py (L45-L68)
As its comment says:
```
# In the past, the first argument to summary ops was a tag, which allowed
# arbitrary characters. Now we are changing the first argument to be the node
# name. This has a number of advantages (users of summary ops now can
# take advantage of the tf name scope system) but risks breaking existing
# usage, because a much smaller set of characters are allowed in node names.
# This function replaces all illegal characters with _s, and logs a warning.
# It also strips leading slashes from the name.
```
This function is only for compatibility with TF's operator name restrictions, and is therefore no longer valid in pytorch. By removing it, tensorboard summaries can use more characters in the names.
Before/after screenshots (omitted here) show summary names with illegal characters replaced by underscores vs. preserved as-is.
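For instance, a tag like the one below (the tag and log directory are made-up examples) is now logged as written instead of being sanitized:
```python
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/demo")  # arbitrary example path
# Characters such as spaces and parentheses are no longer rewritten to '_'.
writer.add_scalar("loss (train)", 0.25, global_step=1)
writer.close()
```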
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33133
Differential Revision: D20089307
Pulled By: ezyang
fbshipit-source-id: 3552646dce1d5fa0bde7470f32d5376e67ec31c6
Summary:
CMake only treats the first item of `CC` and `CXX` as the executable, so calling `sccache.exe` directly won't work. Using a shim executable resolves this problem.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33745
Differential Revision: D20100397
Pulled By: soumith
fbshipit-source-id: 3a130d30dd548b7c2e726c064e66ae4fccb30c44
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32813
We need to separate this step to make the logic clearer,
and also to find all the values we want to skip in advance,
without interference from the inserted observers.
Test Plan:
.
Imported from OSS
Differential Revision: D20087841
fbshipit-source-id: ec3654ca561c0d4e2c05011988bb9ecc8671c5c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33225
This removes a redundant assert statement in `record_function_ops`. In
the else branch in question, we are guaranteed to have `current == &rec`, so
this assert will never fire.
Although, maybe we should add an assert failure when `current == &rec` since it
seems that `current` should always be profiler::record_function_exit.
ghstack-source-id: 98852219
Test Plan: Existing autograd profiler UTs past
Differential Revision: D19849145
fbshipit-source-id: 2014a0d3b9d11e5b64942a54e0fb45e21f46cfa2
Summary:
**Summary**
This commit adds a script that fetches a platform-appropriate `clang-format` binary
from S3 for use during PyTorch development. The goal is for everyone to use the exact
same `clang-format` binary so that there are no formatting conflicts.
**Testing**
Ran the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33644
Differential Revision: D20076598
Pulled By: SplitInfinity
fbshipit-source-id: cd837076fd30e9c7a8280665c0d652a33b559047
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33431
Some elementwise operators don't have shape and type inference specified for the output tensor: `BitwiseOr`, `BitwiseAnd`, `BitwiseXor`, `Not`, `Sign`.
This change fixes this issue:
- For `Not` and `Sign` operators, the output has the same type and shape as the input, so `IdenticalTypeAndShapeOfInput` function is used to specify that.
- For bitwise operators created by `CAFFE2_SCHEMA_FOR_BINARY_BITWISE_OP` macro, the type and shape inference rules should be the same as for other binary element-wise operators, so `TensorInferenceFunction(ElementwiseOpShapeInference)` is used to specify that.
Also some tests were modified to ensure that the shape and type are inferred (`ensure_outputs_are_inferred` parameter)
Test Plan:
```
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:elementwise_ops_test
CAFFE2_ASSERT_SHAPEINFERENCE=1 buck test caffe2/caffe2/python/operator_test:math_ops_test
```
Note that the tests have to be executed with `CAFFE2_ASSERT_SHAPEINFERENCE=1` in order to fail upon shape inference failure.
Reviewed By: idning
Differential Revision: D19880164
fbshipit-source-id: 5d7902e045d79e5669e5e98dfb13a39711294939
Summary:
Resolve https://github.com/pytorch/pytorch/issues/33699
`torch/__init__.pyi` will be generated like
```python
# TODO: One downside of doing it this way, is direct use of
# torch.tensor.Tensor doesn't get type annotations. Nobody
# should really do that, so maybe this is not so bad.
class Tensor:
    requires_grad: _bool = ...
    grad: Optional[Tensor] = ...
    # some methods here...
    @overload
    def bernoulli_(self, p: _float=0.5, *, generator: Generator=None) -> Tensor: ...
    def bfloat16(self) -> Tensor: ...
    def bincount(self, weights: Optional[Tensor]=None, minlength: _int=0) -> Tensor: ...
    # some methods here...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33747
Differential Revision: D20090316
Pulled By: ngimel
fbshipit-source-id: b9ce4c0d4ef720c94ccac0a0342a012e8cf3af0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33696
This changes two tests:
- The batchnorm inference cannot change the memory format of the weights as they are 1D. So this is removed.
- The batchnorm test now runs both in affine and non-affine mode.
- I added back the test for type errors using .data. In particular, `.data` allows changing the type of a Tensor in place (very bad, never do it!), but since it is possible, we should test it until .data is removed.
cc Enealor who did the first version of the PR.
Test Plan: Imported from OSS
Differential Revision: D20069241
Pulled By: albanD
fbshipit-source-id: a0348f40c44df38d654fb2a2b2b526d9d42f598a
Summary:
The following script reproduces the hang
```py
import multiprocessing, logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(multiprocessing.SUBDEBUG)

import torch

class Dataset:
    def __len__(self):
        return 23425

    def __getitem__(self, idx):
        return torch.randn(3, 128, 128), idx % 100

ds = Dataset()
trdl = torch.utils.data.DataLoader(ds, batch_size=64, num_workers=300, pin_memory=True, shuffle=True)
for e in range(1000):
    for ii, (x, y) in enumerate(trdl):
        print(f'tr {e: 5d} {ii: 5d} avg y={y.mean(dtype=torch.double).item()}')
        if ii % 2 == 0:
            print("="*200 + "BEFORE ERROR" + "="*200)
            1/0
```
The process will hang at joining the putting thread of `data_queue` in the **main process**. The root cause is that too many things are put into the queue from the **worker processes**, and the `put` at 062ac6b472/torch/utils/data/dataloader.py (L928) is blocked in a background thread. The `pin_memory_thread` exits when `pin_memory_thread_done_event` is set, without getting the `(None, None)`. Hence, the main process needs the same treatment as the workers already have at
062ac6b472/torch/utils/data/_utils/worker.py (L198).
After the patch, the script finishes correctly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33721
Differential Revision: D20089209
Pulled By: ezyang
fbshipit-source-id: e73fbfdd7631afe1ce5e1edd05dbdeb7b85ba961
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33419
These conditions are for the specific implementation; the fallback implementation works without these checks, so use that if any of the checks isn't true.
ghstack-source-id: 98836075
Test Plan: Previously there was an error for the special case where k=0, which is now gone. The error was in some complicated autograd code, and I'm not sure how and where a simple regression test should be added.
Differential Revision: D19941103
fbshipit-source-id: e1c85d1e75744b1c51ad9b71c7b3211af3c5bcc6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33727
Some small changes to adagrad (a tiny bit faster, though there is a more interesting diff on this elsewhere in the stack).
Test Plan: Part of the stack
Reviewed By: chocjy
Differential Revision: D20029499
fbshipit-source-id: 7f4fddb9288d7881ef54673b17a0e19ef10d64c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33537
For embeddings smaller than 128, we can get a bit more compute by
allocating fewer threads per block.
Test Plan: Unit-test, benchmark.
Reviewed By: xianjiec
Differential Revision: D19969594
fbshipit-source-id: 6cc6b14fc61302804bed9093ea3591f21e3827d8
Summary:
This PR adds the following items:
- **1st item**: `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads for `Tensor::index` and `Tensor::index_put_`, to be used specifically for multi-dim indexing purpose.
Design rationale:
* C++ `Tensor::index` and `Tensor::index_put_` are both existing tensor APIs, and they currently (before this PR) only accept a list of tensors (i.e. `ArrayRef<Tensor>`) as indices. If we change their signatures to also accept non-tensors as indices (i.e. `ArrayRef<TensorIndex>`, and `TensorIndex` is convertible from `Tensor` / `Slice` / `None` / `Ellipsis`), it would slow down the original code path (since now it has to go through more steps), which is undesirable.
To get around this problem, the proposed solution is to keep the original `ArrayRef<Tensor>` overload, and add `ArrayRef<TensorIndex>` and `std::initializer_list<TensorIndex>` overloads to `Tensor::index` and `Tensor::index_put_`. This way, the original code path won’t be affected, and the tensor multi-dim indexing API is only used when the user explicitly pass an `ArrayRef<TensorIndex>` or a braced-init-list of `TensorIndex`-convertible types to `Tensor::index` and `Tensor::index_put_` .
Note that the above proposed solution would still affect perf for the user’s original `Tensor::index` or `Tensor::index_put_` call sites that use a braced-init-list of tensors as input, e.g. `tensor.index({...})` or `tensor.index_put_({...}, value)`, since now such function calls would take the multi-dim indexing path instead of the original advanced indexing path. However, there are only two instances of this in our codebase (one in ATen cpp test, one in a C++ API nn init function), and they can be easily changed to explicitly use `ArrayRef<Tensor>` as input (I changed them in this PR). For external user’s code, since this is part of the C++ frontend which is still considered experimental, we will only talk about this change in the release note, and ask users to switch to using `ArrayRef<Tensor>` explicitly if they want to keep using the original advanced indexing code path.
- **2nd item**: Mechanisms for parsing `ArrayRef<TensorIndex>` indices and performing indexing operations (mirroring the functions in `torch/csrc/autograd/python_variable_indexing.cpp`).
- **3rd item**: Simple tests to demonstrate that the `Tensor::index()` and `Tensor::index_put_()` APIs work. I will add more tests after the first few PRs are reviewed.
- **4th item**: Merge Python/C++ indexing code paths, for code simplicity. I tested locally and found that there is no perf regression resulting from the merge. I will get more concrete numbers for common use cases when we settle on the overall design.
This PR supersedes https://github.com/pytorch/pytorch/pull/30425.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32841
Differential Revision: D19919692
Pulled By: yf225
fbshipit-source-id: 7467e64f97fc0e407624809dd183c95ea16b1482
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33722
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding native implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Test Plan:
Build: CI
Functionality: Not exposed
Reviewed By: dreiss
Differential Revision: D20069796
Pulled By: AshkanAliabadi
fbshipit-source-id: d46c1c91d4bea91979ea5bd46971ced5417d309c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32812
We'll error out for the case we can't handle inside the function,
instead of checking each time at the call site.
Test Plan:
.
Imported from OSS
Differential Revision: D20087846
fbshipit-source-id: ae6d33a94adf29c4df86d67783e7ef8753c91f90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32891
- Add JitDistAutoGradTest into fork/spawn test launcher
- Add JitRpcTest into fork/spawn test launcher
ghstack-source-id: 98900090
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_spawn
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_spawn
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_fork_thrift
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:rpc_spawn_thrift
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_fork_thrift
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn
buck test mode/dev-nosan //caffe2/test/distributed/rpc/jit:dist_autograd_spawn_thrift
```
Differential Revision: D5785394
fbshipit-source-id: 335a85424d22f1a83874be81a8139499c9a68ce2
Summary:
This PR improves performance of EmbeddingBag on cuda by removing 5 kernel launches (2 of those are synchronizing memcopies).
- 2 memcopies check that the values of offsets[0] and offsets[-1] are in the expected range (0 for the former, less than the number of indices for the latter). It seems strange to check only those 2 values: if users are providing invalid offsets, invalid values can be anywhere in the array, not only in the first and last element. After this PR, the checks are skipped on CUDA; the first value is forced to 0, and if the last value is larger than expected, the CUDA kernel will assert. That is less nice than a ValueError, but then again, the kernel could have asserted if other offset values were invalid. On the CPU, the checks are moved from functional.py into the CPU implementation, and will throw RuntimeError instead of ValueError (see the sketch after this list).
- 3 or 4 initializations (depending on the mode) of the output tensors with .zeros() are unnecessary, because every element of those tensors is written to, so their data can be left uninitialized at the start.
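As a rough illustration of the first point (the sizes and values here are arbitrary), invalid offsets on CPU now surface as a RuntimeError from the CPU implementation rather than a ValueError from functional.py:
```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=4, mode="sum")
indices = torch.tensor([1, 2, 4, 5, 4, 3])
offsets = torch.tensor([0, 2, 100])  # the last offset exceeds the number of indices

try:
    bag(indices, offsets)
except RuntimeError as e:
    print("invalid offsets rejected:", e)
```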
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33589
Reviewed By: jianyuh
Differential Revision: D20078011
Pulled By: ngimel
fbshipit-source-id: 2fb2e2080313af64adc5cf1b9fc6ffbdc6efaf16
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33359
Updated alias analysis kind to FROM_SCHEMA so input tensors can be marked as nonmutable
when appropriate, allowing for constant folding of these tensors.
Needed to update the schemas of the _out variants with annotations to mark the output input
tensor as aliased and mutable.
Test Plan:
```
import torch
class M(torch.nn.Module):
    def __init__(self):
        super(M, self).__init__()

    def forward(self, x):
        w = torch.tensor([3], dtype=torch.float)
        w = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        y = torch.tensor([3], dtype=torch.float)
        y = torch.quantize_per_tensor(w, 1.0, 0, torch.qint8)
        return torch.ops.quantized.add_out(x, w, y)

m = torch.jit.script(M())
torch._C._jit_pass_constant_propagation(m.graph)
print(m.graph)
```
```
graph(%self : __torch__.___torch_mangle_9.M,
      %x.1 : Tensor):
  %11 : int = prim::Constant[value=12]() # <ipython-input-11-1dd94c30cb58>:9:49
  %9 : float = prim::Constant[value=1.]() # <ipython-input-11-1dd94c30cb58>:9:41
  %10 : int = prim::Constant[value=0]() # <ipython-input-11-1dd94c30cb58>:9:46
  %36 : QInt8(1) = prim::Constant[value={3}]()
  %y.2 : Tensor = aten::quantize_per_tensor(%36, %9, %10, %11) # <ipython-input-11-1dd94c30cb58>:11:12
  %24 : Tensor = quantized::add_out(%x.1, %36, %y.2) # <ipython-input-11-1dd94c30cb58>:12:15
  return (%24)
```
As expected, the aten::quantize_per_tensor() for w is now folded. The aten::quantize_per_tensor()
for y is not folded, since that tensor is aliased/modified.
Differential Revision: D19910667
fbshipit-source-id: 127071909573151dc664500d363399e3643441b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32809
This is a refactor to help further changes to quantization.cpp.
We want some operations on the graph to happen before we call insertObserver for invoked methods,
especially `addIntermediateValuesToSkipObserver`, since we want to skip the input of the ReLU
module in the `Conv - ReLU` pattern.
Test Plan:
test_jit.py
test_quantization.py
Imported from OSS
Differential Revision: D20087844
fbshipit-source-id: 28b7fa0c7ce9e254ab9208eb344893fb705e14d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33464
I added a python-exposed knob to register this pass in custom passes pipeline. If the knob is not used, the pass is not registered and thus not run at all.
Differential Revision: D19958217
Test Plan: Imported from OSS
Pulled By: ZolotukhinM
fbshipit-source-id: fecdd98567fcda069fbdf8995c796899a3dbfa5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33682
Previously, there were two APIs for CPU and CUDA. This change keeps one top-level API, i.e. `fake_quantize_per_tensor_affine` and `fake_quantize_per_channel_affine`, and uses the device type to dispatch to the different backends (CPU and CUDA).
CPU kernel implementation is in QuantizedOpKernels.cpp
CUDA kernel implementation is in fake_quantize_core.cu
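A small usage sketch of the unified entry point (values are illustrative); the same Python call dispatches to the CPU or CUDA kernel based on the tensor's device:
```python
import torch

x = torch.randn(2, 3)
# One top-level op; dispatch to the CPU or CUDA kernel is decided by x.device.
y = torch.fake_quantize_per_tensor_affine(x, scale=0.1, zero_point=0,
                                          quant_min=0, quant_max=255)
if torch.cuda.is_available():
    y_cuda = torch.fake_quantize_per_tensor_affine(x.cuda(), 0.1, 0, 0, 255)
```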
Test Plan:
python test/test_fake_quant.py
Benchmark Results for CPU
FakeQuantize tensor of size (2, 256, 128, 128)
Before:
per tensor quant ms 9.905877113342285
per channel quant ms 74.93825674057007
After:
per tensor quant ms 6.028120517730713
per channel quant ms 44.91588592529297
Imported from OSS
Differential Revision: D20072656
fbshipit-source-id: 0424f763775f88b93380a452e3d6dd0c90cb814b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32379
Folding Conv2d - BatchNorm2d modules means recalculating the weight and bias of the Conv2d module by incorporating the parameters
of BatchNorm2d, and also changing the method calls to call only the forward of the Conv2d module. This involves changes to both module
types and the graph, because the bias of Conv2d is a parameter when it has a value and an attribute when it is
None (since the JIT code assumes in multiple places that parameters are Tensors). Therefore
we'll need to remove the bias attribute when it is None and add a bias attribute later. Since the ClassType might be shared, we separate
remove and add into separate steps and also keep track of the processed graphs to avoid modifying the graph and type multiple times.
However, we'll have to record the slot index of bias as well, so we can replay the slot removal on other instances of the Conv2d module.
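For reference, the recalculation follows the standard Conv-BN folding identities; a rough sketch (the helper name, argument names, and eps default are illustrative assumptions, not code from this PR):
```python
import torch

def fold_conv_bn(conv_w, conv_b, bn_rm, bn_rv, bn_w, bn_b, eps=1e-5):
    # Scale each output channel of the conv weight by gamma / sqrt(running_var + eps).
    scale = bn_w / torch.sqrt(bn_rv + eps)
    folded_w = conv_w * scale.reshape(-1, 1, 1, 1)
    if conv_b is None:
        conv_b = torch.zeros_like(bn_rm)
    # Shift the bias so the folded conv reproduces conv followed by batch norm.
    folded_b = (conv_b - bn_rm) * scale + bn_b
    return folded_w, folded_b
```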
Test Plan:
tbd
Imported from OSS
Differential Revision: D20078719
fbshipit-source-id: cee5cf3764f3e0c0a4a2a167b78dbada2e3835cc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33695
I'm not sure how this stuck around, but it has no effect.
Test Plan: Imported from OSS
Differential Revision: D20068867
Pulled By: gchanan
fbshipit-source-id: 79191338a8bc7a195e2b7265005ca6f00aab3818
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33626
For DDP we require the attributes to be registered as buffers. By doing this the value is broadcast from one device to the rest.
Test Plan:
Tested on actual model on GPU
Imported from OSS
Differential Revision: D20038839
fbshipit-source-id: 82e829fc3baca0b3262c3894a283c375eb08a4a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33205
A number of important use-cases are implemented:
- def(schema): defines a schema, with no implementation (alias
inferred from schema, by default)
- def(schema, fn_ptr): registers fn_ptr as a catch-all kernel
for the operation
- def(schema, lambda): registers lambda as a catch-all kernel
for the operation
- def(schema, torch::dispatch(dispatch_key, fn)), and
def(schema, torch::dispatch(device_type, fn)): registers
the function to only be executed when dispatch_key/device_type
is selected for use
- def(schema, TORCH_OPTIMIZED_FN(fn)): registers the function
as unboxed only, using the inline syntax
All of our code generated registrations in ATen are switched to
the new API.
Some aspects of the API which are not fully implemented:
- It's still not valid to omit the schema when registering a function
pointer, due to #32549
- Although it's possible to take advantage of top-level namespaces
ala torch::import("aten"), we don't use it because this results
in worse code (as we have to cat everything back together). This
is not an essential problem, we just need the internals to be less
stupid.
There are some aspects of the API which don't semantically make sense,
but I chose not to fix them in this PR:
- For some reason, TORCH_OPTIMIZED_FN uses the *runtime* wrapper to
do wrapping, rather than the compile time one which inlines the
function in. This means that there isn't any reason we should be
passing in the function pointer as a template argument; a regular
old argument ought to have worked fine. This is seemingly
consistent with the current API though; needs further investigation.
- There's no reason to optional<DispatchKey>, DispatchKey would
work just fine (use DispatchKey::Undefined for the nullopt case)
In the long term, we should swap the wrapper around: the new-style
API has the real implementation, and the old-style API is backwards
compatibility. However, this implies a lot of internal refactoring,
so I decided to short circuit around it to get this in faster
Ancillary changes:
- I stopped moving optional<DispatchKey>, it's literally just two
words, pass it by value please.
- Needed to add a & qualified version of RegisterOps::op, since
I'm storing RegisterOps as a member inside the new style
Namespace and I cannot conveniently get a rvalue reference
to it in that situation. (BTW, register_ = std::move(register_)
really doesn't work, don't try it!)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19856626
Pulled By: ezyang
fbshipit-source-id: 104de24b33fdfdde9447c104853479b305cbca9a
Summary: Used by segmentation model.
Test Plan: Ran segmentation model on mobile.
Reviewed By: iseeyuan
Differential Revision: D19881378
fbshipit-source-id: 87f00058050fd173fbff1e88987ce09007622b83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32959
In the RPC TorchScript call path, we need to pickle/unpickle RRefs. This diff makes the JIT pickler/unpickler able to pickle/unpickle an RRef. It is similar to what is implemented for PyRRef::pickle() and PyRRef::unpickle().
The pickling/unpickling design assumes it is always coupled with RPC calls. It is not meant for checkpointing a model that holds an RRef; before checkpointing the model, the user should call rref.to_here() to get the value held inside the RRef.
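A hedged sketch of that recommendation (it assumes RPC has already been initialized and that a peer named "worker1" exists; the worker name and remote function are placeholders):
```python
import torch
import torch.distributed.rpc as rpc

# Hypothetical setup: "worker1" holds the value remotely.
rref = rpc.remote("worker1", torch.add, args=(torch.ones(2), 1))

# Materialize the value locally instead of trying to pickle the RRef itself.
local_value = rref.to_here()
torch.save({"value": local_value}, "checkpoint.pt")
```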
The pickling process is:
1. push the torch.distributed.rpc.rref global string
2. call rref.fork() and create an rrefForkData, which is a few IDs and the type str of the value held inside the rref; the IDs include the rref id, fork id, caller worker id, callee worker id, and owner worker id
3. push the rrefForkData
The unpickling process is:
1. read the torch.distributed.rpc.rref global string, and retrieve the cached global lambda function
2. the global lambda function will get the rrefForkData
3. if the callee is also the owner worker, then get the owner rref based on the IDs inside the rrefForkData and return the ownerRRef
4. if the callee is not the owner worker, then create a user rref using the rrefForkData and return the userRRef
5. meanwhile the owner rref will be notified and do reference counting correctly
During unpickling, a type_resolver is needed to parse the type str. This type_resolver has a Python dependency, so we get it from the rpc_agent and pass it to the unpickler during construction. So we added a type_resolver argument to the JIT unpickler constructor in this diff.
ghstack-source-id: 98814793
Test Plan: unit test
Differential Revision: D19713293
fbshipit-source-id: 4fd776cdd4ce8f457c4034d79acdfb4cd095c52e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33570
In this PR, we are a bit more careful about avoiding zero-ing the output. Analysis as follows:
1) `mm` doesn't need zero_ because it never calls scal, which is the underlying problem.
2) for `mv`, which does call scal (in certain cases), we can just move the zeroing to where it would actually be a problem, namely when the scalar value is 0.
In this case we just run the non-BLAS version of the code.
Test Plan: Imported from OSS
Differential Revision: D20007665
Pulled By: gchanan
fbshipit-source-id: 1f3a56954501aa9b2940d2f4b35095b2f60089a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31666
List of changes:
1) Fix a case where torch.mv was not handling NaNs correctly. In particular, with a transposed tensor and expanded vector, NaNs in the output are kept, even if beta = 0.
This is handled in the `out=` case by zero-ing out the passed-in Tensor, but this can happen just the same with the non-out variant if the allocated tensor happens to have a NaN.
Also adds tests for this case.
NOTE: we zero out the output tensor in all cases for mv and mm, even though this is probably overkill. I didn't find another case where this would be a problem, but the old code at least
attempted to do this for all mv and mm calls and I didn't add comprehensive testing to be sure that it's not a problem.
2) on CPU: move mv, mv_out, mm, mm_out to be direct wrappers on _th_addmv, _th_addmm, rather than having their own wrappers in Declarations.cwrap.
This is to remove the magic around cpu_zero from the codegen, which simplifies the codegen and makes testing this easier.
Test Plan: Imported from OSS
Differential Revision: D19239953
Pulled By: gchanan
fbshipit-source-id: 27d0748d215ad46d17a8684696d88f4cfd8a917e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33645
Fix bug where we were trying to get a schema for prim::Constant, which is not registered as an operator.
ghstack-source-id: 98785729
Test Plan: buck test mode/dev //pytext/models/test:scripted_seq2seq_generator_test -- 'test_generator \(pytext\.models\.test\.scripted_seq2seq_generator_test\.ScriptedSeq2SeqGeneratorTest\)'
Differential Revision: D20050833
fbshipit-source-id: cc38510b0135b750fdf57fb9c1e66ce1d91ee128
Summary:
The current logic for vectorized/unrolled operations in CUDALoops.cuh applies bounds checking to loads and stores, [but not to the actual functor's execution](16d6c17845/aten/src/ATen/native/cuda/CUDALoops.cuh (L264)). In other words, for a block acting on the tail of a tensor that doesn't require the whole block to participate in memory transactions, many threads execute their functor on uninitialized data. For functors that only communicate with the outside world via the bounds-checked loads and stores, that's ok. The threads acting on garbage data never actually write their results. But [my proposed inf/nan checking kernel](https://github.com/pytorch/pytorch/pull/33366/files#diff-9701a2b34900195d160bdc234e001b79R70-R79) has the additional side effect of writing to a `found_inf` flag in global memory. For irregularly-shaped tensors where tail threads execute the functor on garbage data, these threads would sometimes see and report spurious infs/nans.
In general, we can't guarantee functors won't have side effects. For safety (and efficiency) we should apply bounds checking to the functor execution as well as the loads and stores.
Is it possible that other elementwise kernels (in addition to the strided/vectorized implementation) are also executing functors unconditionally? That would cause similar failures.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33642
Differential Revision: D20062985
Pulled By: ngimel
fbshipit-source-id: 65b8d75a001ce57865ed1c0cf89105d33f3f4dd4
Summary:
In order to improve CPU performance on floating-point models on mobile, this PR introduces a new CPU backend for mobile that implements the most common mobile operators with NHWC memory layout support through integration with XNNPACK.
XNNPACK itself, and this codepath, are currently only included in the build, but the actual integration is gated with USE_XNNPACK preprocessor guards. This preprocessor symbol is intentionally not passed on to the compiler, so as to enable this rollout in multiple stages in follow up PRs. This changeset will build XNNPACK as part of the build if the identically named USE_XNNPACK CMAKE variable, defaulted to ON, is enabled, but will not actually expose or enable this code path in any other way.
Furthermore, it is worth pointing out that in order to efficiently map models to these operators, some front-end method of exposing this backend to the user is needed. The less efficient implementation would be to hook these operators into their corresponding **native** implementations, granted that a series of XNNPACK-specific conditions are met, much like how NNPACK is integrated with PyTorch today for instance.
Having said that, while the above implementation is still expected to outperform NNPACK based on the benchmarks I ran, the above integration would leave a considerable gap between the performance achieved and the maximum performance potential XNNPACK enables, as it does not provide a way to compute and factor out one-time operations out of the innermost forward() loop.
The more optimal solution, and one we will decide on soon, would involve either providing a JIT pass that maps nn operators onto these newly introduced operators, while allowing one-time calculations to be factored out, much like quantized mobile models. Alternatively, new eager-mode modules can also be introduced that would directly call into these implementations either through c10 or some other mechanism, also allowing for decoupling of op creation from op execution.
This PR does not include any of the front end changes mentioned above. Neither does it include the mobile threadpool unification present in the original https://github.com/pytorch/pytorch/issues/30644. Furthermore, this codepath seems to be faster than NNPACK in a good number of use cases, which can potentially allow us to remove NNPACK from aten to make the codebase a little simpler, granted that there is widespread support for such a move.
Regardless, these changes will be introduced gradually and in a more controlled way in subsequent PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32509
Reviewed By: dreiss
Differential Revision: D19521853
Pulled By: AshkanAliabadi
fbshipit-source-id: 99a1fab31d0ece64961df074003bb852c36acaaa
Summary:
Removes almost every usage of `.data` in test_torch to address part of https://github.com/pytorch/pytorch/issues/33629.
Lines 4706-4710 had to be refactored to allow this. The changed test is fundamentally the same, as it appears to be meant to confirm that using an input of a different type than the weight causes an appropriate error.
There is one remaining usage of `.data`, and it is on line 5132. This was left as the `set_` and `resize_` methods still mention `.data` explicitly. I figure the right time to remove this is when those methods have their runtime errors updated.
Note: ~~some tests are skipped locally, and so I am still verifying that nothing has been obviously broken.~~ Appears to be passing early tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33638
Differential Revision: D20062288
Pulled By: albanD
fbshipit-source-id: 672a6d7a20007baedb114a20bf1ddcf6c4c0a16a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33666
it's caused by a revert. So let's skip it.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D20057382
fbshipit-source-id: d71af8efe68b31befcef5dddc372540e8a8ae2ac
Summary:
The same header `<torch/nn/functional/conv.h>` is included twice.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33656
Differential Revision: D20056913
Pulled By: yf225
fbshipit-source-id: b1563035c9821731b99c26eec130ff0b9cc627a7
Summary:
Addresses https://github.com/pytorch/pytorch/issues/33300.
Calling .numpy() on a CUDA or non-strided (e.g. sparse) tensor segfaults in current PyTorch. This fixes the segfaults and throws the appropriate TypeError, as was intended.
Two tests, one in test_cuda.py and the other in test_sparse.py, are added to verify the behavior.
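A small sketch of the intended behaviour (it assumes a CUDA device is available):
```python
import torch

t = torch.ones(3, device="cuda")
try:
    t.numpy()          # previously could segfault; now raises TypeError
except TypeError as e:
    print(e)

arr = t.cpu().numpy()  # supported path: move to CPU first
```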
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33612
Differential Revision: D20038210
Pulled By: mruberry
fbshipit-source-id: 265531dacd37c392232fd3ec763489a62ef54795
Summary: Skip the test to unblock dper fbpkg push
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu -- 'DBSeekTest\.RocksDB' --run-disabled
Reviewed By: cheshen1
Differential Revision: D20043418
fbshipit-source-id: 05ceb2cea08722a671fa211d73680fd4b78f354c
Summary:
This adds enough infrastructure to run bailout checks in `checkScript`. I'll need to figure out the best way to enable it for nightly builds now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32802
Differential Revision: D19974718
Pulled By: Krovatkin
fbshipit-source-id: 40485503f6d3ae14edcce98e1eec1f0559f3ad08
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33632
* `inline_container.h` was unnecessarily exposing all includers to caffe2 headers via `caffe2/core/logging.h`
* Add msvc version of hiding unused warnings.
* Make sure clang on windows does not use msvc pragmas.
* Don't redefine math macro.
Test Plan: CI green
Differential Revision: D20017046
fbshipit-source-id: 230a9743eb88aee08d0a4833680ec2f01b7ab1e9
Summary: The first run of the net is noisy sometimes - just run it twice.
Reviewed By: cheshen1
Differential Revision: D20039274
fbshipit-source-id: 639e65646bf52f3efe1ecd4bbcd0e413d9389b29
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33529
The current version goes through a GPU -> CPU -> GPU copy and is pretty slow: ~19 ms
for 1M elements with 20 possible buckets, based on a benchmark.
The new version is ~0.2 ms on the same benchmark.
Test Plan: benchmark + unit-test
Reviewed By: chocjy
Differential Revision: D19969518
fbshipit-source-id: 51889bc9a232b6d45d9533e53b7b7f4531da481f
Summary:
The detection of the env variable ONNX_ML has been properly handled in tools/setup_helpers/cmake.py,
line 242.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33424
Differential Revision: D20043991
Pulled By: ezyang
fbshipit-source-id: 91d1d49a5a12f719e67d9507cc203c8a40992f03
Summary:
…have different argument types"
This reverts commit 05fb160048b71c1b8b00d2083a08618318158c1a.
Please go to https://github.com/pytorch/pytorch/pull/33558 and check the CUDA9 on CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33553
Differential Revision: D20017575
Pulled By: ngimel
fbshipit-source-id: a5fd78eea00c7b0925ab21fd90a7daeb66725f1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33097
Previously, we had to specify full types because the functions we were registering
might be overloaded, and the type was necessary to resolve the ambiguity. I
disambiguate all of these names by mangling the names of the methods we
place on CPUType/CUDAType/TypeDefault with the overload name (these are
*internal* wrappers which are not user visible), and then can strip
the generation of full function types from the registration.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837898
Pulled By: ezyang
fbshipit-source-id: 5f557184f6ec84cb0613d4eb2e33b83fd1712090
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33093
In #30187 the aliasAnalysis field on operator registration was updated
so that alias analysis could be specified in only some registration call
sites, rather than requiring it be consistently specified in all call
sites. With this change, we can eliminate the requirement that all
registrations specify aliasAnalysis; as long as we know *one* site
specifies the correct aliasAnalysis, we don't have to specify it
at any of the other sites.
In this patch, the "one site" is TypeDefault.cpp (previously we only
generated these stub declarations for manually registered functions,
but now we generate the stubs for everything). Then I delete aliasAnalysis
anywhere we register an op for an existing function (which is a lot
of places).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19837897
Pulled By: ezyang
fbshipit-source-id: 26a7fbc809ec1553da89ea5c0361f3e81526d4c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33216
All tensor expressions belong to a kernel arena and are freed when the
arena is destroyed. Until it is destroyed, all expressions stay valid.
Test Plan: Imported from OSS
Differential Revision: D19848382
Pulled By: ZolotukhinM
fbshipit-source-id: a581ea2b635b9ba2cc53949616a13d8d3a47caae
Summary:
This pull request has changes for:
1. Enabling a torch module with HIP code to be compiled by cpp_extensions.py
2. Fixes for hipify module to be able to be used by a torch extension
cc: ezyang iotamudelta jeffdaily
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32669
Differential Revision: D20033893
Pulled By: zou3519
fbshipit-source-id: fd6ddc8cdcd3930f41008636bb2bc9dd26cdb008
Summary:
this is a follow up PR to https://github.com/pytorch/pytorch/issues/33602:
torch/nn/utils/rnn.html:
`pack_padded_sequence` has a confusing and incomplete description of the `enforce_sorted` param. Currently it goes:
```
enforce_sorted (bool, optional): if ``True``, the input is expected to
contain sequences sorted by length in a decreasing order. If
``False``, this condition is not checked. Default: ``True``.
```
The second part "this condition is not checked" (1) makes no sense since the alluded to condition is not described and (2) it's incomplete as it doesn't reflect the important part, that it actually does the sorting. I think it should say something like:
```
enforce_sorted (bool, optional): if ``True``, the input is expected to
contain sequences sorted by length in a decreasing order. If
``False``, the input will get sorted unconditionally. Default: ``True``.
```
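Concretely (a small sketch with arbitrary shapes), `enforce_sorted=False` lets callers pass an unsorted batch and have it sorted internally:
```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

# Batch of 3 padded sequences with unsorted lengths 2, 5, 3.
padded = torch.randn(5, 3, 8)        # (max_seq_len, batch, features)
lengths = torch.tensor([2, 5, 3])

packed = pack_padded_sequence(padded, lengths, enforce_sorted=False)
# With enforce_sorted=True (the default), the same call raises because the
# lengths are not in decreasing order.
```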
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33617
Differential Revision: D20035131
Pulled By: albanD
fbshipit-source-id: 654382eb0cb62b5abc78497faa5b4bca42db5fda
Summary:
This adds `__torch_function__` support for all functions in `torch.functional` and `torch.nn.functional`.
The changes to C++ code and codegen scripts are to facilitate adding `__torch_function__` support for the native functions in `torch._C._nn`. Note that I moved the `handle_torch_function` C++ function to a header that both `python_torch_functions.cpp` and `python_nn_functions.cpp` include. The changes to `python_nn_functions.cpp` mirror the changes I made to `python_torch_functions.cpp` when `__torch_function__` support was first added in https://github.com/pytorch/pytorch/issues/27064. Due to the somewhat different way the `torch._C` and `torch._C._nn` namespaces are initialized I needed to create a new static reference to the `torch._C._nn` namespace (`THPNNVariableFunctions`). I'm not sure if that is the best way to do this. In principle I could import these namespaces in each kernel and avoid the global variable but that would have a runtime cost.
I added `__torch_function__` support to the Python functions in `torch.nn.functional` following the approach in https://github.com/pytorch/pytorch/issues/32194.
I re-enabled the test that checks if all functions in the `torch` namespace are explicitly tested for `__torch_function__` support. I also generalized the check to work for `torch.functional` and `torch.nn.functional` as well. This test was explicitly disabled in https://github.com/pytorch/pytorch/issues/30730 and I'm happy to disable it again if you think that's appropriate. I figured now was as good a time as any to try to re-enable it.
Finally I adjusted the existing torch API tests to suppress deprecation warnings and add keyword arguments used by some of the code in `torch.nn.functional` that were missed when I originally added the tests in https://github.com/pytorch/pytorch/issues/27064.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32799
Differential Revision: D19956809
Pulled By: ezyang
fbshipit-source-id: 40d34e0109cc4b9f3ef62f409d2d35a1d84e3d22
Summary:
This is generating a considerable number of warnings, because
the header file is included in multiple places.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33524
Differential Revision: D20006604
Pulled By: ezyang
fbshipit-source-id: 0885cd2a708679ba5eeabb172366eb4c5a3bbef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33572
This reverts commit 687a7e4a2566861c53c8fb53a80b198465168b38.
Original PR #33305
Reland with BC tests whitelisted. See https://github.com/pytorch/pytorch/issues/33580 for reasoning why this change is not actually BC breaking.
Test Plan: Imported from OSS
Differential Revision: D20011011
Pulled By: ezyang
fbshipit-source-id: 116374efc93af12b8ad738a0989d6f0daa9569e2
Summary:
IIUC Python does not guarantee when an object is garbage collected. So it is possible that some other test running before `TestCuda.test_memory_stats` creates an object which is only garbage collected during `TestCuda.test_memory_stats`, causing the memory stats to change and this test to fail. This kind of failure is very hard to debug (it took me and mcarilli and ptrblck quite a while to figure out what was happening), and it is the root cause of mcarilli's gradient scaling PR https://github.com/pytorch/pytorch/pull/26512 failing on Windows.
cc: csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33575
Differential Revision: D20009260
Pulled By: ngimel
fbshipit-source-id: 62f2716aefac3aa6c7d1898aa8a78e6b8aa3075a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33517
I don't think any mobile model uses SparseCPU backend yet so we can skip
generating dispatch code for this backend type.
This will help reduce mobile code size with dynamic dispatch turned on,
roughly ~100K for uncompressed iOS: D19616007 +413K v.s. D19616016 +319K.
It probably doesn't affect much static dispatch build size as the unused
static dispatch methods will be stripped by linker in the end.
ghstack-source-id: 98615810
Test Plan: - CI & BuildSizeBot
Reviewed By: linbinyu
Differential Revision: D19978633
fbshipit-source-id: 27bf6ada2ba98482084cf23724cf400b538b0a03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33557
We should add GIL asserts in some places to keep assumptions documented.
This just adds one in an exception codepath as a placeholder for more.
This change also moves a #define from a .h to the .cpp to reduce scope.
ghstack-source-id: 98673532
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D20005387
fbshipit-source-id: b7eff54a6f1dd69d199f8ca05cdb3001c50b37c4
Summary:
The `not inline_everything` check was causing the jitter check to be skipped whenever we emitted a function. Thanks to SplitInfinity for pointing this out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33468
Differential Revision: D19975934
Pulled By: eellison
fbshipit-source-id: 03faf8d2fd93f148100d8cf49cb67b8e15cf1f04
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32863, (together with https://github.com/pytorch/pytorch/issues/33310 for the `TensorIterator` reductions)
This adds 64-bit indexed kernels for `THC_reduceDimIndex` and uses `THCTensor_canUse32BitIndexMath` to switch between the two at runtime.
I have a test for this locally but haven't included it here because `max` is much slower than `argmax`. To the point where the test takes several minutes to call max on just one `2**32` element tensor. That seems excessive, even for a slow test but I can push it if preferred.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33405
Differential Revision: D20010769
Pulled By: ezyang
fbshipit-source-id: a8a86f662598d5fade4d90448436418422c699a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33574
Sprinkle the Clang identification macro over places that would otherwise cause build errors when Clang is used to drive the CUDA compilation.
Note: `__clang__` is defined when either Clang is used as host compiler by NVCC or when Clang drives the compilation. `__CUDA__` is defined only for the latter case.
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: BIT-silence
Differential Revision: D20007440
fbshipit-source-id: 53caa70695b99461a3910d41dc71a9f6d0728a75
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33555
A quick fix for the PyText model (in internal production) on the new bytecode format.
Test Plan: Imported from OSS
Differential Revision: D20008266
Pulled By: iseeyuan
fbshipit-source-id: 1916bd0bf41093898713c567c7f6fa546b9ea440
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33554
NVCC/GCC accept the existing syntax, but Clang requires a proper escape. Here `%laneid` is one of the many special registers that CUDA's inline PTX asm provides [1]. Using the extra `%` doesn't change the semantics, since PTX still sees the `%laneid` value after the asm tool processes the statement.
1. https://docs.nvidia.com/cuda/parallel-thread-execution/index.html
Test Plan:
```lang=bash
buck build mode/opt -c fbcode.cuda_use_clang=true //fblearner/flow/projects/dper:workflow
buck build mode/opt //fblearner/flow/projects/dper:workflow
```
Reviewed By: bddppq
Differential Revision: D20003621
fbshipit-source-id: 8e550e55a3455925e7bd92c6df3e504b5d38c2dc
Summary:
We need to run a peephole before constant propagation in the profiling pipeline, so we fold `prim::shape` for inputs with complete tensor types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33337
Differential Revision: D19905624
Pulled By: Krovatkin
fbshipit-source-id: 80fff067941556053847ddc7afe0fd1c7a89a3ba
Summary:
Changelog:
- Add a check to ensure that all inputs to `where` lie on the same device
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33432
Test Plan:
- Added test_where_invalid_device
Fixes https://github.com/pytorch/pytorch/issues/33422
Differential Revision: D19981115
Pulled By: VitalyFedyunin
fbshipit-source-id: 745896927edb53f61f3dd48ba9e1e6cd10d35434
Summary:
Adam and AdamW are missing parameter validation for `weight_decay`. Other optimizers already have this check.
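A minimal sketch of the kind of check the other optimizers already perform (the helper name and exact placement inside Adam/AdamW's `__init__` are illustrative, not the code added by this PR):
```python
# Sketch of the missing validation, mirroring what other optimizers do.
def validate_weight_decay(weight_decay):
    if not 0.0 <= weight_decay:
        raise ValueError("Invalid weight_decay value: {}".format(weight_decay))

validate_weight_decay(0.01)    # passes
# validate_weight_decay(-1.0)  # would raise ValueError
```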
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33126
Differential Revision: D19860366
Pulled By: vincentqb
fbshipit-source-id: 286d7dc90e2f4ccf6540638286d2fe17939648fc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32990
Right now a remote TorchScript call cannot target the calling worker itself; this diff adds that support in the same way it is supported for remote Python calls to self.
ghstack-source-id: 98599082
Test Plan: unit test
Differential Revision: D19731910
fbshipit-source-id: 6495db68c3eaa58812aa0c5c1e72e8b6057dc5c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33347
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19975410
Pulled By: ezyang
fbshipit-source-id: eb729870c2d279d7d9ca43c92e514fe38dedb06d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33305
The current TensorOptions code is written to exactly extract out
TensorOptions based on exact struct match, including default arguments.
That meant that tril_indices/triu_indices which had a different
default argument didn't match, and thus needed a special case.
I resolve this special case by instead replacing the explicit long
default argument with a None default argument, and then adjusting
the actual implementations to select the correct dtype when none
was specified. I think the general rule I'm following here is that
it is always acceptable to replace an explicit default argument
with a None argument (assuming the backend will compute it appropriately);
the documentation gets modestly worse, but everything that was
previously expressible continues to be expressible. Maybe later
we should switch the default argument back to long, but for now
the simplification in code is worth it.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19975411
Pulled By: ezyang
fbshipit-source-id: 996598759bed9e8d54fe61e19354ad038ed0e852
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33426
Make 2/4/8-bit fused rowwise conversion operators more general to work for N-dim tensors
Test Plan: CI
Reviewed By: ellie-wen
Differential Revision: D19943136
fbshipit-source-id: 47008544dd7e1d11a346d34f35449e0fcc0e7ee0
Summary:
We want to run the ONNX checker only when the selected operator export type is ONNX, and nowhere else. This PR updates the logic in the exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33522
Reviewed By: hl475
Differential Revision: D19983954
Pulled By: houseroad
fbshipit-source-id: 15db726321637a96fa110051cc54e9833e201133
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33523
When using `ThreadPool::setNumThreads` to set the number of threads, the value should not exceed the number of big cores. Otherwise, performance could degrade significantly.
Test Plan:
```
cd ~/fbsource/xplat
buck test caffe2:caffe2_testAndroid
```
Reviewed By: dreiss
Differential Revision: D19779267
fbshipit-source-id: 4e980e8a0ccc2f37e1c8ed16e2f4651d72924dbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33434
Reland of https://github.com/pytorch/pytorch/pull/33325, since the
unit test was flaky and failed on land.
To ensure that the test is not flaky, I bumped the timeout so the rendezvous
does not timeout (timing out the rendezvous in 1s led to the flakiness). I also
generalized our mechanism for retrying on errors to include retrying on errors
due to timeout in rendezvous.
ghstack-source-id: 98558377
Test Plan: Added UT test_tcp_store_timeout_set
Differential Revision: D19935390
fbshipit-source-id: 56ccf8c333dd2f954a33614d35cd1642d4e9473a
Summary:
Since the tensor iterator supports broadcasting, we can just remove the assertion on input shapes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30442
Differential Revision: D19976562
Pulled By: lly-zero-one
fbshipit-source-id: 91b27fc8b2570f29d110c6df26eacdd16f587b9f
Summary:
The quantizer uses std::vector to store per-channel scales and zero_points, but querying the scales (or zero_points) requires returning a tensor. This means a tensor has to be initialized from the std::vector on every query, which costs a lot of time. So this change makes the quantizer store per-channel scales and zero_points as tensors directly.
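For context, a small sketch of the user-facing path that exercises those per-channel scales (the values are arbitrary):
```python
import torch

x = torch.randn(2, 3)
scales = torch.tensor([0.1, 0.2, 0.3])
zero_points = torch.tensor([0, 0, 0])
qx = torch.quantize_per_channel(x, scales, zero_points, axis=1, dtype=torch.qint8)

# With this change, these queries return the stored tensors directly instead of
# building new tensors from a std::vector on every call.
print(qx.q_per_channel_scales())
print(qx.q_per_channel_zero_points())
```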
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31040
Differential Revision: D19701070
Pulled By: jerryzh168
fbshipit-source-id: 9043f16c44b74dd8289b8474e540171765a7f92a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33365
This adds functionality for retrying RPCs that are sent with the function sendWithRetries(). It adds RPCs that will potentially need to be retried to a sorted map that contains the timeout at which to retry the RPC and the associated metadata. A separate thread iteratively removes the earliest retryable RPC from the map, sleeps until the corresponding time point, retries the RPC, and adds it to the map again with a future timeout.
GitHub Issue: https://github.com/pytorch/pytorch/issues/32124
Per the first 4 milestones, the following will be addressed in future PR's:
* enabling RPC Retries for RRef internal messages
Differential Revision: D19915694
fbshipit-source-id: 4a520e32d5084ebcf90e97fd9f26867115a35c0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33508
Ever since we switched to not inlining by default, some users have
complained because they relied on inlining occurring, e.g. to process the
graph with some other tool. Add an `inlined_graph` property for convenience in
those cases.
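A small usage sketch (the module is illustrative):
```python
import torch

class M(torch.nn.Module):
    def helper(self, x):
        return x + 1

    def forward(self, x):
        return self.helper(x) * 2

m = torch.jit.script(M())
print(m.graph)          # without inlining, the call to `helper` appears as a prim::CallMethod
print(m.inlined_graph)  # inlined view of the graph, for tools that expect inlined IR
```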
Test Plan: Imported from OSS
Differential Revision: D19977638
Pulled By: suo
fbshipit-source-id: fe1fa92ff888959203d5d1995930d488b5f9e24c
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/297
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33250
As the title says. FBGEMM has recently added support for Windows.
ghstack-source-id: 97932881
Test Plan: CI
Reviewed By: jspark1105
Differential Revision: D19738268
fbshipit-source-id: e7f3c91f033018f6355edeaf6003bd2803119df4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33515
Previously, if we had a `ModuleDict` with the same value types but
different names for keys, they would share types under certain
conditions. This only happens for `ModuleDict`, because in other cases
a simple Python class check invalidates the class.
Test Plan: Imported from OSS
Differential Revision: D19978552
Pulled By: suo
fbshipit-source-id: f31b2af490064f89b70aa35f83ba740ddaf2a77a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32839
As mentioned in the updated comment in `variable.h`, this disambiguates code like:
```python
base = torch.rand(10, requires_grad=True)
with torch.no_grad():
view = base[1]
view.copy_(var)
torch.autograd.grad(base.sum(), var) # <- what should it return?
```
Given that there is no consensus on what should happen here (does the gradient flow through the view created in the no_grad block or not?), this special case is detected and forbidden.
As mentioned in the error message:
- If you want it to be tracked, move both out of the no_grad.
- If you do not want them to be tracked, move both inside the no_grad.
This implies that any custom Function that returns views does not allow inplace modification on its output. I'll add a PR to the stack to relax this to a DeprecationWarning for now, and we will make it an actual error in 1.6.
This replaces https://github.com/pytorch/pytorch/pull/26607
cc sublee
Test Plan: Imported from OSS
Differential Revision: D19814114
Pulled By: albanD
fbshipit-source-id: ff2c9d97c8f876d9c31773a2170e37b06d88bed7
Summary:
This fixes https://github.com/pytorch/pytorch/issues/33001.
When subtracting 1 from an empty array, instead of being `-1` as the later code (the while loop) seems to expect, the value becomes a very large number because `size()` is unsigned. This causes a segfault later in the while loop, where the code tries to access an empty array.
This issue seemed to happen only on the Pi with the following example code: `v = torch.FloatTensor(1, 135).fill_(0); v[0, [1]] += 2`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33456
Differential Revision: D19963711
Pulled By: ezyang
fbshipit-source-id: 1dbddd59a5df544cd7e025fc540c9efe2c4e19f4
Summary:
This was old code that isn't tested and is broken; it should have been
deleted in #24874.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33453
Pulled By: driazati
Differential Revision: D19961403
fbshipit-source-id: 94c52360460194d279dad5b0ea756ee366f525e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32880
The PR below made it impossible to construct a SourceRange without a
context, so get rid of its optional-ness
Test Plan: Imported from OSS
Differential Revision: D19670923
Pulled By: suo
fbshipit-source-id: 05936fca2a3d5e613313ade9287b2210bc4a3ccd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32879
An error report without a SourceRange context is bad, because it doesn't
tell the user where something happened. Delete the default constructor
to make it harder to create errors like this (you can still use a fake
SourceRange if you absolutely need to).
Also clean up the only case where the default constructor was used.
Test Plan: Imported from OSS
Differential Revision: D19670924
Pulled By: suo
fbshipit-source-id: 46888a86e5d32b84c8d6d52c0c8d70243722b14a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33440
The constructors were making a copy because `std::move` was missing in the initializer list.
Test Plan:
Confirmed manually that without this change, the `data()` pointer of
the vector changes. With this change it does not, as intended.
Reviewed By: mrshenli
Differential Revision: D19948685
fbshipit-source-id: ee4f22e29894b858ad86068722dc2f4651987517
Summary:
There are large models such as GPT2-large which cannot be exported with the current exporter because of the 2GB protobuf limit (e.g. see https://github.com/pytorch/pytorch/issues/19277). The ONNX spec specifies a special format for large (> 2GB) models. This PR adds support for exporting large models in the ONNX large-model format in the PyTorch-ONNX exporter.
This is the first PR for this feature that enables the end-to-end execution. Tests for large model export have been added. We may need follow-up PRs to refine this workflow based on user feedback.
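A usage sketch, assuming the exporter flag this PR adds is the `use_external_data_format` argument to `torch.onnx.export` (the model and output path are placeholders; in practice this matters for >2GB models):
```python
import torch

model = torch.nn.Linear(8, 8)      # placeholder for a large model such as GPT2-large
dummy_input = torch.randn(1, 8)

# Assumed flag name for the ONNX large-model (external data) format.
torch.onnx.export(model, dummy_input, "model.onnx", use_external_data_format=True)
```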
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33062
Reviewed By: hl475
Differential Revision: D19782292
Pulled By: houseroad
fbshipit-source-id: e972fcb066065cae6336aa91c03023d9c41c88bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32885
Currently a Tensor bias is registered as a parameter and a None bias is registered as an attribute.
We need the type annotation because when we try to fold ConvBn in graph mode quantization we'll
remove the None bias attribute and add a Tensor bias attribute. Without the type annotation, the
bias Value in the graph would be marked with a different type in these two cases, so we would have to rewrite the
graph to change the type as well. With the type annotation we don't need to modify the graph,
since in both cases the bias value will have type `Tensor?`.
Test Plan:
.
Imported from OSS
Differential Revision: D19844710
fbshipit-source-id: 52438bc72e481ab78560533467f9379a8b0b0cfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33265
This removes the need for isinstance to keep trace of list and tuple
separately by introducing AnyListType and AnyTupleType into the JIT
type system to be the common supertype of any lists or tuples.
This allows us to remove the weird flags from the interpreter for
the isinstance operator.
Test Plan: Imported from OSS
Differential Revision: D19883933
Pulled By: zdevito
fbshipit-source-id: f998041b42d8b4554c5b99f4d95d1d42553c4d81
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32889
Common primitive ops that have special inputs make it very hard to
serialize the bytecode for mobile because information about how the
op behaves is hidden in the Node*. This changes how we handle the following
ops so that they are encoded as their own interpreter bytecodes.
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
This leaves a state where the _only_ remaining Node*-consuming builtins
are things that are only introduced during JIT optimization and will
not appear in mobile code.
Serialization of bytecode can now be made to directly write the CodeImpl
object without modification.
Test Plan: Imported from OSS
Differential Revision: D19673157
Pulled By: zdevito
fbshipit-source-id: 7b8c633d38a4c783b250fbdb222705e71a83ad26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32804
Constants are interpreter primitives so the op was not actually used.
This cleans up some of the logic around it.
This also fixes constant prop such that failures to look up an op
do not silently stop constant propagation. Instead, only errors
inside the op implementation itself will do this.
Test Plan: Imported from OSS
Differential Revision: D19673156
Pulled By: zdevito
fbshipit-source-id: 7beee59a6a67a6c2f8261d86bd505280fefa999e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32791
When a registered operator has varags (ends with ... in its schema),
the interpreter now appends the number of arguments to the top of
the stack before invoking the operator. This allows the removal of more
uses of Node* in the interpreter.
This PR also then cleans up the constructors for Operator to make
it more likely someone chooses the correct one. After making these ops:
```
USES NODE: prim::TupleUnpack(...) -> (...)
USES NODE: prim::TupleSlice(...) -> (...)
USES NODE: prim::TupleConstruct(...) -> (...)
USES NODE: prim::ListUnpack(...) -> (...)
USES NODE: prim::ListConstruct(...) -> (...)
USES NODE: prim::DictConstruct(...) -> (...)
USES NODE: prim::Constant() -> (...)
USES NODE: prim::isinstance(...) -> (...)
USES NODE: prim::CreateObject(...) -> (...)
USES NODE: prim::fork(...) -> (...)
USES NODE: aten::warn(str message, *, int stacklevel=2) -> () # need stack level information, so ideally in interpreter so it can look at the stack
```
Into interpreter primitives, we can remove all but two constructors for operators:
one that is (schema_string, operation), and one that is (symbol, op_creator) for
the remaining weird primitives.
Test Plan: Imported from OSS
Differential Revision: D19673158
Pulled By: zdevito
fbshipit-source-id: 95442a001538a6f53c1db4a210f8557ef118de66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33368
reorganizing files that describe sources to ensure the same list is used for both fbcode and ovrsource targets. (BUCK vs TARGETS)
Test Plan: CI green
Reviewed By: malfet
Differential Revision: D19803036
fbshipit-source-id: 69c1fa10877c3f0c0e9c1517784949c3c9939710
Summary:
Closes https://github.com/pytorch/pytorch/issues/30027
The idea here is that you can bind a function with `pybind11` in a single line and without modifying the function:
```cpp
m.def("foo", foo, py::call_guard<torch::PyWarningHandler>());
```
Where warnings are handled by the [`call_guard`](https://pybind11.readthedocs.io/en/stable/advanced/functions.html#call-guard) and exceptions are handled by the `pybind11` exception translator. To do this, I have added support for handling C++ exceptions in `torch::PyWarningHandler`'s destructor without setting the Python error state beforehand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30588
Differential Revision: D19905626
Pulled By: albanD
fbshipit-source-id: 90c0a5e298b123cc0c8ab9c52c91be4e96ea47c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33358
We just translate this code to ATen.
Test Plan: Imported from OSS
Differential Revision: D19911114
Pulled By: gchanan
fbshipit-source-id: 2279e63bb7006f7253620417937e3ce9301e0cdb
Summary:
## problem
```python
class LambdaLR(_LRScheduler):
"""Sets the learning rate of each parameter group to the initial lr
times a given function. When last_epoch=-1, sets initial lr as lr.
Args:
optimizer (Optimizer): Wrapped optimizer.
lr_lambda (function or list): A function which computes a multiplicative
factor given an integer parameter epoch, or a list of such
functions, one for each group in optimizer.param_groups.
last_epoch (int): The index of last epoch. Default: -1.
Example:
>>> # Assuming optimizer has two groups.
>>> lambda1 = lambda epoch: epoch // 30
>>> lambda2 = lambda epoch: 0.95 ** epoch
>>> scheduler = LambdaLR(optimizer, lr_lambda=[lambda1, lambda2])
>>> for epoch in range(100):
>>> train(...)
>>> validate(...)
>>> scheduler.step()
"""
```
`LambdaLR` takes a lambda that takes an int and returns a float, or a list of such lambdas.
## related issue
Resolve https://github.com/pytorch/pytorch/issues/32645
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33271
Differential Revision: D19878665
Pulled By: vincentqb
fbshipit-source-id: 50b16caea13de5a3cbd187e688369f33500499d0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33346
Fixes #33091
This PR lets users control the number of workers that cpp extensions
uses through the environment variable `MAX_JOBS`. If the environment
variable is a non-negative integer we use that many threads; otherwise,
ninja falls back to the default.
I chose to use the name `MAX_JOBS` because we use it in PyTorch already
to control the number of workers PyTorch builds with. There is a risk
that users of cpp extensions already have `MAX_JOBS` set but we are
hoping that that risk is small and/or it means semantically the same
thing.
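A minimal sketch of how a cpp-extension build would pick this up (the extension name and source file are placeholders):
```python
import os
os.environ["MAX_JOBS"] = "4"  # limit ninja to 4 parallel compile jobs (any non-negative integer)

from torch.utils.cpp_extension import load

# Hypothetical JIT-compiled extension; "my_ext.cpp" is a placeholder source file.
ext = load(name="my_ext", sources=["my_ext.cpp"], verbose=True)
```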
Test Plan: - tested locally
Differential Revision: D19911645
Pulled By: zou3519
fbshipit-source-id: d20ed42de4f845499ed38f1a1c73e9ccb620f780
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33008
Corrects D19373507 to allow valid use cases that fail now. Multiplies batch size by the number of elements in a group to get the correct number of elements over which statistics are computed.
**Details**:
The current implementation disallows applying GroupNorm to tensors of shape e.g. `(1, C, 1, 1)` to prevent cases where statistics are computed over 1 element and thus result in a tensor filled with zeros.
However, in GroupNorm the statistics are calculated across channels. So in the case where one has an input tensor of shape `(1, 256, 1, 1)` for `GroupNorm(32, 256)`, the statistics are computed over 8 elements and are thus meaningful.
One use case is [Atrous Spatial Pyramid Pooling (ASPPPooling)](791c172a33/torchvision/models/segmentation/deeplabv3.py (L50)), where GroupNorm could be used in place of BatchNorm [here](791c172a33/torchvision/models/segmentation/deeplabv3.py (L55)). However, this is currently prohibited and results in failures.
The proposed solution corrects the computation of the number of elements over which statistics are computed: the number of elements per group is taken into account in the batch size.
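A concrete case this fix re-enables, with the shapes from the description above:
```python
import torch

gn = torch.nn.GroupNorm(32, 256)
x = torch.randn(1, 256, 1, 1)

# Statistics are computed per group over 256 / 32 = 8 elements, so they are
# meaningful and this input should no longer be rejected.
out = gn(x)
print(out.shape)  # torch.Size([1, 256, 1, 1])
```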
Test Plan: check that existing tests pass
Reviewed By: fmassa
Differential Revision: D19723407
fbshipit-source-id: c85c244c832e6592e9aedb279d0acc867eef8f0c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33395
By default the GPU fuser stays enabled, but this function allows
manually disabling it. It will be useful when working on other
fuser implementations.
Test Plan: Imported from OSS
Differential Revision: D19926911
Pulled By: ZolotukhinM
fbshipit-source-id: 7ea9d1dd7821453d640f81c487b63e1d585123c4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33027
This PR allows default arguments in module's forward method to be skipped when module is used in `torch::nn::Sequential`, by introducing the `FORWARD_HAS_DEFAULT_ARGS` macro and requiring that all modules that have default arguments in its forward method must have a corresponding `FORWARD_HAS_DEFAULT_ARGS` macro call.
Fixes issue mentioned in https://github.com/pytorch/pytorch/issues/30931#issuecomment-564144468.
Test Plan: Imported from OSS
Differential Revision: D19777815
Pulled By: yf225
fbshipit-source-id: 73282fcf63377530063e0092a9d84b6c139d2e32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33026
This PR contains necessary changes to prepare for https://github.com/pytorch/pytorch/pull/33027. It exposes the following classes to public:
1. `torch::nn::AnyValue`, because if the user has optional arguments in their module's forward method, they must also use the `FORWARD_HAS_DEFAULT_ARGS` macro and pass in the default values for those optional arguments wrapped by `torch::nn::AnyValue`.
2. `torch::nn::AnyModuleHolder`, because `torch::nn::Module` needs to declare it as a friend class for it to be able to access `torch::nn::Module`'s protected methods such as `_forward_has_default_args` / `_forward_num_required_args` / `_forward_populate_default_args`.
Test Plan: Imported from OSS
Differential Revision: D19777814
Pulled By: yf225
fbshipit-source-id: 1c9d5aa24f0689154752c426a83ee98f64c9d02f
Summary:
Although `gpu_kernel_with_index` might look like a quite general helper function at first glance, it actually isn't.
The problem is not only 32-bit indexing, but something more fundamental: `TensorIterator` reorders dims and shapes, so if you have a non-contiguous tensor such as `torch.empty(5, 5).t()`, the index won't be correct. Since the whole point of `TensorIterator` is to manipulate shapes/strides to speed up loops, it is fundamentally impossible to get the correct linear index without a lot of effort.
The only reason the range factories are not currently failing on `out=non_contiguous_tensor` is that `has_internal_overlap` happens to be naive enough to classify everything non-contiguous as `TOO_HARD`.
Since `gpu_kernel_with_index` is not general, we should move it from `Loops.cuh` to `RangeFactories.cu`. And since the kernel is so simple to implement, it makes no sense to use `TensorIterator`, which goes through tons of unnecessary checks like `compute_dtypes`.
`torch.range` is not tested for 64-bit indexing, and I will file a new PR to remove it (it was supposed to be removed in 0.5).
Benchmark:
The device is GTX-1650, I don't have a good GPU at home.
Code:
```python
import torch
print(torch.__version__)
for i in range(100):
torch.randn(1000, device='cuda')
torch.cuda.synchronize()
for i in range(15, 29):
%timeit torch.arange(2 ** i, device='cuda'); torch.cuda.synchronize()
```
Before:
```
1.5.0a0+c37a9b8
11.9 µs ± 412 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.7 µs ± 309 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
19.6 µs ± 209 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
28.9 µs ± 923 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
48.4 µs ± 1.64 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
85.7 µs ± 1.46 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
162 µs ± 1.09 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
312 µs ± 9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
618 µs ± 15.7 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.22 ms ± 9.91 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.45 ms ± 97.1 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.9 ms ± 155 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.1 ms ± 378 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
After:
```
1.5.0a0+7960d19
11 µs ± 29.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
12.4 µs ± 550 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
18.4 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
27.6 µs ± 10.9 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
46.2 µs ± 18.6 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
83.3 µs ± 5.61 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
158 µs ± 373 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
307 µs ± 1.44 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
603 µs ± 112 ns per loop (mean ± std. dev. of 7 runs, 1000 loops each)
1.2 ms ± 1.05 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
2.4 ms ± 23.5 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
4.77 ms ± 25.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.51 ms ± 933 ns per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33370
Differential Revision: D19925990
Pulled By: ngimel
fbshipit-source-id: f4a732fe14a5582b35a56618941120d62e82fdce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33147
The log mentioned that it is aborting communicators even if
`blockingWait_` was false. This was incorrect, and I updated the logging to
reflect the appropriate behavior.
ghstack-source-id: 98025017
Test Plan: waitforbuildbot
Differential Revision: D19817967
fbshipit-source-id: fb3415af2cc99eb20981ceaa5203c0a1880fd6f3
Summary:
Add quant_scheme_generator that will be used to interface with dper.
Also updated two related functions:
- Add batch_size option to save_local_dataset() in dataset utils to be more flexible.
Test Plan:
Tested in the stacked diff D19747206.
buck test deeplearning/numeric_suite/toolkit/test:int8_static_utils_test
Reviewed By: csummersea
Differential Revision: D19745159
fbshipit-source-id: a4ac1ef0ffdddc68bdf5e209ae801b8c475d0b96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32974
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/286
Re-attempt of D18805426. Decided to be consistent with PyTorch Adagrad.
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad. This diff makes them consistent by doing w += lr * grad / (sqrt(moment) + epsilon) in Adagrad and w += lr / (sqrt(moment) + epsilon) * grad in RowWiseSparseAdagrad.
The Adagrad order is consistent with PyTorch (see the addcmul_cpu_kernel function in aten/src/ATen/native/cpu/PointwiseOpsKernel.cpp). The RowWiseSparseAdagrad order makes the computation more efficient: lr / (sqrt(moment) + epsilon) is shared among all elements in the row.
Also, we're not going to use FMA, to be consistent with PyTorch (even though it provides a small accuracy benefit).
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D19342865
fbshipit-source-id: e950c16f2e1c4a2f2a3ef53b1705db373c67f341
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33325
Closes https://github.com/pytorch/pytorch/issues/32924. There was a bug where for TCPStore, we would not respect the timeout passed into `init_process_group` while constructing the TCPStore. Instead, we'd set the timeout after the rendezvous created the store, meaning that we used the default timeout of 300s while connecting to the server. This diff passes the timeout passed into `init_process_group` to rendezvous so that it can be passed into the constructor for TCPStore, so that we can use the right timeout at construction time.
Question: Should we make this change for FileStore as well? Currently the FileStore constructor does not take in a timeout at all.
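A sketch of the affected path (single-process, gloo backend; the rendezvous address and port are placeholders):
```python
import os
from datetime import timedelta
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "127.0.0.1"   # placeholder rendezvous address
os.environ["MASTER_PORT"] = "29500"       # placeholder port

# After this fix, the timeout below is also honored while the TCPStore connects
# during rendezvous, instead of only after the store has been constructed.
dist.init_process_group("gloo", rank=0, world_size=1, timeout=timedelta(seconds=30))
dist.destroy_process_group()
```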
ghstack-source-id: 98401875
Test Plan: Added a UT
Differential Revision: D19871946
fbshipit-source-id: dd002180c4c883216645b8a97cc472c6116ac117
Summary: In dper2, the local net is hard-coded by whitelisting some layers. Add the SparseFeatureGating-related layers to the local net explicitly.
Test Plan:
* workflow: f167812211
* QRT: fall back looks normal
{F228442018}
Differential Revision: D19852280
fbshipit-source-id: 6fecc3d745c3f742d029575a7b9fe320618f1863
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33387
CI is broken. Skip two functions to fix the problem.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19926249
fbshipit-source-id: a46d1465c59de8616d2af5fb0b9cc18532359f88
Summary:
Fixes the `TensorIterator` parts of https://github.com/pytorch/pytorch/issues/32863 (THC is still broken)
`TensorIterator::split` now keeps track of the `view_offsets` into the full tensor range. With this, I can take the base offset for the reduced dimension and translate partial results from the sub-iter into the index range of the full tensor. This happens only once for each intermediate result, so we should still benefit from the performance of 32-bit indexing in loops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33310
Differential Revision: D19906136
Pulled By: ngimel
fbshipit-source-id: 3372ee4b8d5b115a53be79aeafc52e80ff9c490b
Summary:
Globally define
```C++
constexpr int num_threads = C10_WARP_SIZE * 2;
constexpr int thread_work_size = 4;
constexpr int block_work_size = thread_work_size * num_threads;
```
and kill all the template arguments passing these values.
These are effectively global, but we are currently passing them around as template arguments, which causes a lot of inconvenience in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33308
Differential Revision: D19907250
Pulled By: ngimel
fbshipit-source-id: 4623b69baea7e6e77f460ffdfa07cf9f8cba588a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32962
As per gchanan's comments on
https://github.com/pytorch/pytorch/pull/30445, I've used
`torch.set_default_dtype` in test_data_parallel instead of specifying
dtype=torch.double everywhere. Also, renamed dtype2prec to dtype2prec_DONTUSE
ghstack-source-id: 98388429
Test Plan: waitforbuildbot
Differential Revision: D19714374
fbshipit-source-id: eb55bbca33881625636ba9ea6dd4cb692f25668e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33261
It was requested in #33114.
Test Plan: Imported from OSS
Differential Revision: D19910600
Pulled By: ZolotukhinM
fbshipit-source-id: 827f1744b97f386065a21d1ba5d82c1f90edbe46
Summary:
docker cp was erroring out, so let's just use volume mounts instead, which
should hopefully be more consistent.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33355
Differential Revision: D19913948
Pulled By: seemethere
fbshipit-source-id: 059ddd36a8162f946cfea451b5dcd1706f1209e9
Summary:
Basically just fills out PYTORCH_BUILD_VERSION with the correct version
based on the git tag.
This makes it so that we don't have to continually edit this file
when doing releases.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33326
Differential Revision: D19911035
Pulled By: seemethere
fbshipit-source-id: e27105f3e193a49dd68452d8f60232f8a132acad
Summary:
This PR renames `at::Tensor::base()` to `at::Tensor::_base()`, to achieve parity with Python `torch.Tensor._base` API.
----
This PR is BC-breaking in the following way:
Previously, to get the tensor that this tensor is a view of, the user would call `tensor.base()` in C++. Now, they must call `tensor._base()`.
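For reference, a quick sketch of the Python-side behavior this renaming matches:
```python
import torch

base = torch.randn(4, 4)
view = base.view(16)
print(view._base is base)  # True: _base is the tensor this view was created from
print(base._base is None)  # True: a non-view tensor has no base
```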
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33316
Differential Revision: D19905687
Pulled By: yf225
fbshipit-source-id: 949d97b707b2c82becb99ac89e9ac24359d183e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33318
### Summary
Recently, there was a [discussion](https://discuss.pytorch.org/t/libtorch-on-watchos/69073/14) in the forum about watchOS. This PR adds support for building watchOS libraries.
### Test Plan
- `BUILD_PYTORCH_MOBILE=1 IOS_PLATFORM=WATCHOS ./scripts/build_ios.sh`
Test Plan: Imported from OSS
Differential Revision: D19896534
Pulled By: xta0
fbshipit-source-id: 7b9286475e895d9fefd998246e7090ac92c4c9b6
Summary:
For both the Caffe2 and PyTorch backends, enable 3D convolutions through MIOpen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33067
Reviewed By: BIT-silence
Differential Revision: D19880495
Pulled By: bddppq
fbshipit-source-id: 8f6f970910654c1c5aa871b48a04c1054875691c
Summary:
Exporting Split with a dynamic list of split_sizes is not supported.
This PR enables export using onnx SplitToSequence + SequenceAt
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33161
Reviewed By: hl475
Differential Revision: D19860152
Pulled By: houseroad
fbshipit-source-id: 300afedc22b01923efb23acd1a3627aa146bb251
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32602
This adds functionality for retrying RPCs that are sent with the function `sendWithRetries()`. It adds RPCs that will potentially need to be retried to a sorted map that contains the timeout at which to retry the RPC and the associated metadata. A separate thread iteratively removes the earliest retryable RPC from the map, sleeps until the corresponding time point, retries the RPC, and adds it to the map again with a future timeout.
GitHub Issue: https://github.com/pytorch/pytorch/issues/32124
Per the first 3 milestones, the following will be addressed in future PR's:
* enabling RPC Retries for RRef internal messages
Differential Revision: D19560159
fbshipit-source-id: 40cd86f9a25dc24367624d279a3b9720b20824cf
Summary:
Addressing issue https://github.com/pytorch/pytorch/issues/18125
This implements a mixture distribution where all components are from the same distribution family. Right now the implementation supports the `mean`, `variance`, `sample`, and `log_prob` methods.
cc: fritzo and neerajprad
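A short usage sketch of the new distribution, as a Gaussian mixture (numbers are arbitrary):
```python
import torch
from torch.distributions import Categorical, MixtureSameFamily, Normal

mix = Categorical(torch.ones(5))                     # uniform weights over 5 components
comp = Normal(torch.randn(5), torch.rand(5) + 0.1)   # 5 univariate Normal components
gmm = MixtureSameFamily(mix, comp)

samples = gmm.sample((3,))
print(samples.shape)          # torch.Size([3])
print(gmm.log_prob(samples))  # per-sample mixture log-density
print(gmm.mean, gmm.variance)
```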
- [x] add import and `__all__` string in `torch/distributions/__init__.py`
- [x] register docs in docs/source/distributions.rst
### Tests
(all tests live in tests/distributions.py)
- [x] add an `Example(MixtureSameFamily, [...])` to the `EXAMPLES` list,
populating `[...]` with three examples:
one with `Normal`, one with `Categorical`, and one with `MultivariateNormal`
(to exercise, `FloatTensor`, `LongTensor`, and nontrivial `event_dim`)
- [x] add a `test_mixture_same_family_shape()` to `TestDistributions`. It would be good to test this with both `Normal` and `MultivariateNormal`
- [x] add a `test_mixture_same_family_log_prob()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_sample()` to `TestDistributions`.
- [x] add a `test_mixture_same_family_shape()` to `TestDistributionShapes`
### Triaged for follup-up PR?
- support batch shape
- implement `.expand()`
- implement `kl_divergence()` in torch/distributions/kl.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/22742
Differential Revision: D19899726
Pulled By: ezyang
fbshipit-source-id: 9c816e83a2ef104fe3ea3117c95680b51c7a2fa4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33156
When dist_autograd_spawn_thrift's 'test_backward_node_failure_python_udf' test is
run, it was encountering a TSAN error related to holding the mutex while the
underlying data structure was being deallocated.
In this change, we simply take a shared_ptr<> reference to the future and call
set_exception() without holding the lock, to avoid deallocation underneath
the lock.
ghstack-source-id: 98303434
Test Plan: buck test mode/opt-tsan //caffe2/test/distributed/rpc:dist_autograd_spawn_thrift -- 'test_backward_node_failure_python_udf \(test_dist_autograd_spawn\.DistAutogradTestWithSpawn\)'
Differential Revision: D19821362
fbshipit-source-id: 82f735e33f8e608552418ae71592400fa3621e40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33332
We check the input shape of lengths and indices of SLS and add an attribute if they are the same.
Test Plan:
```
buck test glow/fb/test/numerics:test_operator_onnxifinnpi -- test_slws_fused_8bit_rowwise_length1_graph
```
Reviewed By: ipiszy
Differential Revision: D19874903
fbshipit-source-id: 06b643b5351d0ba19ba209b5a5b599fbb38b1dfc
Summary:
Container `Module`s, including `ModuleList`, `ParameterList` and `ParameterDict`, should not be called like a regular `Module`.
This PR adds error messages for these special modules.
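A short sketch of the usage pattern the new error messages guard against (correct usage iterates the container):
```python
import torch
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
x = torch.randn(2, 4)

for layer in layers:   # correct: iterate and call each submodule
    x = layer(x)

# layers(x)            # incorrect: after this PR, calling the container itself
#                      # raises a clear error instead of a confusing one
```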
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29991
Differential Revision: D19698535
Pulled By: ezyang
fbshipit-source-id: fe156a0bbb033041086734b38f8c6fde034829bf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32432
Use JIT'ed fp16 SLS in D19477209 from Caffe2 operators
Test Plan: CI
Reviewed By: jianyuh
Differential Revision: D19477208
fbshipit-source-id: ef2ccba10f5f4c475166141bf09c266dedb92d38
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33323
Skip the tests until the underlying issue is fixed.
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19894675
fbshipit-source-id: 1cfc153577bf021171f4412115d84719beae7a91
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33190
This enables the initial RRef type to be used inside TorchScript: a user
can pass a Python RRef into a TorchScript function and call to_here
inside it. Specifically, this PR:
- Add RRef schema type parsing
- Add python interop for RRef in Python and into JIT
- register to_here op in register_distributed_ops
More support for RRef in TorchScript will be added in future PRs
Test Plan: Imported from OSS
Differential Revision: D19871244
Pulled By: wanchaol
fbshipit-source-id: 7eca6c491a84666b261c70806254b705603bd663
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32992
This PR add RRef to IValue and the JIT type system.
- The RRefInterface abstract class inherits from intrusive_ptr_target,
which allows the RRef class to be held in an IValue as an intrusive_ptr.
- Add RRefType as a JIT type; it's a container type similar to the
Future type.
Test Plan: Imported from OSS
Differential Revision: D19871242
Pulled By: wanchaol
fbshipit-source-id: cb80ca32605096f9a42ef147109fb368a7c1d4d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33189
Add RRefInterface to ATen/core, which will later be used by IValue.
Switch the whole RPC code base to use intrusive_ptr instead of shared_ptr,
so that we can add it to IValue.
Actually adding it to IValue and the JIT will happen in the next PR.
Test Plan: Imported from OSS
Differential Revision: D19871241
Pulled By: wanchaol
fbshipit-source-id: d7e1fd04b46320e0f26c18591b49c92ad30a4032
Summary:
See https://discuss.pytorch.org/t/bugs-about-torch-from-numpy-array/43312.
This update incorporates albanD's suggestion into the error message, saving future users from having to ask or look on the forums if they encounter this issue and don't mind making their arrays contiguous.
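A sketch of the failing case and the workaround the message now points to (a negative-stride view is the typical trigger):
```python
import numpy as np
import torch

a = np.arange(5)[::-1]        # negative-stride view; torch.from_numpy(a) rejects it
# torch.from_numpy(a)         # raises, now with a hint to make the array contiguous first

t = torch.from_numpy(np.ascontiguousarray(a))  # workaround: copy into a contiguous array
print(t)
```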
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33254
Differential Revision: D19885808
Pulled By: mruberry
fbshipit-source-id: 8f0fd994cf8c088bf3c3940ab4dfb3ddbc5b3ede
Summary: Update this mapping with the int4 SLS ops so we can run net_runner.
Test Plan: testing with net_runner
Reviewed By: jfix71
Differential Revision: D19879826
fbshipit-source-id: eac84b10e2365c21cb8a7cfbf3123e26a9945deb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32957
Closes https://github.com/pytorch/pytorch/issues/29703. If there is a
gloo timeout and `recvWork->wait()` times out in `listenLoop()`,
ProcessGroupAgent crashes since there is an unhandled exception in a thread.
This catches the exception and exits the listen loop. In a follow up diff, we
will enhance these error conditions so that if users attempt to send RPCs
again, they are notified that the RPC agent was in a bad state and it was
shutdown.
This PR also adds a new option, `processGroupTimeout` to PG agent's backend
options. This allows us to control the gloo timeout.
ghstack-source-id: 98236783
Test Plan: Added a unit test.
Differential Revision: D19678979
fbshipit-source-id: 3895ae754f407b84aca76c6ed3cb087d19178c40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26410
I only ported the CPU forward implementation for now to try a CPU-only benchmark.
Test Plan: Imported from OSS
Differential Revision: D17454519
Pulled By: gchanan
fbshipit-source-id: ff757cf972c5627074fea2f92a670129007a49f4
Summary:
Fixes https://github.com/pytorch/pytorch/issues/32008
This is similar to CaoZhongZ's patch which runs on all OpenMP threads in the team and selectively exits early to scale the number of threads active. I have also restored the `if` clause from before https://github.com/pytorch/pytorch/issues/26963 so that running on 1 thread should still avoid additional synchronisation.
One comment is that this does slightly change the meaning of `at::get_num_threads` inside of a `parallel_for` loop since it's not guaranteed that the function was called on that many threads. I've looked at the uses within ATen and couldn't see anything that would be problematic. There are a few places in `quantized` that seem to make this assumption but they always use a grain size of 1 so should be safe:
d9e99ab544/aten/src/ATen/native/quantized/cpu/qconv.cpp (L436-L437)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32875
Differential Revision: D19775823
Pulled By: VitalyFedyunin
fbshipit-source-id: 4f843b78cdb9e2766339590d728923786a00af6d
Summary:
- Clean up error checking code
- Avoid unnecessary floating-point computation
- Use float instead of double when possible to avoid massive cast in the tensor
- Use bool instead of uint8_t for clear Boolean purpose
- Improve error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32665
Differential Revision: D19601920
Pulled By: VitalyFedyunin
fbshipit-source-id: 0c6c6b5ff227b1437a6c1bae79b2c4135a13cd37
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33011
I also reordered some of the keys in non-semantic ways to make the
organizational grouping more clear.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19796584
Pulled By: ezyang
fbshipit-source-id: 3083abadb47e9f382b9fbe981af0b34203c6ea4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33080
Quantized batch norm for cases where batch norm cannot be fused with conv.
AVX2 implementation is from Caffe2.
Test Plan:
python test/test_quantized.py TestQuantizedOps.test_batch_norm
Imported from OSS
Differential Revision: D19861927
fbshipit-source-id: bd8cd101fc063cb6358132ab7c651a160999293c
Summary:
If a value has the type None, we can always replace it with a None constant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33264
Differential Revision: D19878695
Pulled By: eellison
fbshipit-source-id: 5d0e7ffb37c5747997df093fec3183039d8dff4d
Summary:
For reasons similar to https://github.com/pytorch/pytorch/issues/33021. Note that support for the half type has
not been available in any release yet, so it should be safe to remove (all forward ones concerning this PR were added in daef363b15c8a3aaaed09892004dc655df76ff81 and 8cb05e72c69fdd837548419770f3f1ba9807c16d).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33206
Differential Revision: D19861137
Pulled By: ezyang
fbshipit-source-id: 38a3a398a716a782c26a611c56ddeab7eb7ac79e
Summary:
When building with FFMPEG, I encountered a compilation error due to a missing include/library.
I also find that the change in video_input_op.h improves the build on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27589
Differential Revision: D19700351
Pulled By: ezyang
fbshipit-source-id: feff25daa43bd2234d5e75c66b9865b672a8fb51
Summary:
This PR implements the gradient scaling API that mruberry, jjsjann123, ngimel, zdevito, gchanan and I have been discussing. Relevant issue/RFC: https://github.com/pytorch/pytorch/issues/25081.
Volume-wise, this PR is mostly documentation and tests. The Python API (found entirely in `torch/cuda/amp/amp_scaler.py`) is lightweight. The exposed functions are intended to make the implementation and control flow of gradient scaling convenient, intuitive, and performant.
The API is probably easiest to digest by looking at the documentation and examples. `docs/source/amp.rst` is the homepage for the Automatic Mixed Precision package. `docs/source/notes/amp_examples.rst` includes several examples demonstrating common but not-immediately-obvious use cases. Examples are backed by tests in `test_cuda.py` (and thankfully the tests pass :P).
Two small utility kernels have been added in `native/cuda/AmpKernels.cu` to improve performance and avoid host-device synchronizations wherever possible.
Existing optimizers, both in the wild and in Pytorch core, do not need to change to use the scaling API.
However, the API was also designed to establish a contract between user scripts and optimizers such that writers of _new_ custom optimizers have the control points they need to implement fast, optionally sync-free updates. User scripts that obey the scaling API can drop such custom optimizers in and reap performance benefits without having to change anything aside from the optimizer constructor itself. [I know what the contract with custom optimizers should be](35829f24ef/torch/cuda/amp/amp_scaler.py (L179-L184)), but I'm waiting for review on the rest of the API before I go about documenting it (it will be given a dedicated section in `docs/source/notes/amp_examples.rst`).
Currently, the gradient scaling examples do not include the auto-casting API as discussed in https://github.com/pytorch/pytorch/issues/25081. The gradient scaling API is intended to be orthogonal/modular relative to autocasting. Without auto-casting the gradient scaling API is fully use-_able_, but not terribly use-_ful_, so it's up to you guys whether you want to wait until auto-casting is ready before merging the scaling API as well.
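For orientation, a minimal sketch of the usage pattern the scaling API establishes, written against the `torch.cuda.amp.GradScaler` name the feature eventually shipped under (the class in this PR lives in `amp_scaler.py`, so the exact name used here is an assumption, not the code in this PR):
```python
import torch

model = torch.nn.Linear(8, 8).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()   # assumed class name; see torch/cuda/amp in this PR

for data, target in [(torch.randn(4, 8).cuda(), torch.randn(4, 8).cuda())]:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(data), target)
    scaler.scale(loss).backward()   # scale the loss so small gradients don't underflow
    scaler.step(optimizer)          # unscale grads and skip the step if infs/NaNs are found
    scaler.update()                 # adjust the scale factor for the next iteration
```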
### Todo
- [ ] How do I get c10 registered status for my two custom kernels? They're very simple.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26512
Differential Revision: D19859905
Pulled By: mruberry
fbshipit-source-id: bb8ae6966214718dfee11345db824389e4286923
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33174
Closes https://github.com/pytorch/pytorch/issues/32780. It looks like
this is the only callsite where we do `_get_current_rpc_agent().foo()`, and we
can do this directly in the pybind layer to save some overhead.
ghstack-source-id: 98200664
Test Plan: All UTs should pass.
Differential Revision: D19828786
fbshipit-source-id: 5c34a96b5a970e57e6a1fdf7f6e54c1f6b88f3d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33125
Provide a histogram collection and weight-prepacking interface for Dper to auto-quantize the Ads models.
Test Plan:
buck test mode/opt deeplearning/numeric_suite/toolkit/test:int8_static_utils_test
buck test mode/opt deeplearning/numeric_suite/toolkit/test:histogram_utils_test
Reviewed By: amylittleyang
Differential Revision: D19794819
fbshipit-source-id: 6a4f4a6684da0977b7df2feed8a4b961db716da8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33251
Somehow this was preventing `c10::Error` exceptions from ever being thrown on Windows when `defined(NDEBUG) == false`. Kinda scary.
Test Plan: sandcastle green, made sure `intrusive_ptr_test.cpp` (givenStackObject_whenReclaimed_thenCrashes) passed inside ovrsource using `mode/win/dev-debug`
Reviewed By: malfet
Differential Revision: D19865667
fbshipit-source-id: c32d5752025c043e57d16c6d14a94b069bed0bc3
Summary:
Stacked PRs
* #32955 - [jit] Fix flipped PackedSequence outputs in script
* **#32953 - [jit] Support properties on `Device`**
PyTorch devices have an `index` and a `type` property. This PR adds support for both to TorchScript.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32953
Pulled By: driazati
Differential Revision: D19849320
fbshipit-source-id: ce845258c6110058dd9ea1f759ef74b7ed2e786e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32739
As the title says.
ghstack-source-id: 98061467
Test Plan: CI
Differential Revision: D19610810
fbshipit-source-id: f9621cd7d780769941ed77974b19c5226d4b2b30
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33243
If a file does not exist in an archive, PyTorchStreamReader throws an exception. However, when PyTorchStreamReader is destructed, another exception is thrown while processing the first exception. As a result of this double exception there is a SIGABRT.
Thanks dreiss for catching this bug and suggesting the fix. It happened when he used _load_for_mobile to load a TorchScript file without a bytecode session. A unit test is added to cover this case.
Test Plan: Imported from OSS
Differential Revision: D19859205
Pulled By: iseeyuan
fbshipit-source-id: 8f96b6256f1a1f933fce1c256d64604c7e9269e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32761
This replaces ImplicitTensorToNum with result-specific operators like
IntImplicit, FloatImplicit, or ScalarImplicit. Note that ScalarImplicit
was not correctly implemented before and this PR fixes the lapse.
This does not change on-disk serialization because these operators are not
serialized directly but written as eg. `annotated(int, foo)`.
Test Plan: Imported from OSS
Differential Revision: D19615385
Pulled By: zdevito
fbshipit-source-id: 48575f408e8219d2ec5b46936fc2aa691f283976
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32682
This moves code around so that operator.h/cpp no longer requires a full
definition of Node* nor does it include alias analysis or the pretty printer.
This should make it possible to include in the mobile build.
Functionality for checking if operators match Node and to look up
and operator for a Node have moved to the Node object.
Test Plan: Imported from OSS
Differential Revision: D19615386
Pulled By: zdevito
fbshipit-source-id: e38bdf29971183597ef940d061c06ba56e71d9c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33050
Following what gchanan proposed in #30480
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output
Done in tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable and they have the same number of elements, we fall back to the old behavior (pick the shape of mean)
Done by reshape std to the same shape of mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out.
Done by tensor iterator already.
Test Plan: Imported from OSS
Differential Revision: D19771186
Pulled By: glaringlee
fbshipit-source-id: a0b71063c7f5fdda2d4ceb84e06384414d7b4262
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33194
### Summary
The iOS x86_64 job has been failing for a few days. I haven't found the root cause, but it seems that updating torchvision to its latest version fixes the problem.
### Test Plan
- the x86_64 job works
Test Plan: Imported from OSS
Differential Revision: D19845079
Pulled By: xta0
fbshipit-source-id: 5034e252600b6704b860d68c371a65bef4cf37fc
Summary:
There are cases where we want to recover from CUDA OOM; for example, some cuDNN algorithms use a huge workspace, and we want to recover from OOM in order to pick a different algorithm. In such cases, there is no reason to catch all errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33056
Differential Revision: D19795359
Pulled By: ezyang
fbshipit-source-id: a34e23bf6d172dc0257389251dafef5b38d27d2b
Summary:
Resolves issue https://github.com/pytorch/pytorch/issues/31603
- A minor spelling typo is corrected: "suitible" --> "suitable"
- A minor quality of life improvement is added: the data format strings are better rendered as fixed width to indicate that they are string constants. "CHW" --> "`CHW`"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31604
Differential Revision: D19697293
Pulled By: ezyang
fbshipit-source-id: ee38b0d4c9ca8a233ac9243c310d9a3b42ad6f32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33184
dnnlowp specific code shouldn't be in the default FC in the first place
Test Plan: Just removing #ifdef #endif
Reviewed By: jianyuh
Differential Revision: D19835301
fbshipit-source-id: 7880cf298bedb3f0bc407d140d342124663ea4a7
Summary:
Collect activation histograms during model evaluation and aggregate all the histograms from multiple threads/readers into one file.
The original functionality of the bulk_eval workflow is still valid. The output predictions and extra blobs will be exported to a Hive table, which will be very useful for numerical debugging.
Test Plan:
FBL
```flow-cli canary dper.workflows.bulk_eval.export --mode dbg --parameters-file experimental/summerdeng/sparsenn/bulk_eval_input_configs.json --run-as-secure-group team_ai_system_sw_hw_co-design --entitlement gpu_prod --name "Histogram collection with caffe2 logging. Attach histogram observer to the predict net. Use small model 102343030. "
```
f163861773
When the flow is done, we can get all the histogram files under the specified dir. For example:
```
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ca65cc0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6cde8a80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d144840
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6d4a9600
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6da303c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6dd1c800
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e0855c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e3e0380
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6e95a140
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6eafcf00
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6ed1a100
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f094ec0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f561c80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6f783a40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb6fccb7c0
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7003d580
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb703ae340
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7084ae80
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70bc1c40
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70f43a00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb70ff7680
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71361300
-rw-rw-r--. 1 185754 185754 3945012 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb716df0c0
-rw-rw-r--. 1 185754 185754 4024538 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7199c780
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb71b72f00
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72330000
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72598100
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb7290d880
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72b03980
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb72f1f160
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fcb8bcee9e0
-rw-rw-r--. 1 185754 185754 3944091 Jan 23 09:45 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.0x7fd51b457260
-rw-rw-r--. 1 185754 185754 4026659 Jan 23 09:51 /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final
```
The aggregated histogram file is /mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final. It can be loaded to the following auto quant workflow for int8 static quantization.
######## Code refactoring ########
Moved the utility functions that process activation histograms to deeplearning/numeric_suite/toolkit:hist_processor and added the dependency in dper.
We also had a hist_compiler in caffe2/caffe2/fb/fbgemm/numerical_debugger/python_utils/hist_compiler.py; it was also refactored to reuse the utility functions in deeplearning/numeric_suite/toolkit:hist_processor.
The histograms from bulk_eval and the hist_compiler are identical.
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.compiled.bak
/mnt/vol/gfsadslearner-frc3c01/fblearner_flow/users/summerdeng/sparsenn/bulk_eval.txt.final.bak
Reviewed By: hx89
Differential Revision: D19270090
fbshipit-source-id: c7ecb4f2bbf1ea725c52e903356ad9a7b9ad73ac
Summary:
fixes a compiler warning:
```
torch/aten/src/ATen/native/cuda/MaxUnpooling.cu.cc(402):
warning: variable "batchSize" was set but never used
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32294
Differential Revision: D19697277
Pulled By: ezyang
fbshipit-source-id: b9821be325826dc4785cad7994803b54f1711a0c
Summary:
The extra dashes are breaking the link here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31760
Differential Revision: D19697301
Pulled By: ezyang
fbshipit-source-id: 65de026b9016dc8689c9dac9efb8aafd00b535cd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30352
1) TBB forwards us `ident` through a parameter; we don't need to capture it.
2) TBB is being passed steps <= 0, which is bad.
Taken from TBB documentation:
```
The index type must be an integral type. The loop must not wrap around. The step value must be positive. If omitted, it is implicitly 1.
```
I have a build that uses `TBB_USE_DEBUG=1` and there are currently a lot of issues with PyTorch's usage.
Is the TBB version not tested very much right now?
ghstack-source-id: 94459382
Test Plan: CI green
Differential Revision: D18666029
fbshipit-source-id: d5aa8327b03181d349e1964f9c8211298c433d6a
Summary:
1. Use C10_WARP_SIZE instead of hardcoded value "32".
2. `getNumThreads` returns a minimum of 32 for CUDA, which is the same as the warp size in CUDA. However, for HIP it returns a minimum of 16, which is less than the warp size (64) in HIP. This creates an issue in the [reduce function](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L115)) when it zeroes out the other entries in shared memory [here](14548c2d5b/aten/src/ATen/native/cuda/Normalization.cuh (L137)): since `blockDim.x` is at least equal to the warp size in CUDA, this never zeroes out `shared[0]`; but for HIP, since `blockDim.x` could be 16 or 32, which is less than the warp size (64), `blockDim.x * blockDim.y` can be less than the warp size for small cases, which then zeroes out `shared[0]` as well. This results in an erroneous output of zero from the reduce function on ROCm (depending on how the block dimensions are set).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33098
Differential Revision: D19837355
Pulled By: bddppq
fbshipit-source-id: ea526acd82ec08b1acb25be860b7e663c38ff173
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33083
Added more recommendations, some notes, and a warning.
Test Plan: cd docs ; make html
Differential Revision: D19829133
Pulled By: ilia-cher
fbshipit-source-id: b9fbd89f5875b3ce35cc42ba75a3b44bb132c506
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30982
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
-----------
In this PR:
Updating the templates.
-----------
Test Plan: Imported from OSS
Differential Revision: D18912680
Pulled By: izdeby
fbshipit-source-id: 9e3828e42ee5c3aefbf3729f4a8d6db813f2e7c3
Summary:
They were probably mistakenly added, as we do not intend to support Half on CPUs in general, and in these situations the Half type would probably be significantly slower than its float and double counterparts due to the lack of vectorization and the need for additional casting.
cc XiaobingSuper
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33021
Differential Revision: D19795152
Pulled By: VitalyFedyunin
fbshipit-source-id: b19796db88880a46557e1b2fd06e584d46093562
Summary:
This PR aims to improve `cat` performance on CPU.
The current `cat` logic from the `TH` module has no parallelization when the input tensors are all contiguous.
This code also tries to reuse the same `TensorIterator` as much as possible in order to reduce the overhead of creating a `TensorIterator`; this is helpful when the copied slice is not large enough.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30806
Differential Revision: D19275026
Pulled By: VitalyFedyunin
fbshipit-source-id: 756e9b86891f725c256b0a6981887ff06d88b053
Summary:
Currently `torch.pdist` yields an illegal CUDA memory access for batch sizes >= 46342 as reported by SsnL in https://github.com/pytorch/pytorch/issues/30583.
Thanks for the minimal code reproduction, btw! ;)
Reason for this bug:
The calculation of `i` in the [`pdist_kernel_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112)) might overflow if a tensor with a `batch size >= 46342` is passed to `torch.pdist`.
Detailed description:
* `result` is resized to `n * (n - 1) / 2 = 1073767311` ([line of code](46ad80c839/aten/src/ATen/native/Distance.cpp (L140)))
* `grid` is initialized as `result.numel()` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L246)))
* `k` is assigned to the `blockIdx.x` as an `int32` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L108)))
* `i` is calculated using `2 * k >= 2147534622` ([line of code](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L112))), which overflows, since `2147534622 > 2147483647 (int32_max)`.
Using `const int64_t k = blockIdx.x;` would solve the illegal memory access. This seems also be done for [`cdist_kernel_cuda_impl`](46ad80c839/aten/src/ATen/native/cuda/DistanceKernel.cu (L198-L201)).
However, we might expect a slowdown, so I've timed the current PyTorch master vs. this PR:
(tested with `x = torch.randn(x.size(0), 128)` on a V100)
|x.size(0) | int32 idx | int64 idx | slowdown |
|----------|-----------|-----------|----------|
| 50000 | - | 4.4460 | - |
| 25000 | 1.02522 | 1.10869 | 7.53% |
| 12500 | 0.25182 | 0.27277 | 7.68% |
| 6250 | 0.06291 | 0.06817 | 7.72% |
| 3125 | 0.01573 | 0.01704 | 7.69% |
| 1562 | 0.00393 | 0.00426 | 7.75% |
While checking the backward kernel, it seems I'm triggering another error with a size limit of
```python
x = torch.randn(1449, 1, device='cuda', requires_grad=True)
out = torch.pdist(x)
out.mean().backward()
> RuntimeError: CUDA error: invalid configuration argument
```
, while `[<=1448, 1]` works.
I'll take another look at this issue. Let me know if the potential fix should go into this PR or if I should open a new issue.
CC ngimel, csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31593
Differential Revision: D19825571
Pulled By: ngimel
fbshipit-source-id: ace9ccab49f3cf0ce894cdb6daef0795e2e8ec03
Summary:
`where` is special because the arguments do not have the same type, which does not satisfy the assumption in modern https://github.com/pytorch/pytorch/pull/32383. I migrate it to TensorIterator so that there is something to test that this case is not broken. Currently, this case falls back to using legacy (not vectorized, not unrolled) code. It should be supported in the future when I clean up `Loops.cuh`.
I also move some shared parts of `CUDALoops.cuh` and `ROCmLoops.cuh` into `Loops.cuh` so that the logic for checking whether `func_t` has the same arg types can be shared.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32984
Differential Revision: D19825127
Pulled By: ngimel
fbshipit-source-id: bbf4682349d96b4480c4d657f3c18a3a67a9bf17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32985
This can be useful in many situations to decide whether all elements are
zeros or non-zeros, such as elu as shown in #32986 .
Test Plan: Imported from OSS
Differential Revision: D19794549
Pulled By: VitalyFedyunin
fbshipit-source-id: 1be1c863d69b9a19fdcfcdd7cb52343066f740d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30981
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
-----------
In this PR:
Extended DispatchKeyExtractor logic to expect TensorOptions.
-----------
Test Plan: Imported from OSS
Differential Revision: D18912684
Pulled By: izdeby
fbshipit-source-id: 25cf1c397caa14272ca65b4003f1f03ff282ea77
Summary:
When an error is raised and `__exit__` in a context manager returns `True`, the error is suppressed; otherwise the error is raised. No return value should be given, in order to keep the default behavior of the context manager.
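A minimal illustration (not code from this PR) of why returning `True` from `__exit__` is wrong here: it silently swallows the exception.
```python
class Suppressing:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return True  # any exception raised in the with-block is suppressed

with Suppressing():
    raise TypeError("never seen")
print("still running")  # reached because the TypeError was swallowed
```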
Fixes https://github.com/pytorch/pytorch/issues/32639. The `get_lr` function was overridden with a function taking an epoch parameter, which is not allowed. However, the relevant error was not being raised.
```python
In [1]: import torch
...:
...: class MultiStepLR(torch.optim.lr_scheduler._LRScheduler):
...: def __init__(self, optimizer, gamma, milestones, last_epoch = -1):
...: self.init_lr = [group['lr'] for group in optimizer.param_groups]
...: self.gamma = gamma
...: self.milestones = milestones
...: super().__init__(optimizer, last_epoch)
...:
...: def get_lr(self, step):
...: global_step = self.last_epoch #iteration number in pytorch
...: gamma_power = ([0] + [i + 1 for i, m in enumerate(self.milestones) if global_step >= m])[-1]
...: return [init_lr * (self.gamma ** gamma_power) for init_lr in self.init_lr]
...:
...: optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
...: scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
```
```
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-7fad6ba050b0> in <module>
14
15 optimizer = torch.optim.SGD([torch.rand(1)], lr = 1)
---> 16 scheduler = MultiStepLR(optimizer, gamma = 1, milestones = [10, 20])
<ipython-input-1-7fad6ba050b0> in __init__(self, optimizer, gamma, milestones, last_epoch)
6 self.gamma = gamma
7 self.milestones = milestones
----> 8 super().__init__(optimizer, last_epoch)
9
10 def get_lr(self, step):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in __init__(self, optimizer, last_epoch)
75 self._step_count = 0
76
---> 77 self.step()
78
79 def state_dict(self):
~/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/optim/lr_scheduler.py in step(self, epoch)
141 print("1a")
142 # try:
--> 143 values = self.get_lr()
144 # except TypeError:
145 # raise RuntimeError
TypeError: get_lr() missing 1 required positional argument: 'step'
```
May be related to https://github.com/pytorch/pytorch/issues/32898.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32997
Differential Revision: D19737731
Pulled By: vincentqb
fbshipit-source-id: 5cf84beada69b91f91e36b20c3278e9920343655
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30573
Mostly just moved code.
Index dim and number-of-indices checks are added to make the checks identical to index_add_cpu_.
ghstack-source-id: 98010129
Test Plan: existing tests
Differential Revision: D18749922
fbshipit-source-id: d243be43a3b6a9b9591caf0c35ef2fb6ec0d3ead
Summary:
Bazelisk automatically reads the `.bazelversion` file and installs the required version of Bazel. This saves us from updating the CI script every time we need a Bazel upgrade.
Use clang-8 for consistency with pytorch/xla repo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33036
Differential Revision: D19820819
Pulled By: ailzhang
fbshipit-source-id: 1560ec225cd037a811769a509a704b0df77ea183
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33102
Add a simple main() to build the code analyzer as a binary. This enables
easier integration with the FB internal build environment.
ghstack-source-id: 97958658
Test Plan: - CI
Differential Revision: D19798560
Pulled By: ljk53
fbshipit-source-id: 126230e3bf7568046a309e8a6785230f820e0222
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31998
This change builds on recent torch::from_blob() changes to avoid Tensor
copies on send in more cases.
Particularly, this change adds an option which, when enabled, assumes that if the Tensor Storage's DataPtr has a non-trivial deleter, then the Tensor does in fact manage the underlying memory. Hence we can reference the Tensor's Storage via an IOBuf that stays referenced while sending, saving a Tensor copy.
We add appropriate test cases, particularly re: torch::from_blob(), which would have been problematic with the recent changes.
ghstack-source-id: 97778619
Test Plan: buck test mode/dev caffe2/torch/fb/distributed/wireSerializer/test/...
Reviewed By: satgera
Differential Revision: D19306682
fbshipit-source-id: 05f56efb2d5d6279ae4b54dfcbba0f729c2c13fa
Summary:
## Several flags
`/MP[M]`: It is a flag for the compiler `cl`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/maxcpucount:[M]`: It is a flag for the generator `msbuild`. It leads to project-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
`/p:CL_MPCount=[M]`: It is a flag for the generator `msbuild`. It leads the generator to pass `/MP[M]` to the compiler.
`/j[M]`: It is a flag for the generator `ninja`. It leads to object-level multiprocessing. By default, it spawns M processes where M is the number of cores on the PC.
## Reason for the change
1. Object-level multiprocessing is preferred over project-level multiprocessing.
2. ~For ninja, we don't need to set `/MP` otherwise M * M processes will be spawned.~ Actually, this is not correct because in ninja configs there is only one source file per command. Therefore, the `/MP` switch should be useless.
3. For msbuild, if it is called through Python configuration scripts, then `/p:CL_MPCount=[M]` will be added, otherwise, we add `/MP` to `CMAKE_CXX_FLAGS`.
4. ~It may be a possible fix for https://github.com/pytorch/pytorch/issues/28271, https://github.com/pytorch/pytorch/issues/27463 and https://github.com/pytorch/pytorch/issues/25393. Because `/MP` is also passed to `nvcc`.~ This is probably not true, because `/MP` should not be effective given there is only one source file per command.
## Reference
1. https://docs.microsoft.com/en-us/cpp/build/reference/mp-build-with-multiple-processes?view=vs-2019
2. https://github.com/Microsoft/checkedc-clang/wiki/Parallel-builds-of-clang-on-Windows
3. https://blog.kitware.com/cmake-building-with-all-your-cores/
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33120
Differential Revision: D19817227
Pulled By: ezyang
fbshipit-source-id: f8d01f835016971729c7a8d8a0d1cb8a8c2c6a5f
Summary:
Another pull request to follow up issue https://github.com/pytorch/pytorch/issues/32531.
Here I implemented the backward operation for `torch.eig` under the condition that all the eigenvalues are real.
This pull request is independent of my other pull request https://github.com/pytorch/pytorch/issues/32932; there is no dependency between the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33090
Differential Revision: D19814347
Pulled By: albanD
fbshipit-source-id: 2fae30964e97987abb690544df8240aedeae56e8
Summary:
`assertWarnsRegex` now prints out any warnings that it caught while failing to find a matching warning. This makes it easier to debug tests by just looking at the CI logs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33099
Differential Revision: D19800021
Pulled By: ezyang
fbshipit-source-id: 1c31ae785c8ffc5d47619aff6597e479263be2de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33069
This PR adds the following:
- Warn when a non-input Tensor is given to `mark_dirty()` as it is not needed.
- Raise an error if we modify inplace an input that is a view and that we have multiple output. This setting is not handled by `CopySlices` and will raise a cryptic error during the backward.
- Raise an error if an input is modified inplace but not returned. That will prevent the graph rewrite from being done correctly.
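For context, a minimal sketch (my own example, not code from this PR) of the `mark_dirty()` pattern these checks apply to: the input is modified in place, marked dirty, and returned.
```python
import torch

class InplaceScale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        x.mul_(2)
        ctx.mark_dirty(x)  # x is an input, modified in place, and returned
        return x

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * 2

x = torch.randn(3, requires_grad=True).clone()  # non-leaf, so in-place is allowed
y = InplaceScale.apply(x)
y.sum().backward()
```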
Test Plan: Imported from OSS
Differential Revision: D19791563
Pulled By: albanD
fbshipit-source-id: 4d8806c27290efe82ef2fe9c8c4dc2b26579abd1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33068
The version counter is already tracked if we use PyTorch's functions, but not if the user unpacks the Tensor and modifies it by hand or with a third-party library.
Test Plan: Imported from OSS
Differential Revision: D19791564
Pulled By: albanD
fbshipit-source-id: a73c0f73d8fd0c0e5bf838f14bed54fa66937840
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768, second attempt of https://github.com/pytorch/pytorch/issues/32870
DataParallel creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backwards` will propagate gradients onto the original module parameters but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module. However, any replica using backwards was broken anyway since the replica's parameters are not leaf nodes in autograd. So, we should issue a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33064
Differential Revision: D19790178
Pulled By: albanD
fbshipit-source-id: 886f36640acef4834a6fa57a26ce16b42ff0e9ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32979
Since we use prepacked weights in the Fp16 FCs and future Int8 FCs in production Ads models, we provide the Python utils to inspect the unpacked format of the weights for debugging purposes. The main interfaces are the following:
```
from deeplearning.numeric_suite.toolkit import packed_weights_inspector
# inspect fp16 packed weights
unpacked_fp16_weights = packed_weights_inspector.extract_fp16_fc_packed_weights(fp16_weight_blob_name)
# inspect int8 packed weights
unpacked_int8_weights, qparams = packed_weights_inspector.extract_int8_fc_packed_weights(int8_weight_blob_name)
```
Test Plan:
```
buck test mode/opt deeplearning/numeric_suite/toolkit/test:packed_weights_inspector_test
```
Reviewed By: amylittleyang
Differential Revision: D19724474
fbshipit-source-id: e937672b3722e61bc44c2587aab2288a86aece9a
Summary:
If using nn.functional avg_pool, stride is an optional arg. If not provided, it is set to kernel_size.
This PR fixes the export of avg_pool with default stride.
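A hypothetical repro sketch of the fixed case (file name and shapes are placeholders), where `stride` is omitted and therefore defaults to `kernel_size`:
```python
import torch
import torch.nn.functional as F

class AvgPool(torch.nn.Module):
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=3)  # stride not given -> defaults to kernel_size

torch.onnx.export(AvgPool(), torch.randn(1, 3, 8, 8), "avg_pool.onnx")
```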
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33017
Reviewed By: hl475
Differential Revision: D19759604
Pulled By: houseroad
fbshipit-source-id: b0352db6fbaf427f4cff9ba8a942efdeb39b6f02
Summary:
Fix internal error message due to old version of hypothesis
```
  test_suite = self.load_tests()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 678, in load_tests
    suite = loader.load_all()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/__fb_test_main__.py", line 467, in load_all
    __import__(module_name, level=0)
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/test_quantization.py", line 45, in <module>
    hu.assert_deadline_disabled()
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/torch/testing/_internal/hypothesis_utils.py", line 322, in assert_deadline_disabled
    assert settings().deadline is None
  File "/data/sandcastle/boxes/eden-trunk-hg-fbcode-fbsource/fbcode/buck-out/dev/gen/caffe2/test/quantization#binary,link-tree/hypothesis/_settings.py", line 127, in __getattr__
    raise AttributeError('settings has no attribute %s' % (name,))
AttributeError: settings has no attribute deadline
```
Test Plan: buck test mode/dev //caffe2/test:quantization -- --run-disabled runs successfully
Differential Revision: D19795232
fbshipit-source-id: ef1d8be20b4be30e1cfad4cd5019c4779a5f4568
Summary:
split requires an int input; however, under tracing, operators such as size(axis) return a tensor, which is different behavior than when not tracing. As such, split needs to be modified to handle these cases.
Fixes https://github.com/pytorch/pytorch/issues/27551
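A hedged sketch of the kind of model this affects (my own example, not a test from the PR): under tracing/export, `x.size(1)` may be recorded as a tensor, and `split` must accept it.
```python
import torch

class SplitHalves(torch.nn.Module):
    def forward(self, x):
        half = x.size(1) // 2  # an int in eager mode, possibly a traced tensor during export
        a, b = torch.split(x, half, dim=1)
        return a + b

traced = torch.jit.trace(SplitHalves(), torch.randn(2, 6))
```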
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32493
Reviewed By: hl475
Differential Revision: D19538254
Pulled By: houseroad
fbshipit-source-id: c8623009de5926aa38685e08121f4b48604bd8c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33070
`start_method` parameter is intentionally ignored for `mp.spawn()`. Document this fact and point the user to `start_processes` if they want to use a different `start_method`.
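A minimal usage sketch (assuming a Linux host where `fork` is available) of the suggested alternative:
```python
import torch.multiprocessing as mp

def worker(rank):
    print(f"worker {rank} started")

if __name__ == "__main__":
    # mp.spawn() would ignore start_method="fork"; start_processes honors it
    mp.start_processes(worker, nprocs=2, start_method="fork")
```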
Test Plan:
Warning message looks like:
```
main.py:8: UserWarning: This method only supports start_method=spawn (got: fork).
To use a different start_method use:
torch.multiprocessing.start_process(...)
warnings.warn(msg)
```
Reviewed By: ailzhang
Differential Revision: D19780235
fbshipit-source-id: 4599cd18c3ba6cc401810efe4f390290ffa8023b
Summary:
Currently, custom ops are registered for a specific opset version.
For example, all torchvision custom ops are registered for opset 11, and cannot be exported into higher opset versions. This PR extends op registration to higher opset versions.
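A hedged sketch of the registration API this extends; the op name, domain, and symbolic function below are made-up placeholders:
```python
import torch.onnx

def my_op_symbolic(g, input):
    # emit a node in a custom ONNX domain
    return g.op("my_domain::MyOp", input)

torch.onnx.register_custom_op_symbolic("my_namespace::my_op", my_op_symbolic, opset_version=11)
```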
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32943
Reviewed By: hl475
Differential Revision: D19739406
Pulled By: houseroad
fbshipit-source-id: dd8b616de3a69a529d135fdd02608a17a8e421bc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32506
In this PR, we've introduced a `retain_graph` parameter to distributed
autograd similar to `torch.autograd.backward`.
In terms of design, this parameter is sent over RPC to all nodes and is used to
create the GraphTask on the local nodes. This enables us to run
`dist_autograd.backward()` multiple times in the same context.
The use case currently for this is to benchmark only the backward pass for
distributed autograd. We'd like to measure the QPS for the backward pass and as
a result, running a single forward pass and multiple backward passes in a loop
is one way to benchmark backward pass performance.
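A rough usage sketch (my assumption of the resulting API, with RPC initialized via MASTER_ADDR/MASTER_PORT and the context_id-taking backward signature):
```python
import torch
import torch.distributed.rpc as rpc
import torch.distributed.autograd as dist_autograd

rpc.init_rpc("worker0", rank=0, world_size=1)
t = torch.ones(2, 2, requires_grad=True)
with dist_autograd.context() as context_id:
    loss = (t * 2).sum()
    dist_autograd.backward(context_id, [loss], retain_graph=True)
    dist_autograd.backward(context_id, [loss])  # second backward reuses the retained graph
rpc.shutdown()
```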
ghstack-source-id: 97868900
Test Plan: waitforbuildbot
Differential Revision: D19521288
fbshipit-source-id: 7ad8521059fd400d7b5a6ab77ce56e1927ced90a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/33060
Noticed this when tracking down a partially-related SIGSEGV.
If inserting a non-present key into a memoized map, don't re-calculate it twice
(probably safer that way anyway).
ghstack-source-id: 97904485
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19778008
fbshipit-source-id: 95b1d708c034a54b96a22ccbdffb24f72d08dffd
Summary:
The "rand N like" function had required args which were not being used. As such, the method signature was modified to give them default values, so that no error is thrown when scripting does not provide these (unused) arguments.
Additionally, the const checker was modified to handle prim::Constant as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32830
Reviewed By: hl475
Differential Revision: D19731715
Pulled By: houseroad
fbshipit-source-id: a3cacb3977eecb88b122e0ceb654fdbf1c8286c1
Summary:
Supporting the case below. Previously, the index for copy_ was only considered as a constant integer, whereas it could be a tensor input as well.
```python
class InPlaceIndexedAssignment(torch.nn.Module):
def forward(self, data, index, new_data):
data[index] = new_data
return data
data = torch.zeros(3, 4)
index = torch.tensor(1)
new_data = torch.arange(4).to(torch.float32)
torch.onnx.export(InPlaceIndexedAssignment(), (data, index, new_data), 'inplace_assign.onnx', opset_version=11)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32801
Reviewed By: hl475
Differential Revision: D19731666
Pulled By: houseroad
fbshipit-source-id: 08703fdccd817f901282e19847e259d93929e702
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32243
Following what gchanan proposed in #30480
- If the (logical) shapes of mean and std are broadcastable, we broadcast them for the output
Done in tensor iterator already.
- If the (logical) shapes of mean and std are not broadcastable and they have the same number of elements, we fall back to the old behavior (pick the shape of mean)
Done by reshape std to the same shape of mean.
- If the (logical) shapes of mean and std are not broadcastable and don't have the same number of elements, we error out.
Done by tensor iterator already.
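A minimal sketch of the first and last cases above (assuming the behavior landed as described):
```python
import torch

mean = torch.zeros(2, 3)
std = torch.ones(3)            # broadcastable with mean -> output has shape (2, 3)
out = torch.normal(mean, std)
print(out.shape)               # torch.Size([2, 3])

# not broadcastable and different number of elements -> error
# torch.normal(mean, torch.ones(4))
```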
Test Plan: Imported from OSS
Differential Revision: D19417087
Pulled By: glaringlee
fbshipit-source-id: 1c4bc7df923110a803620b9e2abd11a7151fc33e
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31768
`DataParallel` creates replicas of the original `nn.Module` with the parameters duplicated onto the destination devices. Calling `backwards` will propagate gradients onto the original module parameters but calling `zero_grad` on the replica module doesn't clear the gradients from the parent module,
~breaking any model that uses `backward`-`zero_grad` in its `forward`. I fix this by patching the replica module so that `zero_grad` clears grads on the parent as well.~
However, any replica using backwards was broken anyway since the replica's parameters are not leaf nodes in autograd. So, we should raise a warning.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32870
Differential Revision: D19730209
Pulled By: ezyang
fbshipit-source-id: cb9b2cb0c2e0aca688ce0ff3e56b40fbd2aa3c66
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32495
Background
------------------------------
Previously, ninja was used to compile+link inline cpp_extensions and
ahead-of-time cpp_extensions were compiled with distutils. This PR adds
the ability to compile (but not link) ahead-of-time cpp_extensions with ninja.
The main motivation for this is to speed up cpp_extension builds: distutils
does not make use of parallelism. With this PR, using the new option, on my machine,
- torchvision compilation goes from 3m43s to 49s
- nestedtensor compilation goes from 2m0s to 28s.
User-facing changes
------------------------------
I added a `use_ninja` flag to BuildExtension. This defaults to
`True`. When `use_ninja` is True:
- it will attempt to use ninja.
- If we cannot use ninja, then this throws a warning and falls back to
distutils.
- Situations where we cannot use ninja: on Windows (NYI, I'll open a new issue for this), or if ninja cannot be found on the system.
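A hypothetical setup.py sketch of the new flag (extension name and source file are placeholders); ninja is the default, and `use_ninja=False` restores the pure-distutils path:
```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ext",
    ext_modules=[CppExtension("my_ext", ["my_ext.cpp"])],
    cmdclass={"build_ext": BuildExtension.with_options(use_ninja=False)},
)
```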
Implementation Details
------------------------------
This PR makes this change in two steps. Please let me know if it would be easier to review if I split this up into a stacked diff.
Those changes are:
1) refactor _write_ninja_file to separate the policy (what compiler flags
to pass) from the mechanism (how to write the ninja file and do compilation).
2) call _write_ninja_file and _run_ninja_build while building
ahead-of-time cpp_extensions. These are only used to compile objects;
distutils still handles the linking.
Change 1: refactor _write_ninja_file to separate policy from mechanism
- I split _write_ninja_file into: _write_ninja_file and
_write_ninja_file_to_build_library
- I renamed _build_extension_module to _run_ninja_build
Change 2: Call _write_ninja_file while building ahead-of-time
cpp_extensions
- _write_ninja_file_and_compile_objects calls _write_ninja_file to only
build object files.
- We monkey-patch distutils.CCompiler.compile to call
_write_ninja_files_and_compile_objects
- distutils still handles the linking step. The linking step is not a
bottleneck so it was not a concern.
- This change only works on unix-based systems. Our code for windows
goes down a different codepath and I did not want to mess with that.
- If a system does not support ninja, we raise a warning and fall back
to the original compilation path.
Test Plan
------------------------------
Adhoc testing
- I built torchvision using pytorch master and printed out the build
commands. Next, I used this branch to build torchvision and looked at
the ninja file. I compared the ninja file with the build commands and
asserted that they were functionally the same.
- I repeated the above for pytorch/nestedtensor.
PyTorch test suite
- I split `test_cpp_extensions` into `test_cpp_extensions_aot` and
`test_cpp_extensions_jit`. The AOT (ahead-of-time) version tests
ahead-of-time and the JIT version tests just-in-time (not to be confused
with TorchScript)
- `test_cpp_extensions_aot` gets run TWICE by run_test.py, once with
a module that was built with ninja, and once with a module that was
built without ninja.
- run_test.py asserts that when we are building with use_ninja=True,
ninja is actually available on the system.
Test Plan: Imported from OSS
Differential Revision: D19730432
Pulled By: zou3519
fbshipit-source-id: 819590d01cf65e8da5a1e8019b8b3084792fee90
Summary:
This will allow us to incrementally enable more tests for scripting as we put in fixes. houseroad spandantiwari
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32654
Reviewed By: hl475
Differential Revision: D19583401
Pulled By: houseroad
fbshipit-source-id: 8dc05e4784df819c939dffdf33b00cbb80bfa364
Summary:
Stacked PRs
* #32958 - Make zip serialization the default
* **#32244 - Fix some bugs with zipfile serialization**
It includes the following changes:
* Split up tests so that we can test both serialization methods
* Loading something within a buffer doesn't work anymore, so those tests are only on the old serialization method (it's possible but introduces a big slowdown since it requires a linear scan of the entire zipfile to find the magic number at the end)
* Call `readinto` on a buffer if possible instead of `read` + a copy
* Disable CRC-32 checks on read (there was some issue where miniz said the CRC was wrong but `zipinfo` and `unzip` said the zip file was fine)
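For reference, a small sketch (my assumption of the relevant flag, not text from this PR) of saving with either format under test:
```python
import torch

t = torch.arange(4)
torch.save(t, "t_zip.pt", _use_new_zipfile_serialization=True)     # zipfile-based format
torch.save(t, "t_legacy.pt", _use_new_zipfile_serialization=False)  # old serialization method
```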
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32244
Pulled By: driazati
Reviewed By: eellison
Differential Revision: D19418935
fbshipit-source-id: df140854f52ecd04236225417d625374fd99f573
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32895
When a particular rank calls `ncclCommAbort` on a communicator, it is
important to ensure all other ranks call `ncclCommAbort` on their respective
communicators. If this is not done, the other ranks could get stuck causing the
GPU to spin with 100% utilization.
To alleviate this issue, whenever any rank calls `ncclCommAbort` we put the
unique communicator id in the store. The NCCL watchdog thread then monitors the
store and aborts any communicators found in the store as "aborted".
A few more general fixes in this PR:
1) Use std::shared_ptr for the store in PrefixStore. PrefixStore was using a
reference to the store and when that reference went out of scope the store
object it was holding onto was invalid. This caused a segfault in the watchdog
thread.
2) Enhanced logging for the watchdog thread.
Test Plan: waitforbuildbot
Differential Revision: D19638159
fbshipit-source-id: 596cd87c9fe6d4aeaaab4cb7319cc37784d06eaa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32941
The Python grammar allows single-statement one-line functions. So we
should allow it in the string parser.
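A small sketch (my own example, assuming the one-line form is now accepted as described) of feeding such a function to the string frontend:
```python
import torch

cu = torch.jit.CompilationUnit("def double(x): return 2 * x\n")
print(cu.double(torch.ones(3)))  # tensor([2., 2., 2.])
```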
Test Plan: Imported from OSS
Differential Revision: D19704153
Pulled By: suo
fbshipit-source-id: 8c06cc9c600aa2a9567b484a1ecc0360aad443e3
Summary:
Enabling the RCCL test on rocm by adding a temporary grace period to clean up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32340
Differential Revision: D19744459
Pulled By: xw285cornell
fbshipit-source-id: 1af3b64113a67f93e622d010ddd3020e5d6c8bc8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32982
For masked_scatter_ and masked_fill_ (which already have manually written wrappers), move the broadcasting logic into the manually written wrappers.
Test Plan: Imported from OSS
Differential Revision: D19726830
Pulled By: gchanan
fbshipit-source-id: 1f6e55e19c1314a76e43946b14d58f147c0f8204
Summary:
The way we currently dispatch argmax/argmin to out-of-source devices is bad and has caused issues, e.g. it doesn't work well when the input requires grad. https://github.com/pytorch/xla/issues/1585.
Making argmax/argmin dispatch at device level resolves it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32961
Differential Revision: D19726826
Pulled By: ailzhang
fbshipit-source-id: f7fb445fd8e7691524afcc47d24d8e6b0171d10c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32788
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628643
Pulled By: ezyang
fbshipit-source-id: 7099b08eff37913144b961dda00b070bd4b939d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32787
Gets rid of a longstanding TODO. TensorList unwrap is only used for cat, which
means we can assume that the inputs are dense, and do something similar to how
we do the dense tensor wrapping above.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628642
Pulled By: ezyang
fbshipit-source-id: 3264439407585fb97995a9a2302c2913efecb421
Summary:
The PR https://github.com/pytorch/pytorch/pull/31791 adds support for float[] constant, which affects some cases of ONNX interpolate support.
This PR adds float[] constants support in ONNX, updates interpolate in ONNX, and re-enables the disabled tests.
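A hedged sketch of the interpolate export path these tests exercise (shapes and file name are placeholders):
```python
import torch
import torch.nn.functional as F

class Upsample(torch.nn.Module):
    def forward(self, x):
        return F.interpolate(x, scale_factor=2.0, mode="nearest")

torch.onnx.export(Upsample(), torch.randn(1, 3, 4, 4), "upsample.onnx", opset_version=11)
```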
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32554
Reviewed By: hl475
Differential Revision: D19566596
Pulled By: houseroad
fbshipit-source-id: 843f62c86126fdf4f9c0117b65965682a776e7e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32936
Closes https://github.com/pytorch/pytorch/issues/32732. Currently if a
UDF run in RPC throws an exception such as ValueError or TypeError, we wrap
this in a RemoteException on the callee side. When raising this on the caller
side, we currently raise a vanilla Exception. This diff changes it so that the
correct exception is thrown. Tested by changing the current rpc tests to assert
on the right type of error rather than just the base `Exception`.
ghstack-source-id: 97706957
Test Plan: Modified unit test.
Differential Revision: D19700434
fbshipit-source-id: e451b772ea6aecc1d2e109e67e7f932eb9151f15
Summary:
Checks the size of each tensor passed to `torch.stack` before calling `cat` to address https://github.com/pytorch/pytorch/issues/29510. This is done in the `get_stack_input` function as that is a common path. The function now compares the size of each tensor in the TensorList to the size of the first tensor and throws an exception when the sizes are not equal.
To compare:
```
x = torch.zeros([1, 2])
y = torch.zeros([1, 3])
torch.stack([x, y]) # Errors due to size differences
```
Current error:
```
RuntimeError: invalid argument 0: Sizes of tensors must match
except in dimension 0. Got 2 and 3 in dimension 2 at (path)\aten\src\TH/generic/THTensor.cpp:612
```
New error:
```
RuntimeError: stack expects each tensor to be equal size, but
got [1, 2] at entry 0 and [1, 3] at entry 1
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32931
Differential Revision: D19700110
Pulled By: ezyang
fbshipit-source-id: 7e18bb00fa2c137e418e340d719b6b76170b83e3
Summary:
It was causing a build error when compiling on MINGW64
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32015
Differential Revision: D19697296
Pulled By: ezyang
fbshipit-source-id: 71e58783c48f8e99755c091b2027d59740dfca47
Summary:
Closes gh-31771
Also note that the `epoch` attribute is *only* used as a manual seed in each iteration (so it could easily be changed/renamed). Seeding consecutive iterations with `[0, 1, 2, ...]` is low-entropy, however in practice it probably doesn't matter when using the sampler in combination with a dataloader (because there won't be enough data nor epochs to run into statistical issues
due to low-entropy seeding). So leaving that as is.
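A small usage sketch of the documented pattern (passing num_replicas/rank explicitly so it runs without initializing a process group):
```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(100))
sampler = DistributedSampler(dataset, num_replicas=1, rank=0)
loader = DataLoader(dataset, sampler=sampler, batch_size=10)

for epoch in range(3):
    sampler.set_epoch(epoch)  # epoch is only used as a manual seed for the shuffle
    for batch in loader:
        pass
```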
Rendered docstring:
<img width="534" alt="image" src="https://user-images.githubusercontent.com/98330/73701250-35134100-46e9-11ea-97b8-3baeb60fcb37.png">
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32951
Differential Revision: D19729333
Pulled By: ezyang
fbshipit-source-id: 3ddf90a3828b8bbae88aa2195a5d0b7d8ee1b066
Summary:
two instances of if -> it in torch.nn.modules.batchnorm.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29797
Differential Revision: D19698613
Pulled By: ezyang
fbshipit-source-id: 7312b2333f227113e904dfa91db90d00e525affb
Summary:
TensorBoard tests using SummaryWriter() may fail with a pandas import
complaint if TensorFlow packages are installed in the same python
environment as PyTorch:
```
Traceback (most recent call last):
  File "test_tensorboard.py", line 212, in test_writer
    with self.createSummaryWriter() as writer:
  File "test_tensorboard.py", line 64, in createSummaryWriter
    return SummaryWriter(temp_dir)
  ...
  File "[...]/site-packages/pandas/core/arrays/categorical.py", line 52, in <module>
    import pandas.core.algorithms as algorithms
AttributeError: module 'pandas' has no attribute 'core'
```
The exact failure may depend on the pandas version. We've also seen:
File "[...]/site-packages/pandas/core/arrays/categorical.py", line 9, in <module>
import pandas.compat as compat
AttributeError: module 'pandas' has no attribute 'compat'
The module import chain leading to the failure is tensorboard imports
tensorflow imports tensorflow_estimator imports pandas. pandas includes
a submodule named 'bottleneck', whose name collides with the PyTorch
'test/bottleneck/' subdirectory.
So IF tensorboard, tensorflow, tensorflow_estimator, and pandas are
installed in the python environment AND IF testing is run from within
PyTorch's 'test/' directory (or maybe just with 'test/' in PYTHONPATH,
etc.), then TensorBoard tests using SummaryWriter() will fail.
Rename the 'bottleneck/' directory slightly to avoid the name collision.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29650
Differential Revision: D19698638
Pulled By: ezyang
fbshipit-source-id: cb59342ed407cb37aefc833d67f768a8809129ac
Summary:
With the Fedora negativo17 repo, the cuDNN headers are installed in the /usr/include/cuda directory, alongside the other CUDA libraries.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31755
Differential Revision: D19697262
Pulled By: ezyang
fbshipit-source-id: be80d3467ffb90fd677d551f4403aea65a2ef5b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32897
Moving the default static instance into the method to achieve the same purpose.
ghstack-source-id: 97570792
Test Plan: - CI
Reviewed By: dreiss
Differential Revision: D19674566
fbshipit-source-id: 27f54da66dd7667c34905eddaac6579e64aa1118
Summary:
Understanding which ops return views and which return tensors with new storage is a common user issue, and an issue for developers connecting accelerators to PyTorch, too. This generic test suite verifies that ops which should return views do (and a few ops that shouldn't don't). The documentation has also been updated for .t(), permute(), unfold(), and select() to clarify they return views.
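A minimal illustration (my own example) of the view semantics the new tests verify:
```python
import torch

x = torch.arange(6).reshape(2, 3)
v = x.t()                      # a view: shares storage with x
v[0, 0] = 100
assert x[0, 0].item() == 100   # mutation through the view is visible in x
```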
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32512
Differential Revision: D19659454
Pulled By: mruberry
fbshipit-source-id: b4334be9b698253a979e1bb8746fdb3ca24aa4e3
Summary:
1. Allows both the memory_format of weight & input to dictate the output
memory_format.
2. Provides a utility function to recursively convert the memory_format of Conv2d and ConvTranspose2d layers. This allows easy model conversion and ensures that a memory_format lost through incompatible layers can be restored at a Convolution-like layer, where a significant performance boost is expected on later-generation CUDA devices.
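A rough sketch of the intended usage (the recursive conversion utility itself isn't named in this summary, so the weight is converted by hand here):
```python
import torch

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
conv.weight.data = conv.weight.data.contiguous(memory_format=torch.channels_last)
x = torch.randn(1, 3, 32, 32).contiguous(memory_format=torch.channels_last)
out = conv(x)
# with weight and input both channels_last, the output is expected to stay channels_last
print(out.is_contiguous(memory_format=torch.channels_last))
```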
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32482
Differential Revision: D19647903
Pulled By: VitalyFedyunin
fbshipit-source-id: 62c96ff6208ff5e84fae1f55b63af9a010ad199a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32888
This kills ~1500 lines of generated code by doing the following:
1) Stop binding _th_clone, which isn't used anymore.
2) Move allocation code out of the switch, because it doesn't need to be there, example:
Now:
```
auto dispatch_scalar_type = infer_scalar_type(self);
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(c10::Storage(scalarTypeToTypeMeta(dispatch_scalar_type), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
switch (dispatch_scalar_type) {
case ScalarType::Bool: {
...
case ScalarType::Byte: {
...
```
Before:
```
auto dispatch_scalar_type = infer_scalar_type(self);
switch(dispatch_scalar_type) {
case ScalarType::Bool: {
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<bool>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
case ScalarType::Byte: {
auto result_ = c10::make_intrusive<TensorImpl, UndefinedTensorImpl>(caffe2::TypeMeta::Make<byte>(), 0, allocator(), true),DispatchKey::CPUTensorId).release();
auto result = Tensor(c10::intrusive_ptr<TensorImpl, UndefinedTensorImpl>::reclaim(result_));
```
Note there's one extra lookup from ScalarType -> TypeMeta, but that can go away once we are able to put everything in a dispatch macro.
3) Prepare for more moves out of the switch by using dispatch_scalar_type where we would have used an explicit ScalarType::Name
More moves are currently blocked by "real" types needing to map scalar_type -> C++ type. Dispatch macros can solve that, but I'll need to wrap the actual TH calls in templates so the entire
thing can be done via dispatch.
4) Kill some codegen that isn't used anymore: ALLOC_WRAP, is_actual_return_long.
Test Plan: Imported from OSS
Differential Revision: D19672613
Pulled By: gchanan
fbshipit-source-id: 753f480842d11757e10182e43b471bd3abaa5446
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32952
When the Async() version of clearAndWaitForOutstandingRpcs() was written,
we didn't yet have the generic Future<T> class, and hadn't worked out our
error model fully.
This change fixes that method to properly propagate the first encountered error
to the future, using a bool+CAS.
ghstack-source-id: 97665749
Test Plan: existing test coverage, buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19710337
fbshipit-source-id: 66ce5593a94a16ea624930dbb9409917ef5cfd5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32935
Mock away the content of onnxified net with some low cost ops so that we can still mimic the input/output transfer while doing minimal work on the card.
Test Plan:
```
buck run glow/fb/test:sparsenn_test -- --gtest_filter='SparseNNTest.vanillaC2' --onnxifi_debug_mode --onnxifi_loop_test_mode --nocaffe2_predictor_use_memonger
```
Differential Revision: D19631971
fbshipit-source-id: f970c55ccb410702f479255eeb750e01e3f8c2ae
Summary:
Should fix https://github.com/pytorch/pytorch/issues/32346, hopefully. Now, when the _flat_weights list is updated, `None` elements are appended to it if some weights are missing; subsequent `setattr` calls for the missing weights should repair _flat_weights and make it suitable for use in the backend.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32939
Differential Revision: D19710990
Pulled By: ngimel
fbshipit-source-id: c978c7519464e94beeffa9bc33b9172854a2f298
Summary:
The default value is removed because it is explained right below.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32945
Reviewed By: soumith
Differential Revision: D19706567
Pulled By: ailzhang
fbshipit-source-id: 1b7cc87991532f69b81aaae2451d944f70dda427
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32907
All op-specific information used in this logic was available to the
parser itself, so the check can be done in that context, no codegen
needed.
No change in the warning behavior itself, mod minor formatting tweak -
passes existing tests. Saves like ~275K binary size on mac:
```
-rwxr-xr-x 1 bhosmer 1876110778 16502064 Feb 1 00:43 torch/lib/libtorch_python.dylib
-rwxr-xr-x 1 bhosmer 1876110778 16247888 Feb 1 00:44 torch/lib/libtorch_python.dylib
```
[codegen diff](https://github.com/bhosmer/scratch/compare/deprecation_warning_before...deprecation_warning_after)
More important than the size savings is the minimization of codegen. Ideally the generated artifact should express distinctive per-op properties in as minimal a form as practically possible - e.g. here instead of generating check-and-warn behavior into every binding, we generate only the data that triggers the behavior in the parser. (And actually we were generating it already.)
Test Plan: Imported from OSS
Differential Revision: D19679928
Pulled By: bhosmer
fbshipit-source-id: cf0140573118430720c6b797c762fe5be98acd86
Summary:
The `BatchNorm*` part of the issue (see gh-12013) seems to have been fixed in the master branch and these tests would make it concrete.
However I would appreciate comments on https://github.com/pytorch/pytorch/issues/12013#issuecomment-575871264 on whether the current behaviour is satisfactory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32384
Differential Revision: D19704154
Pulled By: ngimel
fbshipit-source-id: 1bbbbf1ae1215a460b22cf26e6b263e518ecf60b
Summary:
SpatialBNFakeLoweredFp16NNPI
this is the fake operator for SpatialBN that gets lowered into add/mul/div, etc.
Test Plan: test_spatialbn
Reviewed By: tracelogfb, amylittleyang
Differential Revision: D19658680
fbshipit-source-id: 2abddbcd9a2023ac75c494f20eaac2051b7139dc
Summary:
Fix for constant folding flaky tests
Looks like the constant folding test modules are sometimes exported with ONNX_ATEN op export type, which is causing the CI failures.
I'm unable to repro this issue locally, but my guess is that the op export param is being overwritten on CI build at some point.
This PR sets the op export type and hopefully fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32546
Reviewed By: hl475
Differential Revision: D19606919
Pulled By: houseroad
fbshipit-source-id: 31793d6857bbbf99b43b4a7c22a045a56ae19e44
Summary:
e.g. `tensor[torch.tensor([0, 1, 0], dtype=torch.bool)]`
Previously the mask is of type uint8. Both uint8 and bool should be supported for export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32445
Reviewed By: hl475
Differential Revision: D19610713
Pulled By: houseroad
fbshipit-source-id: 8df636e0c3cb0b82919a689242a962c79220209c
Summary:
I noticed the description of the initialization of convolutional modules is inconsistent with the actual implementation. There are two such cases:
1) `k` in the initialization of ConvTranspose modules is not dependent on the input channels but on the output channels (`kaiming_uniform_` uses the size of the second dimension of `weight` which is transposed in the first two dimensions).
2) Both the normal convolutions and the transposed ones use `k` divided by `groups`.
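A small check (my own example, not from the PR) of point 1: the weight of a ConvTranspose module has the output channels in its second dimension, which is where `kaiming_uniform_` reads its fan-in from.
```python
import torch

conv_t = torch.nn.ConvTranspose2d(in_channels=4, out_channels=8, kernel_size=3, groups=2)
print(conv_t.weight.shape)  # torch.Size([4, 4, 3, 3]) = (in_channels, out_channels // groups, kH, kW)
# fan_in used for init = weight.size(1) * kH * kW = (out_channels // groups) * 9,
# so k depends on the output channels and is divided by groups.
```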
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30079
Differential Revision: D19698511
Pulled By: ezyang
fbshipit-source-id: 1ba938fbbd97663eaf29fd1245872179d2761fff
Summary:
* New ops supported for exporting.
* Updates on support for tensor indexing and dynamic list of tensors.
* lara-hdr, spandantiwari Should we also include updates on torchvision support in this page?
cc houseroad, neginraoof Please review if I have missed anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32805
Reviewed By: hl475
Differential Revision: D19635699
Pulled By: houseroad
fbshipit-source-id: b6be4fce641f852dcbceed20b4433f4037d8024a
Summary:
The need for this is felt because sometimes we change a build script and change the `std=c++XX` flag, which does not get caught until the compilation has progressed for a while.
https://github.com/pytorch/pytorch/issues/31757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32819
Differential Revision: D19697205
Pulled By: ezyang
fbshipit-source-id: b045a1d15e24c4c6007b5d1464756051d32bf911
Summary:
This PR fixes type hints for `torch.optim.optimizer.Optimizer` object, issue also reported in https://github.com/pytorch/pytorch/issues/23731
To test things I used following optimiser implementation, that is fully covered with type hints:
```python
from typing import Optional, Callable, Union, Iterable

from torch import Tensor
from torch.optim.optimizer import Optimizer

OptClosure = Optional[Callable[[], float]]
_params_t = Union[Iterable[Tensor], Iterable[dict]]


class SGD(Optimizer):
    def __init__(self, params: _params_t, lr: float = 0.1) -> None:
        defaults = dict(lr=lr)
        super(SGD, self).__init__(params, defaults)

    def __setstate__(self, state: dict) -> None:
        super(SGD, self).__setstate__(state)

    def step(self, closure: OptClosure = None) -> Optional[float]:
        loss = None
        if closure is not None:
            loss = closure()
        for group in self.param_groups:
            for p in group['params']:
                if p.grad is None:
                    continue
                d_p = p.grad.data
                p.data.add_(-group['lr'], d_p)
        return loss
```
Without fix `mypy` reports bunch of inconsistencies in types and missing properties:
```bash
$ mypy torch_optimizer/sgd.py
torch_optimizer/sgd.py:14: error: Too many arguments for "__init__" of "Optimizer"
torch_optimizer/sgd.py:17: error: "__setstate__" undefined in superclass
torch_optimizer/sgd.py:19: error: Return type "Optional[float]" of "step" incompatible with return type "None" in supertype "Optimizer"
torch_optimizer/sgd.py:24: error: "SGD" has no attribute "param_groups"
Found 4 errors in 1 file (checked 1 source file)
```
with fix not issues:
```bash
$ mypy torch_optimizer/sgd.py
Success: no issues found in 1 source file
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32900
Differential Revision: D19697175
Pulled By: ezyang
fbshipit-source-id: d5e2b3c421f69da3df8c32b3d53b4b6d15d61a41
Summary:
Add `torch.jit.is_scripting` to the list of CondValues, or values that if they are an input to a if statement we only compile one side of the if. I'm not sure if we actually want this PR.
Pros:
- Makes it easier to add features that are not yet supported in TorchScript (like has_torch_function)
- The current idiom of writing `torch.jit.is_scripting` and factoring out the block to a function annotated with `torch.jit.ignore` is functionally equivalent and much more cumbersome
Cons:
- Makes it easier to add features that are not yet supported in TorchScript
- Perhaps it is confusing to a reader what is actually being compiled. We could potentially give it an all-caps name or otherwise change the name to make it stand out visually.
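For reference, a sketch (my own example) of the current idiom mentioned under Pros, with the eager-only code factored into an `@torch.jit.ignore`'d helper:
```python
import torch

@torch.jit.ignore
def eager_only_sum(x):
    # plain Python; kept out of TorchScript compilation
    return torch.tensor(float(x.numpy().sum()))

def total(x: torch.Tensor) -> torch.Tensor:
    if torch.jit.is_scripting():
        return x.sum()
    else:
        return eager_only_sum(x)

scripted = torch.jit.script(total)
print(scripted(torch.ones(3)))  # tensor(3.), via the compiled branch
print(total(torch.ones(3)))     # tensor(3.), via the ignored helper
```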
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32871
Differential Revision: D19670383
Pulled By: eellison
fbshipit-source-id: 5257b0bd23c66f199d59a7f2c911e948301e5588
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32847
Add support for join on List of strings in TorchScript.
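A minimal usage sketch of the new capability:
```python
from typing import List

import torch

@torch.jit.script
def join_words(words: List[str]) -> str:
    # str.join over a List[str], now supported in TorchScript.
    return ", ".join(words)

print(join_words(["a", "b", "c"]))  # a, b, c
```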
Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
Ran 1 test in 1.090s
OK
Differential Revision: D19650809
fbshipit-source-id: 387a8f0e3cc3111fd3dadd3d54c90fc8c7774cf9
Summary:
Closes https://github.com/pytorch/pytorch/issues/27368.
Previously, if a function `func` did not exist on worker A but existed on B, and the user ran `rpc.rpc_sync(A, func)`, A would crash with a segmentation fault since it could not find the function. B would eventually time out, since RPCs time out after 60s by default.
At the root this comes from an unhandled exception when trying to deserialize the `PythonUDF` to run.
This PR makes it so that we can recover from this error, and A reports back a `RemoteException` to B indicating that the function was not found. Now, A will no longer crash and B can handle the exception appropriately and with more information.
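A hedged sketch of the new behavior (the worker name and function are illustrative, and RPC is assumed to be initialized on both workers):
```python
import torch.distributed.rpc as rpc

def only_defined_on_caller():
    return 42

try:
    # Worker "A" cannot resolve only_defined_on_caller in its namespace.
    rpc.rpc_sync("A", only_defined_on_caller)
except Exception as e:
    # Instead of the callee segfaulting and the caller timing out,
    # the caller now receives the remote error.
    print("caught remote error:", e)
```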
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32726
Differential Revision: D19648825
Pulled By: rohan-varma
fbshipit-source-id: 53847f4bfb68187db41c61d69ddac13613e814b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32749
The test was flaky since the message from owner RRef confirming fork would arrive after the test checked whether the pending User RRefs map was empty - leading to an assertion error. This diff creates a utility function that should be used by any test to wait for this message to complete processing before doing any assertions related to the pending User RRefs map.
GitHub Issue: https://github.com/pytorch/pytorch/issues/30988
Test Plan: Stress tested `test_rref_context_debug_info` 200 times.
Differential Revision: D19612289
fbshipit-source-id: 57a7c19b1cf792b94c263d3efbbbb6da60c07d07
Summary:
Power and x86 are giving slightly different results when scaling images up using `torch.nn.functional.interpolate` and when using OpenCV's `resize`. This is causing `test_upsampling_not_recompute_scale_factor` to fail on Power, but not x86. This changes the expected value to what OpenCV on Power produces if the test case is running on Power as well.
See https://github.com/pytorch/pytorch/issues/31915
ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32786
Differential Revision: D19672053
Pulled By: ezyang
fbshipit-source-id: 3497f852bdc6d782646773792f9107c857c7b806
Summary:
If a namedtuple with immutable constant inputs was also the input / output of a function that expected a namedtuple, it would fail. Fixed by using the namedtuple constructor on serialization (no one has run into this bug yet).
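A minimal sketch of the pattern this fixes (illustrative names):
```python
from typing import NamedTuple

import torch

class Pair(NamedTuple):
    a: int
    b: int

@torch.jit.script
def passthrough(p: Pair) -> Pair:
    # A function whose input and output are a namedtuple; serialization now
    # rebuilds the value with the namedtuple constructor instead of failing.
    return p
```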
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32873
Differential Revision: D19668807
Pulled By: eellison
fbshipit-source-id: bae33506e53b6a979b4e65a3e7c989b1408c98f4
Summary:
This PR solves Issue https://github.com/pytorch/pytorch/issues/32750.
- Changes function prod_kernel_impl to use the `out_t` argument instead of `scalar_t` (which caused the garbage output for FP16 input and FP32 output tensor types); see the repro sketch after this list.
- Adds test case for `torch.prod` (for CUDA): tests both `torch.prod` and `torch.tensor.prod`. Checks all the combinations for dtypes: `torch.float16` and `torch.float32`.
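A hedged repro sketch of the first bullet (requires CUDA; shapes are illustrative):
```python
import torch

x = torch.rand(4, 8, device='cuda', dtype=torch.float16)
out = torch.empty(8, device='cuda', dtype=torch.float32)
torch.prod(x, 0, out=out)   # FP16 input reduced into an FP32 out tensor
print(out)
print(x.float().prod(0))    # should now match; the out= result used to be garbage
```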
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32831
Differential Revision: D19664666
Pulled By: ngimel
fbshipit-source-id: c275363355c832899f10325043535949cd12b2f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32738
This is to simplify the codegen layer, with the goal of making it simple enough to just check in.
Test Plan: Imported from OSS
Differential Revision: D19610927
Pulled By: gchanan
fbshipit-source-id: 760734f579b1f655775e6d270918c361985f3743
Summary:
To suppress a clang-tidy warning:
torch/csrc/jit/script/builtin_functions.cpp#L89
[performance-for-range-copy] warning: loop variable is copied but only
used as const reference; consider making it a const reference
Also make the const qualifier of scalar explicit.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32852
Differential Revision: D19663277
Pulled By: ezyang
fbshipit-source-id: f4ec5688d3cbea9a5f40db6063b7d111b0bf0cce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32849
We learned that Android NDK's gcc + gnustl combination might produce a
use-after-free for thread_local variables with non-trivial destructors.
This PR removes such a thread_local use case from error_report.cpp for mobile build,
which is the only case included in mobile lite-JIT build.
ghstack-source-id: 97491327
Test Plan: - CI
Reviewed By: dreiss
Differential Revision: D19652702
fbshipit-source-id: ee8d316ad5c6e6c8a8006eb25f3bba1618dd7e6d
Summary:
I didn't see any use case where the functor of `gpu_kernel_with_index` needs an argument other than the index. Has a merge conflict with https://github.com/pytorch/pytorch/pull/32755.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32777
Differential Revision: D19646381
Pulled By: ngimel
fbshipit-source-id: 81d2be74170457e39943274e3689845e83758bfa
Summary:
The Python document <https://www.python.org/dev/peps/pep-0263/> gives
all examples using lowercase letters. Although it doesn't say
straightly, the following paragraph seems to indicate that uppercase
letters aren't legitimate:
> If a source file uses both the UTF-8 BOM mark signature and a magic encoding comment, the only allowed encoding for the comment is 'utf-8'. Any other encoding will cause an error.
My Emacs also complains about the uppercase letters every time I save
the file.
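For reference, the lowercase form used in the PEP's examples is:
```python
# -*- coding: utf-8 -*-
```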
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32850
Differential Revision: D19663281
Pulled By: ezyang
fbshipit-source-id: 48127d3c2fd6e22dd732a2766913735136ec2ebc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32771
It's a patch to #32621, make the api private.
Test Plan: Imported from OSS
Differential Revision: D19657307
Pulled By: iseeyuan
fbshipit-source-id: e604a0cbed6a1e61413daaafc65bea92b90f1f5d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32753
Functions to be bound as an ATen operator cannot have a Python dependency.
This refactors the code to remove the Python dependency.
ghstack-source-id: 97485800
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_functions_not_supported
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_functions_not_supported
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```
Differential Revision: D5741675
fbshipit-source-id: 31ee60955be8d815d0773f3699e3ff2f1f9d8849
Summary:
Make batch norm with empty inputs return zero parameter gradients. Batch norm, group norm and convolutions now all return zero grads for parameters, so make tests check that. Fixes some bullet points in https://github.com/pytorch/pytorch/issues/12013 (interpolate is not fixed by this PR; it is being fixed in other PRs).
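A minimal sketch of the behavior described above (assumes empty-batch inputs are accepted by batch norm, per this and the related PRs):
```python
import torch
import torch.nn as nn

bn = nn.BatchNorm2d(3)
x = torch.zeros(0, 3, 4, 4, requires_grad=True)  # empty batch
bn(x).sum().backward()
print(bn.weight.grad)  # zeros of shape [3] rather than undefined gradients
```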
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32820
Differential Revision: D19651470
Pulled By: ngimel
fbshipit-source-id: 96fdd085f9b0e98e91217dd2ac1f30f9c482b8be
Summary:
Remove `needs_dynamic_casting` from TensorIterator and move it to `Loops.cuh`.
The original design of `needs_dynamic_casting` is fundamentally flawed: it injects logic into TensorIterator and uses a bunch of boolean values to test whether dynamic casting is needed. This makes it very fragile, as TensorIterator is complicated and it is easy to introduce unnecessary dynamic casts. It also makes `gpu_kernel` very inflexible; different cases need to manipulate TensorIterator to make it work.
For example, currently
```python
torch.zeros(10, device='cuda').mul_(0.9)
```
needs dynamic cast, but it shouldn't.
Testing whether dynamic casting is needed could be easy: just compare the dtypes of the lambda with the dtypes of operands. If they don't match, then dynamically cast, otherwise don't cast.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32755
Differential Revision: D19644092
Pulled By: ngimel
fbshipit-source-id: 130bb8bd78d20c2ed1bdfc9d9fb451eb0f0c7e55
Summary:
Should fix https://github.com/pytorch/pytorch/issues/29744 by falling back to native batch norm implementation, if cudnn cannot execute the provided shape.
Shape numbers were verified for cudnn 7.6.5.32 with tensor shapes:
```python
# for spatial bn
x = torch.Size([880801, 256, 5])
x = torch.Size([65535, 256, 5])
x = torch.Size([880801, 64, 4, 4])
x = torch.Size([65535, 64, 4, 4])
# for per-act bn
x = torch.Size([131070, 2048])
x = torch.Size([262136, 2048])
```
for `training()` and `eval()` mode using `torch.float32` and `torch.float16`.
I've increased the shape used in our current smoke test, but I can also add all use cases of the support matrix, if wanted.
CC ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32763
Differential Revision: D19644328
Pulled By: ngimel
fbshipit-source-id: c2151bf9fe6bac79b8cbc69cff517a4b0b3867aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32843
fix the ci by skipping aten::join
Test Plan: ci
Reviewed By: hl475
Differential Revision: D19650584
fbshipit-source-id: 4446eef568ded334217ff9205a795daffebe41a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32734
VariableTensorId is the only key with this treatment today,
but BackendSelect and CompoundOp are coming soon.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628091
Pulled By: ezyang
fbshipit-source-id: 250753f90528fa282af7a18d8d2f7736382754bd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32729
When working on the vmap prototype I noticed that this was helpful
as it lets me easily initialize a no-op guard, if I need to do it
at constructor time (which I usually do, because the guards don't
have move constructors).
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628092
Pulled By: ezyang
fbshipit-source-id: d6259a3f70d287cdac2e4a5f3984e2880f19bdc2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32728
It doesn't have much to do with tensors anymore.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19628093
Pulled By: ezyang
fbshipit-source-id: 4d57111cdf44ba347bec8a32bb5b4b47a83c1eaf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32807
After this commit, RRefContext no longer depends on pybind.
Test Plan: Imported from OSS
Differential Revision: D19636316
Pulled By: mrshenli
fbshipit-source-id: 88faa101c32e9019e979ae8e5da6706e49842726
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.
One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
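A minimal sketch of the newly supported ordering (weight norm applied before the move to CUDA):
```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

rnn = nn.LSTM(input_size=8, hidden_size=16)
rnn = weight_norm(rnn, name='weight_hh_l0')  # previously this ordering errored
x = torch.randn(5, 3, 8)
if torch.cuda.is_available():
    rnn, x = rnn.cuda(), x.cuda()
out, _ = rnn(x)
```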
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563
Differential Revision: D19602725
Pulled By: mruberry
fbshipit-source-id: d8f9441d17815c8c9ba15b256d4be36f784a3cf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32574
Previously, we ignored alias annotations when deriving argument mutability
and instead recognized particular signature patterns (in-place, out variant)
and assigned mutability accordingly. Op signatures that didn't fit these
patterns would error (e.g. see #30526, which this fixes).
No change in the generated binding code.
Code changes:
1. in function_wrapper.py, fix the mutability derivation logic used when creating an argument's c++ type property. Note that we temporarily need to trap a special case and apply the old logic, see code comment for details.
2. in gen_jit_dispatch.py, update logic that assumed only one mutable Tensor argument per declaration. Happily this mostly was accomplished by bypassing some now-redundant signature regeneration machinery. Another special case here requires that we keep the old machinery around temporarily.
Test Plan: Imported from OSS
Differential Revision: D19564875
Pulled By: bhosmer
fbshipit-source-id: 5637a9672923676d408c9586f3420bcc0028471a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29986
Previously in addition to generating a python binding for each op,
we would generate an almost-trivial helper for each overload.
This PR eliminates the helpers, simplifying codegen logic a bit and
reducing the source-level indirection by a step.
Perf should be unchanged.
codegen diff: 1f2f07fb60
Note: in the interests of keeping the diff contained, there's only
some light cleanup here beyond what's necessary for the codegen changes.
Plan is to do some more substantial refactoring in followup PRs that
leave generated code unchanged.
Test Plan: Imported from OSS
Differential Revision: D18567980
Pulled By: bhosmer
fbshipit-source-id: eb9a81babb4489abd470842757af45580d4c9906
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32785
Add PythonRpcHandler::handleExceptionWithGIL() so that in PyRRef::localValue(),
we don't need to release the GIL and re-acquire the following line.
ghstack-source-id: 97418465
Test Plan: existing test coverage
Differential Revision: D19626195
fbshipit-source-id: db694d04b078811f819626789e1e86f1b35adb5b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32757
This PR updates the main quantize_dynamic API to use QNNPACK backend for mobile
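A hedged usage sketch of the API being rerouted (the model is illustrative; on mobile builds the quantized Linear now goes through QNNPACK):
```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(16, 8), torch.nn.ReLU())
qmodel = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
print(qmodel)
```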
Test Plan:
python test/test_quantization.py PostTrainingDynamicQuantTest.test_quantized_rnn
Imported from OSS
Differential Revision: D19632220
fbshipit-source-id: b4c51485c281d088524101b97c84dd806438b597
Summary:
When using scripting, there was an error when attempting to access a
specific element from within the size tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32652
Reviewed By: hl475
Differential Revision: D19610726
Pulled By: houseroad
fbshipit-source-id: bca49927bbe71dbe7e7d7edf301908fe79e089b5
Summary: Add support for join on List of strings in TorchScript.
Test Plan:
(pytorch) smummadi@smummadi-mbp pytorch % python test/test_jit_string.py
Fail to import hypothesis in common_utils, tests are not derandomized
.
----------------------------------------------------------------------
Ran 1 test in 1.090s
OK
Differential Revision: D19611800
fbshipit-source-id: cef66356abc14dfd100a806d25dd1a8bc9af0a11
Summary:
When running ctr_mbl_feed, we encountered a hang issue related to the zeus-based rendezvous handshake. It was mitigated by this diff https://our.intern.facebook.com/intern/diff/D19167151/.
This diff resolves the race condition by adding a reference to the rendezvous handler.
Test Plan: x7340282797
Reviewed By: yifuwang
Differential Revision: D19627293
fbshipit-source-id: 560af289db8ef6cf8d6f101f95ec27d5a361fd04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32745
Some parameters (like `bias` in conv) are optional. To achieve this
previously, you had to add `bias` as a constant, which would invoke some
pretty weird behavior in the frontend, summarized as:
```
if bias is not None:
    add it as a parameter normally
else: # bias is None
    add it as a constant with the value None
```
There are several things bad about this:
1. Bias is not a constant. Marking it `__constants__` is confusing.
2. It basically relies on an implementation detail (the frontend
processes parameters before constants) to work.
Okay, whatever. I don't even know why we did this originally, but
getting rid of it doesn't break anything, so I assume improved NoneType
refinement has made this a non-issue.
Note on perf: this will make no difference; if bias was `None` it's still
folded out today, if bias is a Tensor it would be added as a parameter
both before and after this change
Test Plan: Imported from OSS
Differential Revision: D19628634
Pulled By: suo
fbshipit-source-id: d9128a09c5d096b938fcf567b8c23b09ac9ab37f
Summary:
resubmitting https://github.com/pytorch/pytorch/issues/32612 after a merge gone wrong. Enables convolution with an empty batch or number of channels for all flavors of convolution (grouped convolution, convTranspose). Would make https://github.com/pytorch/pytorch/issues/31658 unnecessary. Also returns zero gradients for the parameters, that's necessary for correct DDP operation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32709
Differential Revision: D19627968
Pulled By: ngimel
fbshipit-source-id: 7359759bd05ff0df0eb658cac55651c607f1b59f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32683
Pull Request resolved: https://github.com/pytorch/glow/pull/4079
Similar to D17768404, we changed the EmbeddingBag operator for 8-bit fused version to add the option to include the last offset and parallelize the op.
ghstack-source-id: 97404645
Test Plan:
To generate the AVX2 code (`embedding_lookup_fused_8bit_rowwise_idx_avx2.cc`):
```
python hp_emblookup_codegen.py --fused --use-offsets
```
To test the correctness:
```
buck test //caffe2/torch/fb/sparsenn:test -- test_embedding_bag_byte_rowwise_offsets --print-passing-details
```
Reviewed By: yinghai
Differential Revision: D19592761
fbshipit-source-id: f009d675ea3f2228f62e9f86b7ccb94700a0dfe0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32704
-Werror is too aggressive a check for the test cpp extensions because it fails even on deprecation warnings that are included from the core codebase.
Fixes #32136
Test Plan: Imported from OSS
Differential Revision: D19620190
Pulled By: pbelevich
fbshipit-source-id: 0e91566eb5de853559bb59e68a02b0bb15e7341b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32116
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579875
Pulled By: ezyang
fbshipit-source-id: 00393c9dc101967c79231bfae36b23b7b80135fb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32114
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579876
Pulled By: ezyang
fbshipit-source-id: d09a231ba891403a06eae0c2203e0ad7dd6d3a12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32112
It turns out we already removed these from the CPU version; copy
the changes over.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579874
Pulled By: ezyang
fbshipit-source-id: e40efbf94e128fd81421b227b76dd9c9c0256d96
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32727
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19621858
Pulled By: ezyang
fbshipit-source-id: 5112c849252478d8249de4f8c8c5a2d6caf60672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32557
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579853
Pulled By: ezyang
fbshipit-source-id: 45f83a7a5ead0344e4c13526abb5fafdedaed4a4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32533
Applies renames based on comments in #32439. I also updated some
other documentation and variable names while I was at it.
Fixes #32435.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19579854
Pulled By: ezyang
fbshipit-source-id: 85021a92a2a84501f49ee5c16318f81f5df64f8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32043
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19621910
Pulled By: ezyang
fbshipit-source-id: dce00a56ff679548fd9f467661c3c54c71a3dd4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32748
This is a follow-up to PR #30630. We need to hold the GIL when calling jit::toPyObject(), and some bound functions need to be tagged with a GIL release when the underlying C++ code acquires the GIL itself. So:
1. pyRef::to_here() and pyRef::local_value() now acquire the GIL
2. pyRef::pickle() and pyRef::unpickle() are tagged to release the GIL
3. request_callback_impl also acquires the GIL where needed
4. the type parser uses the cached jitCompilationUnit_, which is also cleaned up in the cleanUp() function
ghstack-source-id: 97373011
Test Plan: unit test
Differential Revision: D19612337
fbshipit-source-id: 4d09f9b52ba626545ae7d31fea6b671301ed3890
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32567
As a first change to support ProGuard: even if these methods may never be called from Java, they are registered at the JNI level, and that registration will fail if the methods are stripped.
This adds DoNotStrip to all native methods that are registered in OSS.
After integrating consumerProguardFiles in fbjni, which prevents ProGuard from stripping DoNotStrip methods, this will fix errors with ProGuard enabled.
Test Plan: Imported from OSS
Differential Revision: D19624684
Pulled By: IvanKobzarev
fbshipit-source-id: cd7d9153e9f8faf31c99583cede4adbf06bab507
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)
Changes:
[x] Fixed performance issue raised in https://github.com/pytorch/pytorch/issues/30704 so that non-complex numbers do not call `conj()` and `real()`.
[x] Fixed tensor_to_numpy() conversion likely broken by a `checkBackend()` in https://github.com/pytorch/pytorch/issues/27064.
[x] Fixed some ReduceOps and TensorCompare Ops that recently added a `checkBackend()`.
- `checkBackend()` is replaced with a device type check and a layout check.
- This ensures the ComplexCPU Type ID is supported.
[x] Added AVX support for complex `exp()`, as requested in https://github.com/pytorch/pytorch/issues/755
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30871
Differential Revision: D19200726
Pulled By: ezyang
fbshipit-source-id: d7e1be0b0a89c5d6e5f4a68ce5fcd2adc5b88277
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32325
The purpose of this PR is to enable PyTorch dispatching on `at::Generator*` parameters and demonstrate how it can be used in cpp extensions to implement custom RNG.
1. `CustomRNGKeyId` value added to DispatchKey enum and `DispatchKeySet key_set_` added to `at::Generator`
2. The overloaded `operator()(at::Generator* gen)` added to MultiDispatchKeySet.
3. The existing CPUGenerator and CUDAGenerator class are supplied with CPUTensorId and CUDATensorId dispatch keys
4. The implementation of CPU's `cauchy_kernel`(as an example, because it's already moved to ATen) was templatized and moved to `ATen/native/cpu/DistributionTemplates.h` to make it available for cpp extensions
5. Minor CMake changes to make native/cpu tensors available for cpp extensions
6. RegisterCustomRNG test that demonstrates how CustomCPUGenerator class can be implemented and how custom_rng_cauchy_ native function can be registered to handle Tensor::cauchy_ calls.
Test Plan: Imported from OSS
Differential Revision: D19604558
Pulled By: pbelevich
fbshipit-source-id: 2619f14076cee5742094a0be832d8530bba72728
Summary:
This code is implemented twice, in different places by different people; we should merge the implementations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32730
Differential Revision: D19622023
Pulled By: ezyang
fbshipit-source-id: a9cbda31428b335bf28a7e4050f51f58e787b94f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32659
Applies linter to RPC test files so that we can use linter shortcuts
without getting unnecessary changes to the whole file.
ghstack-source-id: 97361237
Test Plan: No actual changes.
Differential Revision: D19584742
fbshipit-source-id: a11ce74ee0e2817e6f774fff7c39bcab06e99307
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32657
The goal here is to simplify the codegen enough that we can just handwrite the bindings, so anything in here is "bad".
Test Plan: Imported from OSS
Differential Revision: D19584521
Pulled By: gchanan
fbshipit-source-id: 93005b178228c52a1517e911adde2e2fe46d66a5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32722
Checked using [this](https://godbolt.org/z/uAaE9R) that it gives the correct assembly.
Test Plan: Imported from OSS
Differential Revision: D19610012
Pulled By: albanD
fbshipit-source-id: 4d1cb812951ae03d412a0fba3c80730f0d286e1f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32326
Now that we have type-level granularity we can improve `mayContainAlias` queries. Each new value is initialized as containing the wildcard set of each contained mutable type. Whenever a value is added to a container it is set to the wildcard set. Now, to check if any two values contain overlapping values, we can just check whether the `containedMemoryLocations` of the two sets overlap.
Test Plan: Imported from OSS
Differential Revision: D19563262
Pulled By: eellison
fbshipit-source-id: c6d7489749c14b2054a6d50ef75baca699ada471
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32251
Previously wildcard sets were associated by TypeKind, meaning all Lists were in one alias set, all Classes were in one alias set, etc. We can improve analysis by bucketing wildcard sets by TypePtr instead. Any two mutable types which can unify should be in the same wildcard set bucket.
This also allows us to do much simpler `mayContainAlias` analysis, and also improves `analyzeConservative` analysis because now we can recurse through all contained memory locations and mark writes, instead of recursing only one level deep into contained elements.
Test Plan: Imported from OSS
Differential Revision: D19563263
Pulled By: eellison
fbshipit-source-id: 371a37d1a8596abc6c53f41c09840b6c140ea362
Summary: ATT. Since the infra is there.
Test Plan: run it
Reviewed By: amylittleyang
Differential Revision: D19605250
fbshipit-source-id: c68be4d7963afa4fa5f8f60c90f1913605eae516
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32501
This diff will address https://github.com/pytorch/pytorch/issues/24699
We require the input `lambda` to be >= 0, to be the same as https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.random.exponential.html#numpy-random-exponential. This check did not exist in the previous implementation.
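A minimal sketch of the new check (values are illustrative):
```python
import torch

t = torch.empty(4)
t.exponential_(lambd=0.5)     # lambd >= 0 is accepted
# t.exponential_(lambd=-1.0)  # with this change, a negative lambd is rejected
print(t)
```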
Benchmark I am using PT operator microbenchmark
```
================================================================================
Before the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 21311.746
================================================================================
After the change, Program Output:
================================================================================
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: exponential_
# Mode: Eager
# Name: exponential__M512_N512_cpu
# Input: M: 512, N: 512, device: cpu
Forward Execution Time (us) : 20919.914
================================================================================
```
Test Plan: Sandcastle and Github tests
Reviewed By: BIT-silence
Differential Revision: D19518700
fbshipit-source-id: 0e79cb6a999c1278eb08b0d94cf61b119c85a36c
Summary:
Included the ONNX model checker code in the ONNX export;
this will force the ONNX checker to run for all models that get exported.
This should help with validating exported models.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32298
Reviewed By: hl475
Differential Revision: D19538251
Pulled By: houseroad
fbshipit-source-id: eb20b124fe59200048f862ddaf20f6c59a0174d5
Summary:
This method is pretty hot. In an internal workload, this single
call to at() accounted for ~2% of overall cycles.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31627
Reviewed By: yinghai
Differential Revision: D19607779
Pulled By: qizzzh
fbshipit-source-id: 1684919049a35fdad686d8396c7dce7243ab92d4
Summary:
Stacked PRs
* #32244 - Make zip serialization the default
* **#32241 - Split serialization tests to their own file**
This makes them all easier to run as a batch. This PR is just a code move / fixing up imports. There are still some serialization tests in `test_torch.py` as part of `TestDeviceType`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32241
Pulled By: driazati
Differential Revision: D19415826
fbshipit-source-id: a3f6cfe1626ff2f9b9631c409bf525bd32e4639b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32675
It's good to have one location to do the mapping.
Test Plan: Everything still runs.
Reviewed By: amylittleyang
Differential Revision: D19590354
fbshipit-source-id: d8c0d14e4bdf27da3e13bd4d161cd135d6e3822b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32656
Fixes these flaky tests.
Test Plan: Run the test 500 times and verify that it succeeds every time.
Differential Revision: D19584453
fbshipit-source-id: 07cbc4914211f274182ac0fa74bb5ef6d43392d1
Summary:
Both `test_wait_all_workers` and `test_wait_all_workers_and_shutdown` test the same pattern of initializing RPC, calling `_wait_all_workers`, and then `rpc.shutdown(graceful=False)`.
`test_wait_all_workers` seems to be more thorough since it also tests one worker driving and the others waiting on it.
We shouldn't have duplicate tests, so this removes `test_wait_all_workers_and_shutdown`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32588
Differential Revision: D19566294
Pulled By: rohan-varma
fbshipit-source-id: b69519d169b3964649d47ad75532bda5de538241
Summary:
Done by just editing `.circleci/cimodel/data/dimensions.py` to include `3.8` and then regenerated using `.circleci/regenerate.sh`
cc kostmo, mingbowan, ezyang, soumith
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31948
Differential Revision: D19602069
Pulled By: seemethere
fbshipit-source-id: ac57fde9d0c491c7d948a3f5944c3cb324d403c0
Summary:
This handles a corner case where a user schedules a second bailout after the first one and the first one doesn't fire.
Alternatively, we could go back to the implementation that uses a hash set to remember the indices of bailouts that need to fire.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32672
Differential Revision: D19596872
Pulled By: Krovatkin
fbshipit-source-id: 41dcc374cd2501ac20a9892fb31a9c56d6640258
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32621
Export the "_save_for_mobile" method to Python so that the bytecode format for lite interpreter can be added or updated to the original script model.
It's the first step of python binding for lite interpreter, as discussed in this [internal post](https://fb.workplace.com/groups/1144215345733672/permalink/1478900738931796/) and offline.
Next step is to export the load_for_mobile and run method of mobile module, so that users could verify the mobile model from Python.
Test: use the following python script to display the bytecode part of the updated model file.
```
#!/usr/bin/env python3
import sys
import pickle
import pprint
import zipfile
class FakeObject(object):
    def __init__(self, module, name, args):
        self.module = module
        self.name = name
        self.args = args
        self.state = None

    def __repr__(self):
        state_str = "" if self.state is None else f"(state={self.state!r})"
        return f"{self.module}.{self.name}{self.args!r}{state_str}"

    def __setstate__(self, state):
        self.state = state


class FakeClass(object):
    def __init__(self, module, name):
        self.module = module
        self.name = name
        self.__new__ = self.fake_new

    def __repr__(self):
        return f"{self.module}.{self.name}"

    def __call__(self, *args):
        return FakeObject(self.module, self.name, args)

    def fake_new(self, *args):
        return FakeObject(self.module, self.name, args)


class DumpUnpickler(pickle._Unpickler):
    def find_class(self, module, name):
        return FakeClass(module, name)

    def persistent_load(self, pid):
        return FakeObject("pers", "obj", (pid,))


def main(argv):
    zfile = zipfile.ZipFile(argv[1])
    names = [i for i in zfile.namelist() if "bytecode.pkl" in i]
    if not names:
        print("bytecode.pkl not found.")
        return
    with zfile.open(names[0], "r") as handle:
        value = DumpUnpickler(handle).load()
        pprint.pprint(value)


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```
Test Plan: Imported from OSS
Differential Revision: D19596359
Pulled By: iseeyuan
fbshipit-source-id: 19a4a771320f95217f5b0f031c2c04db7b4079a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32642
Previously, if we defined `__setstate__` but not `__getstate__`, we
would segfault. This PR turns that into a comprehensible error message
(and improves another error message as well).
Fixes https://github.com/pytorch/pytorch/issues/25886
Test Plan: Imported from OSS
Differential Revision: D19596463
Pulled By: suo
fbshipit-source-id: dbe76bc36bc747d65fb0223184c009e0e9ba072c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32653
This test was flaky since the watchdog thread could abort the
communicator instead of the thread calling `wait()`. As a result, we could
actually see `NCCL error` instead of `Operation timed out` on the user end.
ghstack-source-id: 97250714
Test Plan: waitforbuildbot
Differential Revision: D19583003
fbshipit-source-id: 5c07326d1a16f214dcdbabed97ca613e0a5b42b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32635
With the source of truth for the current RPC agent moved to the C++ world, there is no point in passing the current RPC agent from the Python world to the C++ world.
ghstack-source-id: 97293316
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```
Differential Revision: D5703519
fbshipit-source-id: ef7c28bdb1efd293eb6cafe0b0fca7ca80fa08a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32633
There were 2 sources of truth for the current RPC agent.
- One is in the Python world, `torch.distributed.rpc.api._agent`.
- The other is in the C++ world, `RpcAgent::defaultRpcAgent_`
Setting the Python `_agent` to `None` does not necessarily reset the C++ `defaultRpcAgent_` to `nullptr`.
i.e.
```
torch.distributed.rpc.api._agent = None
```
does not translate to
```
RpcAgent::defaultRpcAgent_ = nullptr
```
This PR is to remove this ambiguity, and use the C++ pointer as source of truth.
The solution is to leverage a pybind11 behavior that it implicitly casts C++ `shared_ptr<RpcAgent>(nullptr)` to Python `None`.
ghstack-source-id: 97293315
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_duplicate_name
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_process_group_debug_info
```
```
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_remote_module
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_embedding
buck test mode/dev-nosan //caffe2/torch/fb/distributed/modules/tests:test_sharded_pairwise_attention_pooling
buck test mode/dev-nosan //caffe2/torch/fb/distributed/pytorch/tests:test_rpc
```
Differential Revision: D5733066
fbshipit-source-id: b3e6032ee975f19ca556497edbbf40b517b25be8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32624
We need this PR to resolve the issue mentioned in https://github.com/pytorch/pytorch/issues/31325#issuecomment-574918917.
The solution is that for each `_wait_all_workers()` call, a sequence ID is added to identify different calls.
ghstack-source-id: 97277591
Test Plan:
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_wait_all_workers
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_wait_all_workers
```
Differential Revision: D5739520
fbshipit-source-id: a64131e09c365179624700514422f5375afe803f
Summary:
This PR updates how RNNs handle their "flat weights." In particular, it allows for only some flat weights to be "materialized" when apply is called, and it updates the flattening behavior to only apply if all flat weights are (1) materialized, (2) share a dtype and (3) are acceptable to cuDNN.
One test is modified and another created to test these changes. One practical effect of this change is that weight norm can be successfully applied to a module BEFORE that module is moved to an accelerator. Previously doing so would throw an error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32563
Differential Revision: D19562258
Pulled By: mruberry
fbshipit-source-id: 4fef006e32cdfd8e3e3d519fc2ab5fc203dd7b36
Summary:
This PR adds support for 0-dim batch size input for `torch.nn.functional.interpolate` for various modes of interpolation.
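A hedged sketch of the newly supported input (mode and shapes are illustrative):
```python
import torch
import torch.nn.functional as F

x = torch.zeros(0, 3, 8, 8)  # 0-dim batch
y = F.interpolate(x, scale_factor=2, mode='nearest')
print(y.shape)  # torch.Size([0, 3, 16, 16])
```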
Fixes part of gh-12013
CC: rgommers ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32400
Differential Revision: D19557090
Pulled By: ezyang
fbshipit-source-id: 6822f148bb47bfbcacb5e03798bf2744f24a2a32
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32476
This makes the handling of FORWARD_AUTOGRAD_REQ in request_callback
nonblocking. Processing this message requires unwrapping the message with
autograd information, processing the original message, and sending back the
message with autograd information wrapped. This makes the processing the
original message nonblocking by grabbing a future to it and marking the parent
future as completed when this one completes.
ghstack-source-id: 97221251
Test Plan: `test_rpc_spawn.py` and `test_dist_autograd_spawn.py` both pass.
Differential Revision: D19509501
fbshipit-source-id: 84ad2f9c5305ed11ed9bb0144b1aaf5f8698cd2b
Summary:
Changes the linspace functions to be more consistent as requested in https://github.com/pytorch/pytorch/issues/31991. The code has also been updated to avoid an early rounding error; the line `scalar_t step = (scalar_end - scalar_start) / static_cast<scalar_t>(steps-1)` can result in `step = 0` for integer scalars, and this gives unintended results. I examined the new output using
```
import torch
types = [torch.uint8, torch.int8, torch.short, torch.int, torch.long, torch.half, torch.float, torch.double]
print('Testing linspace:')
for type in types:
    print(type, torch.linspace(-2, 2, 10, dtype=type))
```
which returns
```
Testing linspace:
torch.uint8 tensor([254, 254, 254, 255, 255, 0, 0, 1, 1, 2], dtype=torch.uint8)
torch.int8 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int8)
torch.int16 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int16)
torch.int32 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2], dtype=torch.int32)
torch.int64 tensor([-2, -2, -2, -1, -1, 0, 0, 1, 1, 2])
torch.float16 tensor([-2.0000, -1.5557, -1.1113, -0.6670, -0.2227, 0.2227, 0.6660, 1.1113,
1.5547, 2.0000], dtype=torch.float16)
torch.float32 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222, 0.2222, 0.6667, 1.1111,
1.5556, 2.0000])
torch.float64 tensor([-2.0000, -1.5556, -1.1111, -0.6667, -0.2222, 0.2222, 0.6667, 1.1111,
1.5556, 2.0000], dtype=torch.float64)
```
which is the expected output: `uint8` overflows as it should, and the result of casting from a floating point to an integer is correct.
This PR does not change the logspace function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32218
Differential Revision: D19544224
Pulled By: ngimel
fbshipit-source-id: 2bbf2b8552900eaef2dcc41b6464fc39bec22e0b
Summary:
This test case had been using the tensor
```
1 2 3 4
5 6 7 8
9 10 11 12
13 14 15 16
```
which is not an invertible tensor and causes the test case to fail, even if magma gets initialized just fine. This change uses a tensor that is invertible, and the inverse doesn't include any elements that are close to zero to avoid floating point rounding errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32547
Differential Revision: D19572316
Pulled By: ngimel
fbshipit-source-id: 1baf3f8601b2ba69fdd6678d7a3d86772d01edbe
Summary:
The constructor of `nn.Parameter` has default values for `data` and `requires_grad`, but in the type stub there are no default values.
Resolve https://github.com/pytorch/pytorch/issues/32481
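A minimal sketch of what the fixed stub allows to type-check:
```python
import torch

p1 = torch.nn.Parameter()                                     # defaults for data and requires_grad
p2 = torch.nn.Parameter(torch.zeros(3), requires_grad=False)
```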
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32617
Differential Revision: D19571397
Pulled By: ngimel
fbshipit-source-id: fd14298aa472b7575221229cecf5a56f8c84f531
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32451
This PR adds a few new parameters to ATen codegen script:
```
1. op_registration_whitelist
Can be used to filter op registrations for selective build;
2. type_whitelist
Can be used to filter types (CPUType, CUDAType, ...) for selective build;
3. per_op_registration
When set it will group function registrations by op name and write to separate files;
```
1 & 2 are introduced for mobile custom build without relying on static dispatch;
3 is introduced to solve custom build with multi-library / multi-model (needed by FB
internal build - see more details: https://fb.quip.com/ZVh1AgOKW8Vv).
These flags should work independently with each other (and independent to USE_STATIC_DISPATCH).
Not setting them should have no effect compared to master.
ghstack-source-id: 97214788
Test Plan: - tested all 3 params with FB internal build changes.
Differential Revision: D19427919
fbshipit-source-id: a381fe5f768fe2e9196563787f08eb9f18316e83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32275
Currently TypeDerived (e.g. `CPUType::`) methods are declared and
defined in anonymous namespace as they are only called from c10
dispatcher - except for STATIC_DISPATCH mode, where they can be directly
called from Functions.h.
We plan to generate c10 op registration into separate files for internal
xplat/BUCK build, thus we need declare these methods in non-anonymous
namespace.
I feel it's easier to simply change it unconditionally, unless there are
some side effect I'm not aware of - `TypeDefault::` methods are in
non-anonymous namespace anyway.
ghstack-source-id: 97214789
Test Plan: - CI
Differential Revision: D19426692
Pulled By: ljk53
fbshipit-source-id: 44aebba15f5e88ef4acfb623844f61d735016959
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32466
It's follow-up work to https://github.com/pytorch/pytorch/pull/32197.
In https://github.com/pytorch/pytorch/pull/32197, `rpc.rpc_sync(..)` and `rpc.rpc_async(..)` support taking a TorchScript-annotated Python function as the user function for RPC.
This PR extends along this direction by making `rpc.remote(..)` support taking a TorchScript-annotated Python function as well.
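A hedged usage sketch (the worker name is illustrative and RPC is assumed to be initialized):
```python
import torch
import torch.distributed.rpc as rpc

@torch.jit.script
def script_add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    return x + y

# rpc.remote now accepts a TorchScript-annotated function, like rpc_sync/rpc_async.
rref = rpc.remote("worker1", script_add, args=(torch.ones(2), torch.ones(2)))
print(rref.to_here())  # tensor([2., 2.])
```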
ghstack-source-id: 97211168
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork -- test_script_function_exception
buck build mode/dev-nosan //caffe2/test/distributed/rpc:rpc_fork
buck-out/gen/caffe2/test/distributed/rpc/rpc_fork\#binary.par -r test_script_function_exception
```
```
buck test mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork -- test_backward_simple_script_call
buck build mode/dev-nosan //caffe2/test/distributed/rpc:dist_autograd_fork
buck-out/gen/caffe2/test/distributed/rpc/dist_autograd_fork\#binary.par -r test_backward_simple_script_call
```
Differential Revision: D19440633
fbshipit-source-id: d37f6dcdc0b80d35ac7bcba46ad6f9b831c3779b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32479
Run dynamic quantization on mobile (similar to FBGEMM). Currently only implemented on linear operator
Test Plan:
python test/test_quantized.py TestDynamicQuantizedLinear.test_qlinear
Imported from OSS
Differential Revision: D19542980
fbshipit-source-id: c9f6e5e8ded4d62ae0f2ed99e478c8307dde22ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32571
The watchdog thread would erase an element and call `it--` (implicitly
relying on `it++` in the for loop to position correctly). However, `it--`
would cause undefined behavior if the iterator is pointing to begin(). As a
result, I've modified the logic to update the iterator appropriately.
I've also enhanced the watchdog thread to catch and log exceptions.
ghstack-source-id: 97150763
Test Plan: waitforbuildbot
Differential Revision: D19551365
fbshipit-source-id: 426835819ad8d467bccf5846b04d14442a342f78
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32548
As Title says.
ghstack-source-id: 97175523
Test Plan: CI
Differential Revision: D19541893
fbshipit-source-id: 96dce6964e6a89393d4159401a59672f041f51d3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32371
After we added constants to ClassType, we didn't update clone to
clone the constants; this PR adds that support.
fixes: https://github.com/pytorch/pytorch/issues/32368
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D19564378
fbshipit-source-id: dbb13fb889d6ea9291034313b1f3c9aff4748bda
Summary:
It looks like the jit Future does not have a `wait()` anymore and this throws an error when trying to run this code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32336
Differential Revision: D19559922
Pulled By: rohan-varma
fbshipit-source-id: a5aa67990595e98e0682a20cf5aced17c2ae85bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32380
We'll clone the module first and then fold conv bn and return a new
module
Test Plan:
.
Imported from OSS
Differential Revision: D19508033
fbshipit-source-id: 328e91a2c9420761c904a7f2b62dab4cfaaa31ac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32374
Moving all fold conv bn code to a class to prepare for making
it work with shared ClassType
Test Plan:
compiles
Imported from OSS
Differential Revision: D19508032
fbshipit-source-id: 4e9cf714111305d2b5474d4506507078f69f0c84
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32556
Out of caution, avoid assuming that there's never a failure in a couple of
request_callback_impl case handlers, but rather propagate the error.
ghstack-source-id: 97128697
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D19544685
fbshipit-source-id: 67c55626960bd42a5b0dec7841e8ba44ab059eb9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31990
This PR does three things:
- Add a new `allow_rebase_history` flag to the differentiable views. If set, trying to rebase their history will raise an error.
- Make sure that the codegen functions verify this flag before doing inplace operations so that they fail before doing the inplace modification.
- Make sure the codegen functions set this flag properly when we don't support rebasing the history of the output.
The codegen change can be found [here](4bf180caa0).
Test Plan: Imported from OSS
Differential Revision: D19409649
Pulled By: albanD
fbshipit-source-id: a2b41c2d231e952ecfe162bdb6bad620ac595703
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32044
Fix the list of views in the codegen:
- Move `narrow` out of the autograd functions since it's now implemented with slice.
- Add `split_with_sizes` that was missing from the list
- Remove special formulas for both `split` and `split_with_sizes`. Neither used to be considered a view. When they are, all the RNN code breaks because it uses them in an invalid way. The generic formula will generate one `narrow` Node for each output, which is always valid.
The diff for the generated code can be found [here](https://github.com/pytorch/pytorch/compare/16eff6e...albanD:06d6e85) (outdated for last commit)
Test Plan: Imported from OSS
Differential Revision: D19409648
Pulled By: albanD
fbshipit-source-id: 5ebc4c978af500403f7f008c0231b7db0cabab26
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32525
Before calling static code analyzer we need link all bitcode files into
a single module. Current approach is a bit hacky: cmake still calls "ar"
to pack bitcode files into archives, then we manually unpack these
archives and call llvm-link.
Turns out libtorch_cpu.a contains a few files with the same name, e.g.:
```
aten/src/ATen/native/SoftMax.cpp
aten/src/ATen/native/mkldnn/SoftMax.cpp
```
"ar x" will only keep one of them and cause inaccurate analysis result.
Use this temporary hack to workaround the problem. Ideally should merge
this step into cmake (e.g. directly calling llvm-link to produce target
output?).
Differential Revision: D19530533
Pulled By: ljk53
fbshipit-source-id: 94b292c241abaaa0ff4a23059882abdc3522971e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32539
Before: if something in `_modules` was `None`, we would barf. This is
incorrect because it's allowed for users to put `None` there, in case a
module is optional.
This case ought to be handled correctly during scripting. Fixes https://github.com/pytorch/pytorch/issues/32469
Test Plan: Imported from OSS
Differential Revision: D19552346
Pulled By: suo
fbshipit-source-id: aba7fdc19fd84d195c81cdaca8a75013a8626a8b
Summary:
This API seems to be quite useful to make sure all bailouts in a graph are triggered. I used it for testing torchvision models and I was wondering if this might be something we might actually want to have? zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32518
Differential Revision: D19553147
Pulled By: Krovatkin
fbshipit-source-id: 7542c99051588b622091aec6d041c70731ca5d26
Summary:
## Commit Message:
Refactors Dockerfile to be as parallel as possible with caching and adds a new Makefile to build said Dockerfile.
Also updated the README.md to reflect the changes as well as updated some of the verbiage around running our latest Docker images.
Adds the new Dockerfile process to our CircleCI workflows
## How to build:
Building the new images is pretty simple, just requires `docker` > 18.06 since the new build process relies on `buildkit` caching and multi-stage build resolving.
### Development images
For `runtime` images:
```
make -f docker.Makefile runtime-image
```
For `devel` images:
```
make -f docker.Makefile devel-image
```
Builds are tagged as follows:
```bash
docker.io/${docker_user:-whoami}/pytorch:$(git describe --tags)-${image_type}
```
Example:
```
docker.io/seemethere/pytorch:v1.4.0a0-2225-g9eba97b61d-runtime
```
### Official images
Official images are the ones hosted on [`docker.io/pytorch/pytorch`](https://hub.docker.com/r/pytorch/pytorch)
To do official images builds you can simply add set the `BUILD_TYPE` variable to `official` and it will do the correct build without building the local binaries:
Example:
```
make -f docker.Makefile BUILD_TYPE=official runtime-image
```
## How to push:
Pushing is also super simple (And will automatically tag the right thing based off of the git tag):
```
make -f docker.Makefile runtime-push
make -f docker.Makefile devel-push
```
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32515
Differential Revision: D19558619
Pulled By: seemethere
fbshipit-source-id: a06b25cd39ae9890751a60f8f36739ad6ab9ac99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32569
If the dict's contained types cannot be inferred from its contents (for
example, `Dict[str, Tensor]` vs. `Dict[str, Optional[Tensor]]`), we must
explicitly annotate the type.
Also this removes some special handling that omits annotations on empty
containers that have the default type. It makes the code more complex
for not too much value, and was wrong for dicts anyway.
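A minimal sketch of a case where the annotation is required (the dict contents cannot be inferred from an empty literal):
```python
from typing import Dict, Optional

import torch

@torch.jit.script
def make_map() -> Dict[str, Optional[torch.Tensor]]:
    # The explicit annotation is what serialization must preserve here.
    d: Dict[str, Optional[torch.Tensor]] = {}
    d["x"] = None
    return d
```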
Test Plan: Imported from OSS
Differential Revision: D19551016
Pulled By: suo
fbshipit-source-id: c529b112e72c10f509a6bc0f5876644caa1be967
Summary:
Pull Request resolved: https://github.com/pytorch/glow/pull/4049
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27477
We would like to add the intra-op parallelization support for the EmbeddingBag operator.
This should bring speedup for the DLRM benchmark:
https://github.com/pytorch/pytorch/pull/24385
Benchmark code:
```
from __future__ import absolute_import, division, print_function, unicode_literals
import torch
import time
eb = torch.nn.EmbeddingBag(1000000, 64, mode='sum')
input = torch.LongTensor(1500).random_(0, 1000000)
offsets = torch.zeros(64, dtype=torch.int64)
niter = 10000
s = time.time()
for _ in range(niter):
    out = eb(input, offsets)
time_per_iter = (time.time() - s) / niter
print('time_per_iter', time_per_iter)
print('GB/s', (input.numel() * 64 * 4 + out.numel() * 4) / time_per_iter / 1e9)
```
The following results are single core on Skylake T6:
- Before our change (with the original caffe2::EmbeddingLookup)
time_per_iter 6.313693523406982e-05
GB/s 6.341517821789133
- After our change using the EmbeddingLookupIdx API which takes the offsets instead of lengths.
time_per_iter 5.7627105712890626e-05
GB/s 6.947841559053659
- With Intel's PR: https://github.com/pytorch/pytorch/pull/24385
time_per_iter 7.393271923065185e-05
GB/s 5.415518381664018
For multi-core performance, because Clang doesn't work with OMP, I can only see the single-core performance on SKL T6.
ghstack-source-id: 97124557
Test Plan:
With D16990830:
```
buck run mode/dev //caffe2/caffe2/perfkernels:embedding_bench
```
With D17750961:
```
buck run mode/opt //experimental/jianyuhuang/embeddingbag:eb
buck run mode/opt-lto //experimental/jianyuhuang/embeddingbag:eb
```
OSS test
```
python run_test.py -i nn -- TestNNDeviceTypeCPU.test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu
```
Buck test
```
buck test mode/dev-nosan //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets_cpu"
OMP_NUM_THREADS=3 buck test mode/opt -c pytorch.parallel_backend=tbb //caffe2/test:nn -- "test_EmbeddingBag_per_sample_weights_and_new_offsets" --print-passing-details
```
Generate the AVX2 code for embedding_lookup_idx_avx2.cc:
```
python hp_emblookup_codegen.py --use-offsets
```
Differential Revision: D17768404
fbshipit-source-id: 8dcd15a62d75b737fa97e0eff17f347052675700
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30630
This removes the template and all the specializations it had in rpc; we
universally use IValue as the inner value, since we support holding Python
objects inside IValue.
This will also ensure that we have the correct type information when
creating the RRef: we use the return type from the schema when creating the
UserRRef and OwnerRRef, which enables the IValue to always have the correct
type when the IValue is an RRef object (next PR)
Test Plan: Imported from OSS
Differential Revision: D19502235
fbshipit-source-id: 0d5decae8a9767e0893f3b8b6456b231653be3c5
Summary:
Capsule Type doesn't appear in the IR; it is purely used at runtime. So we should not have to handle it in node hashing... Let's see if this breaks anything.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32540
Differential Revision: D19541357
Pulled By: eellison
fbshipit-source-id: 905ed9f89cf6d03b45ddb4fde02adfa149b477f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32260
This makes it so you can actually pass the custom class as an arg to ScriptFunctions
Test Plan: Imported from OSS
Differential Revision: D19424252
Pulled By: jamesr66a
fbshipit-source-id: c3530186619655781dedbea03c2ad321aaff1cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32205
to be filled
Test Plan:
python test_jit.py
Imported from OSS
Differential Revision: D19508031
fbshipit-source-id: cbf03d34e52eae62595c34fde6ec645cb6744ad9
Summary:
There was a user who did this and it would seg fault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32503
Differential Revision: D19538481
Pulled By: eellison
fbshipit-source-id: dc3752028b9eff6ac88c025e8a2b5f8fd44ce32f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31531
As suggested by suo, add a unit test on torch.jit.export_opnames with an interface. A submodule is annotated as an interface type, assigned to an instance, and then re-assigned to another instance; make sure the operator names are updated accordingly.
Test Plan: Imported from OSS
Differential Revision: D19539129
Pulled By: iseeyuan
fbshipit-source-id: 71a76ae7790cdd577618ca278afdb132727f08dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32295
Fix for https://github.com/pytorch/pytorch/issues/32045
Calling into the engine with the GIL can deadlock because:
- worker thread initialization acquires the GIL
- Any Node / hook can be a python function that will acquire the GIL
The choice was made here to raise an error, as one of the advantages of using cpp extensions with Python is being able to release the GIL, so we prefer to educate users to do it themselves rather than doing it under the hood.
Test Plan: Imported from OSS
Differential Revision: D19430979
Pulled By: albanD
fbshipit-source-id: e43f57631885f12e573da0fc569c03a943cec519
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31126
The Gloo device creator registry throws a warning that confuses users - https://fb.workplace.com/groups/1405155842844877/permalink/3217491788277931/
Create a C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING API to skip such warnings
Test Plan:
{F224342749}
Tested both `C10_DEFINE_SHARED_REGISTRY` and `C10_DEFINE_SHARED_REGISTRY_WITHOUT_WARNING`.
Make sure nothing breaks
Reviewed By: d4l3k
Differential Revision: D18904783
fbshipit-source-id: 0e0065d530956249a18325d4ed3cb58dec255d4c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30445
Create distributed and rpc directories under caffe/test for better management
of unit tests.
Differential Revision: D18702786
fbshipit-source-id: e9daeed0cfb846ef68806f6decfcb57c0e0e3606
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32271
Use the 2-stage EmbeddingSpMDM interface in D19425982 to reduce the overhead of code cache lookup and lock contention.
Fix an issue in sparse_lengths_sum_benchmarks that generated empty indices when the average length is small, e.g. 1.
Test Plan: CI
Reviewed By: dskhudia
Differential Revision: D19425987
fbshipit-source-id: d5c5f0d46e0072403901809c31d516fa0f4b9b31
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32448
Using binary search to compute the value for the given quantile among the input tensors.
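For intuition only, a minimal Python sketch of the bisection idea (the helper name is made up; this is not the caffe2 operator itself): binary-search for the smallest value whose cumulative fraction of elements reaches the requested quantile.
```
import torch

def quantile_by_bisection(x, q, iters=60):
    # search for the smallest v with mean(x <= v) >= q
    lo, hi = x.min().item(), x.max().item()
    n = float(x.numel())
    for _ in range(iters):
        mid = (lo + hi) / 2
        if (x <= mid).sum().item() / n >= q:
            hi = mid
        else:
            lo = mid
    return hi

print(quantile_by_bisection(torch.arange(100.0), 0.9))  # ~89
```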
Test Plan: Newly added unittests;
Reviewed By: jspark1105
Differential Revision: D19487604
fbshipit-source-id: 0dc6627b78d1310ac35b3f1d53b89cc89a697ece
Summary:
While putting finishing touches on the gradient scaling PR (https://github.com/pytorch/pytorch/pull/26512), I discovered my multi-GPU test (which uses `to()` to transfer tensors between devices) was intermittently failing with bad numerics. I knew it was going to be [a weird case from the start](https://www.imdb.com/title/tt8946378/quotes/qt4868203) and spent a week descending into madness. It turns out, for backward ops that create gradients on a different device from the device on whose stream the op is executed, the streaming backward synchronizations in [input_buffer.cpp](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L46-L83) do not properly tell later ops to wait on the population/creation of those gradients. For example, a cross-device `to()` backward (CopyBackward Node) enqueues a cudaMemcpyAsync on the current stream of the source (incoming gradient's) device, then [syncs getCurrentCUDAStream on the destination device with the cudaMemcpyAsync](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/cuda/Copy.cu#L76). However, `input_buffer.cpp` in such cases ([case (3)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/input_buffer.cpp#L77-L81)) was not properly telling `opt_consumer_stream` to wait on the current stream of the destination device (`var`'s device).
Circumstances needed to repro in current master (see [my test](https://github.com/pytorch/pytorch/compare/master...mcarilli:backward_to_race_fix#diff-e68a7bc6ba14f212e5e7eb3727394b40R1901)):
- 2 devices, with non-default streams used for forward-pass ops on both devices (which is the default behavior in test_cuda.py)
- A `to()` that transfers a tensor requiring grad from one device to another
- A backward pass that routes back through to()'s backward (aka CopyBackward).
Under these circumstances, backward ops following CopyBackward on CopyBackward's destination device (aka the original forward-pass source device) race with the device-to-device transfer, and execute using partially-transferred data.
The present PR fixes the race condition and ensures that later ops wait on the CopyBackward transfer. This PR should also make streaming backward safe for other backward ops that span devices, as long as they play nice and populate any new gradients they create using the "current stream" of the device(s) on which they create those gradients.
There are a couple minor issues where I'm not sure of the best approach:
- Should we guard onto the var's device for the entire body of InputBuffer::add?
- I'm fairly sure we need to `recordStream` on `var` if the consumer stream is different from the stream on which (we expect) `var` was created, but calling `c10::cuda::CUDACachingAllocator::recordStream` in input_buffer.cpp might break CPU-only builds. I couldn't find a different API call to record streams that seemed CPU-build-agnostic. Could I wrap the call with a macro?
Thanks to mruberry for helpful suggestions and also the organization/naming of the stream pool and streaming backward code that allowed me to (just barely) wrap my head around the issue.
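A minimal sketch of the circumstances described above, assuming at least two CUDA devices (the non-default forward-pass streams used in test_cuda.py are omitted for brevity):
```
import torch

a = torch.randn(8, device="cuda:0", requires_grad=True)
b = a.to("cuda:1")          # backward routes through CopyBackward
loss = (b * 2).sum()
loss.backward()             # ops after CopyBackward must wait on the device-to-device copy
print(a.grad)
```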
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31930
Differential Revision: D19517617
Pulled By: mruberry
fbshipit-source-id: 183d5460aefa5d27366b465b0473b80ec80fa044
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32491
This PR enables IValue to hold a pure PyObject by adding a
new enum tag and a new jit_type to denote the existence of PyObject in IValue
and the JIT type system. We don't, and do not plan to, expose this to users.
This is the basic piece that enables IValue to be adopted more broadly, like
making RRef always hold an IValue; it might also simplify some compiler
logic
ghstack-source-id: 97039980
Test Plan: Imported from OSS
Differential Revision: D19502234
fbshipit-source-id: 90be001706d707d376cfbea25980fd82980df84a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32475
As title
Test Plan: CI
Reviewed By: houseroad
Differential Revision: D19508778
fbshipit-source-id: fd9ad63607535980505d155f3e3c3b7c6b95daf7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32203
The type is needed to allow multiple qconfig configurations for a shared
ClassType; see the next PR for more details
Test Plan:
.
Imported from OSS
Differential Revision: D19508027
fbshipit-source-id: a3df29dab3038bfa88c55dda98a3e8a78e99e5a1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31841
Add tuple constants to JIT. The constraint here is that all elements of a tuple must themselves be insertable as constants. Previously tuples were special-cased in constant propagation, but now that there are more passes that insert constants, such as freezing, we should just make tuples representable as constants.
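A tiny example of the kind of tuple that can now be inserted as a single constant, since every element is itself constant-insertable:
```
import torch

@torch.jit.script
def f():
    return (1, 2.0, "three")   # the whole tuple can be represented as one constant

print(f())
```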
Test Plan: Imported from OSS
Differential Revision: D19439514
Pulled By: eellison
fbshipit-source-id: 3810ba08ee349fa5598f4b53ea64525996637b1a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31840
The next PR in this stack makes tuples insertable as constants, so we can remove special handling of tuples in constant propagation.
Test Plan: Imported from OSS
Differential Revision: D19439515
Pulled By: eellison
fbshipit-source-id: c58f153157f1d4eee4c1242decc4f36e41c1aa05
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31839
There are a number of improvements that can be made to `mayContainAlias`, which I would like to do in follow ups. For now, this is an easy one.
Test Plan: Imported from OSS
Differential Revision: D19439516
Pulled By: eellison
fbshipit-source-id: 0042fb7eaae6cfb4916bf95dc38280517a4bd987
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32256
Previously, two unrelated modules loaded from torch.jit.load
would compare equal because we only considered their data_ attributes, which
are initialized blank in torch.jit.load. This changes ConcreteModuleType
to distinguish between a data_ attribute that is blank and one that is empty.
This replaces the poisoned logic.
ghstack-source-id: 96755797
Test Plan: oss
Differential Revision: D19423055
fbshipit-source-id: 79d6a50a3731c6eeb8466ba2a93702b49264bba0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32202
Move some helper functions in ModuleUseDeduper for public use
Test Plan:
.
Imported from OSS
Differential Revision: D19508034
fbshipit-source-id: 2e8e05eff6f3bbcfe6936598371e4afa72f9b11f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32226
Right now, if users call torch.dist.all_reduce() on dense tensors, outputs are put in the input tensors, but if users call torch.dist.all_reduce() on sparse tensors, outputs are neither returned explicitly to users nor put in the input tensors.
To make the torch.dist.all_reduce() API behave the same on both dense and sparse tensors, this diff makes torch.dist.all_reduce() on sparse tensors put the output in the input tensors as well. This is achieved by simply calling input_sparse.copy_(output_sparse); see PR https://github.com/pytorch/pytorch/pull/9005, which implemented copy_ for sparse tensors.
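A hedged sketch of the resulting in-place behavior on sparse tensors; it assumes a gloo process group has already been initialized on every rank via torch.distributed.init_process_group:
```
import torch
import torch.distributed as dist

i = torch.tensor([[0, 2]])
v = torch.tensor([1.0, 2.0])
t = torch.sparse_coo_tensor(i, v, (4,))
dist.all_reduce(t)          # with this change, t itself holds the reduced result
print(t.to_dense())
```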
Closes #31413
ghstack-source-id: 96984228
Test Plan: unit test
Differential Revision: D19192952
fbshipit-source-id: 2dd31dc057f20cc42b44b9e55df864afa2918c33
Summary:
Fix the `torch.eq()` docs example to match the current output (boolean instead of uint8).
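For reference, the output the example should now show:
```
>>> torch.eq(torch.tensor([[1, 2], [3, 4]]), torch.tensor([[1, 1], [4, 4]]))
tensor([[ True, False],
        [False,  True]])
```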
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32399
Differential Revision: D19498104
Pulled By: ezyang
fbshipit-source-id: e7ec1263226766a5c549feed16d22f8f172aa1a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32439
This adds c10::fallthrough_kernel which is a special boxed function which
can be used to implement fallthrough behavior at a dispatch key. A fallthrough
kernel will redispatch to the next valid dispatch key. It is implemented
in such a way that it costs no more to fallthrough than it does to go
straight to the actual implementation of the kernel.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D19503886
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 6ee05bd815c4ef444e612d19f62312dbb76f2787
Summary:
We will now use USE_*, BUILD_* consistently. The backward compatibility
for NO_* and WITH_* is hereby removed in this commit, as promised in the
comment (next release is beyond Feb 20):
# Before we run the setup_helpers, let's look for NO_* and WITH_* variables and hotpatch environment with the USE_*
# equivalent The use of NO_* and WITH_* is deprecated and will be removed in Feb 20, 2020.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32447
Differential Revision: D19515536
Pulled By: ezyang
fbshipit-source-id: 2f2c51e6d4674af690b190a1f0397b8f596b6a15
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31408
We'll error out when a graph is quantized with different QSchemes.
This only occurs when we have two modules of the same type (e.g. two Conv2d modules initialized with the
same arguments) that are quantized with two configs that would produce different quantized graphs, for example
per-tensor affine and per-channel affine. This is a rare case, so it should be OK to skip for now.
Actual support will come later.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D19162366
fbshipit-source-id: 798f06d0ddef0c8458237ce88b62159cc77eec8b
Summary:
Fix https://github.com/pytorch/pytorch/issues/24723.
Benchmark script :
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()

device = "cpu"

# warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.log_normal_()

for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.log_normal_()
        t2 = _time()
        fwd_t = fwd_t + (t2 - t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test Device: skx-8180.
Before:
```
input size(128, 1) forward time is 0.0114 (ms).
input size(128, 10) forward time is 0.1021 (ms).
input size(128, 100) forward time is 1.0081 (ms).
input size(128, 1000) forward time is 10.1831 (ms).
```
After:
```
input size(128, 1) forward time is 0.0108 (ms).
input size(128, 10) forward time is 0.0969 (ms).
input size(128, 100) forward time is 0.9804 (ms).
input size(128, 1000) forward time is 9.6131 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31854
Differential Revision: D19314586
Pulled By: pbelevich
fbshipit-source-id: 2ea1d9a2c505e36aca9e609b52ccb3e8caf2ba8f
Summary:
While working on https://github.com/pytorch/pytorch/issues/31768 and trying to add tests for `DataParallel`, I discovered that:
- `test_data_parallel.py` can't be run through `run_test.py`
- running it with `pytest` fails with many name errors
`test_data_parallel.py` seems to have been split from `test_nn.py` in https://github.com/pytorch/pytorch/issues/28297 but not in a state where it can actually be run. Presumably `DataParallel` hasn't been tested by CI in the time since.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32428
Differential Revision: D19499345
Pulled By: ezyang
fbshipit-source-id: f9b748a99a5c85fc6675c22506cf10bbfd9c8a4d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32148
TSAN would complain about multiple threads reading and writing to the
`cpu_dispatch_ptr` without any sort of synchronization. Although this is a
valid issue from a TSAN point of view, there wasn't a correctness issue, since
both threads would compute the same value.
In order to fix this, I've used std::atomic for cpu_dispatch_ptr with relaxed
ordering guarantees.
ghstack-source-id: 96989435
Test Plan: Verify the TSAN tests pass.
Differential Revision: D19386082
fbshipit-source-id: 1ff0893e02529eddd06b2855d9565edf1bbf1196
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31896
Test Plan: Added new tests to QNNPACK's test suite to cover the new use case. All new tests are passing.
Reviewed By: supriyar
Differential Revision: D19443250
Pulled By: AshkanAliabadi
fbshipit-source-id: fa7b1cffed7266a3c198eb591d709f222141a152
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32338
Timed-out ops could linger around if the user doesn't actually call
`wait()` on that op. As a result, to fix this I've introduced the following
functionality in this PR:
1. Keep track of all outstanding work in ProcessGroupNCCL.
2. Enhance NCCL watchdog to sweep through all outstanding work and perform the
following operations:
i. If the work has timed out, abort all communicators for that work and
remove them from the cache.
ii. If the communicators for the work receive an error, abort the
communicators and remove them from the cache.
iii. If the work has completed (successfully/unsuccessfully), remove it from
the list of outstanding work.
ghstack-source-id: 96895704
Test Plan: waitforbuildbot
Differential Revision: D19401625
fbshipit-source-id: 8f6f277ba2750a1e1aa03cdbc76e8c11862e7ce5
Summary:
Without this, dlopen won't look in the proper directory for dependencies
(like libtorch and fbjni).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32247
Test Plan:
Build libpytorch_jni.dylib on Mac, replaced the one from the libtorch
nightly, and was able to run the Java demo.
Differential Revision: D19501498
Pulled By: dreiss
fbshipit-source-id: 13ffdff9622aa610f905d039f951ee9a3fdc6b23
Summary:
The current version check doesn't use proper lexicographic comparison and so will break for future versions of cuSPARSE with `CUSPARSE_VER_MAJOR > 10` and `CUSPARSE_VER_MINOR < 2`. Also, my cusparse headers for CUDA 9 don't seem to include version macros at all, so added `if !defined` to be explicit about that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32405
Differential Revision: D19499412
Pulled By: ezyang
fbshipit-source-id: 1593bf1e5a4aae8b75bb3b350d016cc6c3b9c009
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30842
We'd like to profile the time spent on GIL acquisition to debug
performance issues.
Test Plan: Unit tests pass.
Differential Revision: D18837590
fbshipit-source-id: 925968f71c5fb96b8cd93f1eab4647602d2617d1
Summary:
These jobs were taking forever to run, so we decided it's only really
worth it to run them on master.
Signed-off-by: Eli Uriegas <eliuriegas@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32378
Differential Revision: D19499301
Pulled By: seemethere
fbshipit-source-id: 22cac5b5baee84e44607a16daeb77048cb0f5974
Summary:
Currently, setting `USE_CUDNN=0` has no effect and any cudnn library found on your system will be used anyway. This is especially problematic when your system has multiple CUDA versions installed, and you are building with a version that lacks a matching cudnn. CMake will find any other cudnn versions and you end up with both CUDA versions added to your compiler include paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32404
Differential Revision: D19499425
Pulled By: ezyang
fbshipit-source-id: a9b3f6f9dc22033481c3c1c5999b1a7ef98468cb
Summary:
qlinear/qconv to be consistent with data update.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32254
Differential Revision: D19422929
Pulled By: kimishpatel
fbshipit-source-id: 595a4f7d6fde4978c94f3e720ec8645f3f2bdb7a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32197
This is to reland https://github.com/pytorch/pytorch/pull/30063. The main change is to match a general exception and grep for the "pickle" error word in the "test_script_functions_not_supported" unit test, as Python 3.5 and Python 3.6 throw different types of errors with different error messages for the RPC call in the unit test.
[test all] This diff makes the following changes:
1. Provide a new set of private Python RPC APIs. They can accept an annotated TorchScript call, and this call can be serialized, deserialized and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name rather than callables. These private APIs are subject to deprecation once JIT supports a TorchScript function being a JIT type.
Also, these APIs require the TorchScript function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or class/module methods.
2. This diff also allows the public RPC APIs to accept an annotated TorchScript call and execute the same code path that the above private APIs run on. Therefore, if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without the GIL as well.
3. The above private APIs call a newly defined C++ function so that the RPC TorchScript call can be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.
4. The script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that TorchScript calls and builtin calls can share the same message type and code.
5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.
ghstack-source-id: 96879167
Test Plan: unit test
Differential Revision: D19402374
fbshipit-source-id: 04efcc7c167d08a6503f29efe55e76f2be4b2c5e
Summary:
This should be covered under recursive script now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32235
Pulled By: driazati
Differential Revision: D19414889
fbshipit-source-id: 85f8132401dbe44c9dbaef7c0350110f90eb9843
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32276
Include mobile interpreter in mobile code analysis pass, which has some
manually registered ops in temporary namespaces.
The mobile interpreter is still under development and these ops will be
removed in the future. This is a temporary step for internal build
experiment.
Test Plan: Imported from OSS
Differential Revision: D19426818
Pulled By: ljk53
fbshipit-source-id: 507453dc801e5f93208f1baea12400beccda9ca5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32242
TSAN and fork don't play well together, so skip this test if we're
building under TSAN. It will still run in other modes.
Differential Revision: D19416113
fbshipit-source-id: 7e88d63a843356372160c2524c05e8fd1706553e
Summary:
Unchecked cast just refines the type of a value; the value stays the same, so the output should alias the input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32309
Differential Revision: D19439037
Pulled By: eellison
fbshipit-source-id: fe6902d0d9a5a9ef5e9c13e1dbd056576d8c327e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32323
### Summary
Since we have released the custom build in 1.4.0, it's time to set up CI for it. This PR adds a new iOS job to the iOS builds. To save time, it only runs the arm64 build.
### Test Plan
- Don't break any iOS jobs
- Custom Build works.
Test Plan: Imported from OSS
Differential Revision: D19451342
Pulled By: xta0
fbshipit-source-id: 9de305c004fc795710ecf01d436ef4792c07760c
Summary:
DistributedDataParallel cannot broadcast None, so when we prepare the model for QAT and then try to save it, it errors out.
Fixes: https://github.com/pytorch/pytorch/issues/32082
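A hedged sketch of the failing scenario (the model and qconfig are illustrative; a process group would be required for the commented-out DDP wrapping):
```
import torch

model = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 1), torch.nn.ReLU())
model.qconfig = torch.quantization.get_default_qat_qconfig("fbgemm")
torch.quantization.prepare_qat(model, inplace=True)
# Wrapping the prepared model previously errored out because DDP cannot
# broadcast the None values held by the prepared model:
# ddp = torch.nn.parallel.DistributedDataParallel(model.cuda(), device_ids=[rank])
```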
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32318
Differential Revision: D19434801
Pulled By: jerryzh168
fbshipit-source-id: ee70abe4c3dcdd3506fb7dd0316aee2fb1705469
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32138
I personally prefer `throw std::runtime_error("BOOM")`, but we should
probably have asserts here now that it is gtest. Also ensures that the correct
exceptions are thrown by the `testSignal` tests.
ghstack-source-id: 96811000
Differential Revision: D19382905
fbshipit-source-id: 1b00dd70524d03c8bd6f48715baa5070a7985467
Summary:
This is another implementation of the maximum bailout depth.
The first version was implemented in https://github.com/pytorch/pytorch/pull/31521
This one has the advantages that
* the bailout depth only exists in `CodeImpl`, which seems to be an appropriate place to keep it.
* threading through many objects is reduced to threading through CodeImpl and getPlanFor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32073
Differential Revision: D19443432
Pulled By: Krovatkin
fbshipit-source-id: 898384bb2308a1532a50a33d9e05cfca504711e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32134
These tests weren't written in the most correct way and were often
flaky. It was tricky to identify these tests as flaky until we moved this file
to use gtest.
The gist of the issue is that the test previously would not coordinate sends
and recvs properly. For example, we created a single thread to test an
abortRecv and a successful recv. A separate sender thread was used to send 2
messages. What could go wrong here is that the first send could successfully
complete, resulting in the receiving end processing the message before it gets
the abort signal. In this case we would have an error in the test.
ghstack-source-id: 96806879
Differential Revision: D19379395
fbshipit-source-id: 24782ccaf6e6ec6b445378b29d5f10f901e0dee6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31901
ncclCommAbort is not thread safe, so add a lock for it
ghstack-source-id: 96829715
Test Plan: unit tests
Differential Revision: D19293869
fbshipit-source-id: 711b4a07605d6e5a81577247d2f90a78041c1809
Summary:
After we removed `Specialize_AutogradZero` from the optimization pipeline of the simple executor mode, we don't need to mark any inputs as undefined in `autodiff`. Also, `needsGradient` in `graph_executor.cpp` never runs on graph with profiling information, so I removed that code as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32106
Differential Revision: D19374238
Pulled By: Krovatkin
fbshipit-source-id: 4223d3efe3c904a55a28471e5ae9593017ce3e07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32321
Update the test to check more meaningful semantics
Test Plan:
[xintchen@devvm6308.prn2 ~/fbsource/fbcode] buck test mode/dev //caffe2:ATen-core-test -- 'OperatorRegistrationTest\.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey'
Building: finished in 0.4 sec (100%) 517/517 jobs, 0 updated
Total time: 0.5 sec
Trace available for this run at /tmp/testpilot.20200116-132729.2541763.log
TestPilot test runner for Facebook. See https://fburl.com/testpilot for details.
Testpilot build revision e5f315ebe0508d11fc281fa4b4f7b43d2ef1c003 fbpkg 67e8eb96914f400db234fd9af70fdcde at Wed Jan 15 23:38:32 2020 by twsvcscm from /usr/local/fbprojects/packages/testinfra.testpilot/762/t.par
Discovering tests
Running 1 tests
Started new test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
✓ caffe2:ATen-core-test - OperatorRegistrationTest.whenRegisteringCPUTensorType_thenCanOnlyCallUnboxedWithCPUTensorIdDispatchKey 0.002 1/1 (passed)
Finished test run: https://our.intern.facebook.com/intern/testinfra/testrun/6192449492430045
Summary (total time 1.15s):
PASS: 1
FAIL: 0
SKIP: 0
FATAL: 0
TIMEOUT: 0
OMIT: 0
Differential Revision: D19436345
fbshipit-source-id: c1f2383d62627aa4507616b8905ceb42ac563e9d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32316
### Summary
Since the Custom Build has been released in 1.4.0, it's time to set up CI. To do that, we need to:
1. Add a python script to generate the yaml file
2. Add new build scripts to circle CI (arm64 only).
### Test Plan
- Don't break the current iOS CIs
Test Plan: Imported from OSS
Differential Revision: D19437362
Pulled By: xta0
fbshipit-source-id: 395e27a582c43663af88d11b1ef974a4687e672c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32168
We move the exception raising into the function, saving us a
big pile of instructions for raising the stack.
After this stack of changes, the compiler is willing to inline, e.g.,
`c10::KernelFunction::callUnboxed<at::Tensor, at::Tensor const&>(c10::OperatorHandle const&, at::Tensor const&) const::__func__`
(whereas previously it refused to do so.)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392948
Pulled By: ezyang
fbshipit-source-id: d5edab00cae48444b308e74438a17a421532c08f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32121
This reduces code size in the call sites of this function (of which
there are many: one for every operator call) since we no longer have
to construct std::string at the site.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392951
Pulled By: ezyang
fbshipit-source-id: 8bc43d46ba635380ff9f8989f7557fdd74b552cf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32118
This reduces code size and makes the calling function more likely to inline.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392950
Pulled By: ezyang
fbshipit-source-id: 5e3829cca5604407229f93c2486eb9a325581ea2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32117
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19392949
Pulled By: ezyang
fbshipit-source-id: 7f579e45d49bddeab36b8dd1a90c83224a368ac8
Summary:
For ppc64le, we no longer plan to run regular builds on Python 2.7, and we wish to stop
publicizing the build status for those two builds (ppc64le/CPU and ppc64le/GPU each on py27).
This pull request simply removes the build status links for these two builds, replacing them
with a generic dash character (consistent with other un-publicized builds within the table).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32315
Differential Revision: D19435939
Pulled By: soumith
fbshipit-source-id: c9f31e7acba83e42f6a758ac011bbef36fd8aaa0
Summary:
x || (!x && y) <=> x || y
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32201
Differential Revision: D19429334
Pulled By: ezyang
fbshipit-source-id: 044dc46c2d9a7e180aa1795703c0097b0c7c3585
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32198
creating a method called "callUnboxedWithDispatchKey".
Also adding tests to make sure it works.
Test Plan: buck test mode/dev //caffe2:ATen-core-test
Differential Revision: D19402815
fbshipit-source-id: b206cf04b1216fbbd5b54ac79aef495cb0c1be06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32232
Previously, we were using `operator<<` as the default way of printing
IValue constants during serialization. The semantics of `operator<<`
were ill-defined; and this bit us in particular with strings and lack of
quoting.
This PR defines the role of `operator<<`: much like Python `str()`, it
is intended to produce a human-readable-ish representation for
debugging purposes.
This PR also defines a new `repr()` function on IValue that is intended
to produce a valid Python expression that can be used to recreate an
object with the same value. `repr()` is not defined on all IValue kinds
(notably tensors!) for this reason.
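The Python behavior being mirrored here, where `repr()` quotes strings so the result is a valid expression, while `str()` does not:
```
s = "isn't\n"
print(str(s))    # human-readable, unquoted (the role operator<< now plays)
print(repr(s))   # a valid Python expression that recreates the value
```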
Test Plan: Imported from OSS
Differential Revision: D19417036
Pulled By: suo
fbshipit-source-id: c102d509eaf95a28b6a62280bc99ca6f09603de5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31456
External request https://discuss.pytorch.org/t/jit-android-debugging-the-model/63950
By default, the TorchScript print function goes to stdout. On Android it is not visible in logcat by default.
This change propagates it to logcat.
Test Plan: Imported from OSS
Differential Revision: D19171405
Pulled By: IvanKobzarev
fbshipit-source-id: f9c88fa11d90bb386df9ed722ec9345fc6b25a34
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32208
### Summary
The master branch generates `libtorch_cpu.a`, which is different from the release branch, so this PR skips the missing libs before archiving them.
### Test Plan
- don't break the nightly build
Test Plan: Imported from OSS
Differential Revision: D19420042
Pulled By: xta0
fbshipit-source-id: fb28df17b7e95d5c7fdf5f3a21bece235d7be17c
Summary:
An example of a model with such leaf nodes is the faster_rcnn model. This PR helps optimize ONNX ops.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32077
Reviewed By: hl475
Differential Revision: D19399622
Pulled By: houseroad
fbshipit-source-id: 35c628c6f1514b79f1bcf7982c25f0f4486f8941
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32224
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19416878
Pulled By: ezyang
fbshipit-source-id: 0205d0635658a3328128dcaad94bbbef505342be
Summary:
Introduce ProcessGroup::allgather_base. No implementation yet: plan to add it one PG backend at a time in a follow up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31892
Test Plan: No functional changes, no tests yet.
Differential Revision: D19290739
Pulled By: agolynski
fbshipit-source-id: c2f4947d2980995724c539de7c6d97618e1ba11a
Summary:
The torch.onnx.export docs contained two descriptions of the 'example_outputs' arg,
so I combined the information into the parameter description.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31826
Differential Revision: D19274928
Pulled By: zou3519
fbshipit-source-id: cbcce0a79c51784c1d7aa8981aab8aac118ca9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31713
- In case the callbacks are heavy/slow, the other threads should be able to start work on the value of the future after the current thread moves the value and unlocks the mutex.
- `completed()` is not inlined. Avoid function call overhead.
ghstack-source-id: 96694593
Test Plan: tdb
Differential Revision: D5624371
fbshipit-source-id: 5762e6e894d20108ec9afedd1a6e64bcd97ee3fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31970
Now that the ClassType can be shared among different module instances, we'll
preserve the sharing in clone as well; that is, if the original module has
a ClassType that is shared, we'll clone this ClassType once and share it between
different module instances as well.
Test Plan:
build/test/test_jit
Imported from OSS
Differential Revision: D19406251
fbshipit-source-id: 2881c695f6e718e5432040a3817cf187a62017bf
Summary:
"in_features" and "out_features" are not defined. Possibly a typo. They should be "input_features" and "output_features" instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31682
Differential Revision: D19251685
Pulled By: zou3519
fbshipit-source-id: ac9e524e792a1853a16e8876d76b908495d8f35e
Summary:
Just update the comment to make it accurate.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32222
Differential Revision: D19410428
Pulled By: albanD
fbshipit-source-id: ad13596382613c2728e674a47049ea4f563964b9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32187
Fixes #32058. Previously we would build documentation during the pytorch
linux cuda build. We don't actually need to do this because we have a
dedicated python_doc_build job that builds the docs. With this change,
the CUDA build should run ~10 minutes faster, giving devs faster signal.
Test Plan: - Check the CUDA (10.1) build on this PR, make sure it doesn't build the docs.
Differential Revision: D19400417
Pulled By: zou3519
fbshipit-source-id: e8fb2b818146f33330e06760377a9afbc18a71ed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32185
Previously we would unify the contained types of dictionaries; however, this breaks type safety.
```
@torch.jit.script
def test(input: Dict[str, None], cond):
    if cond:
        out = input
    else:
        out = {"1": 1}
    out["hi"] = 3
```
This would only occur if a dictionary is being re-assigned across an if condition with different contained types, which is pretty unlikely. I tested `model_backward_compatibility` for all fb models and this didn't break anything. This PR is a precursor to alias analysis changes.
Also fixes `Future` type unification. Because `Future` is an immutable type, it is okay to unify the contained type.
Test Plan: Imported from OSS
Differential Revision: D19398585
Pulled By: eellison
fbshipit-source-id: ebc8812cdf5b6dba37b1cfbc2edc7d8c467b258c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32179
Tensors are used as keys in dictionaries, so we need to annotate that key insertion into a dictionary inserts the key into the wildcard set. Also fixes bug with `listCopyAndSort` not copying the input list.
Test Plan: Imported from OSS
Differential Revision: D19397555
Pulled By: eellison
fbshipit-source-id: 17acdc22ff5e2dda44fd25c80450396f5592095e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32086
np.clip(1, num_indices // 2, 10) -> np.clip(num_indices // 2, 1, 10)
Also change batchsize -> num_rows to match with what the variable actually does
Test Plan: CI
Reviewed By: hx89
Differential Revision: D19361521
fbshipit-source-id: 9ce864c7d7da046dc606afa5207da677ccf80f52
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32104
Fixes these warnings:
```
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(96,17): warning: use 'template' keyword to treat 'data' as a dependent template name
W.t.data<uint8_t>(),
^
template
xplat\caffe2\caffe2Windows#header-mode-symlink-tree-only,headers\caffe2\operators\quantized\int8_conv_op.h(97,17): warning: use 'template' keyword to treat 'data' as a dependent template name
B.t.data<int32_t>(),
^
template
```
Test Plan: Tested locally with clang-cl and CI for other toolchains
Reviewed By: boguscoder
Differential Revision: D19353563
fbshipit-source-id: c28afb8c1ad72fd77ef82556ba89fcf09100d1f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32190
We need a backend-agnostic mechanism to perform a barrier-like operation before locally destroying the RRef context and shutting down the RPC agent. The idea (sketched below, after the list):
- Sort worker names.
- Elect the first name in the ordered worker names as the leader.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- Once all workers have reported their intent to proceed, the leader sends everyone the command to proceed.
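A rough, self-contained sketch of the elect/report/proceed idea (worker names are hypothetical; in the real implementation the reports and the proceed command travel over RPC):
```
worker_names = ["trainer2", "trainer0", "trainer1"]   # hypothetical names
leader = sorted(worker_names)[0]                      # lexicographically-first worker leads

reported = set()
def report_intent(name):
    # in practice, an RPC from each worker (the leader included) to the leader
    reported.add(name)
    return "proceed" if reported == set(worker_names) else "wait"

for name in worker_names:
    print(name, report_intent(name))
```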
ghstack-source-id: 96693296
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_spawn\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
Differential Revision: D19399908
fbshipit-source-id: 1dee607cd49adafe88534621a1c85e2736e2f595
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32133
We should do this to better debug the test.
Differential Revision: D19375479
fbshipit-source-id: 8c2bf61bae605a38252bb793b091ade479bea11a
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.
**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072
Differential Revision: D19391909
Pulled By: yf225
fbshipit-source-id: 1ab345b099869f78e1124f1a8bd185fa51371b6a
Summary:
This was not tested before. Fixes #32139 (which was actually a false positive; functions with kwargs but without defaults on those kwargs are supported). This PR adds testing for both cases and cleans up the error reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32146
Pulled By: driazati
Differential Revision: D19385828
fbshipit-source-id: 5eab74df6d02f8e1d7ec054cafb44f909f9d637e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32147
### Summary
Got some security warnings regarding the ruby dependencies. This diff updates the packages in Gemfile.
```
GitHub has detected that a package defined in the ios/TestApp/Gemfile.lock file of the pytorch/pytorch repository contains a security vulnerability.
Package name: excon
Affected versions: < 0.71.0
Fixed in version: 0.71.0
Severity: LOW
Identifier(s):
GHSA-q58g-455p-8vw9
CVE-2019-16779
```
### Test Plan
- Won't affect the existing iOS CI jobs
Test Plan: Imported from OSS
Differential Revision: D19400087
Pulled By: xta0
fbshipit-source-id: 34b548d136cfd6b68fcc53bf0b243461bd7afd64
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32170
Stack from [ghstack](https://github.com/ezyang/ghstack):
Change the overload name from passing by const ref to by value and move.
* **#32170 Fix the passing-by-ref constructor of OperatorName.**
Test Plan: Imported from OSS
Differential Revision: D19396225
Pulled By: iseeyuan
fbshipit-source-id: e946c47647e1f8d23d7565cfe93f487845e7f24c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31912
### Summary
Clean up the logs from pip-install.
### Test Plan
- Don't break the iOS simulator build
Test Plan: Imported from OSS
Differential Revision: D19395526
Pulled By: xta0
fbshipit-source-id: a638a209cab801ce90c8615e7ea030b1ab0939f3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32149
This is an attempt at clarifying some of the preprocessor boolean logic that was getting more and more complicated. The previous logic used constexpr with nvcc on clang, which we were getting compiler failures on in ovrsource with mode/linux/* (based on platform007).
Test Plan:
ovrsource xplat/caffe2 compiles
fbsource sandcastle green
Differential Revision: D19385409
fbshipit-source-id: 60a02bae9854388b87510afdd927709673a6c313
Summary:
Continuation of https://github.com/pytorch/pytorch/issues/31514, fixes https://github.com/pytorch/pytorch/issues/28430
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32009
Test Plan:
I verified that the deprecation warnings only occur once on a relevant workflow. Built with:
```
buck build mode/opt //vision/fair/detectron2/tools:train_net
```
Ran with:
```
DETECTRON2_ENV_MODULE=detectron2.fb.env ~/local/train_net.par --config-file configs/quick_schedules/retinanet_R_50_FPN_instant_test.yaml --num-gpus 1 SOLVER.IMS_PER_BATCH 2
```
Inspected log:
```
[01/14 07:28:13 d2.engine.train_loop]: Starting training from iteration 0
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1299: UserWarning: This overload of add is deprecated:
add(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add(Tensor other, Number alpha)
buck-out/opt/gen/caffe2/generate-code=python_variable_methods.cpp/python_variable_methods.cpp:1334: UserWarning: This overload of add_ is deprecated:
add_(Number alpha, Tensor other)
Consider using one of the following signatures instead:
add_(Tensor other, Number alpha)
[01/14 07:28:25 d2.utils.events]: eta: 0:00:10 iter: 19 total_loss: 1.699 loss_cls: 1.185 loss_box_reg: 0.501 time: 0.5020 data_time: 0.0224 lr: 0.000100 max_mem: 3722M
[01/14 07:28:35 fvcore.common.checkpoint]: Saving checkpoint to ./output/model_final.pth
```
Differential Revision: D19373523
Pulled By: ezyang
fbshipit-source-id: 75756de129645501f43ecc4e3bf8cc0f78c40b90
Summary:
`test_init_ops` calls `orthogonal_` which fails without lapack (this test was just missing a skip condition)
The cpp tests would fail with a `undefined symbol` error if run with `BUILD_TESTS=0`, so this PR skips them if that flag is `0`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31965
Pulled By: driazati
Differential Revision: D19320064
fbshipit-source-id: d1dcd36714107688ded25a414e8969abe026bd03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30063
This diff makes the following changes:
1. Provide a new set of private Python RPC APIs. They can accept an annotated TorchScript call, and this call can be serialized, deserialized and executed in C++ without the GIL. These private APIs will be bound to JIT in the future, and they differ from the public APIs in that the future JIT-bound private APIs will accept a qualified_name rather than callables. These private APIs are subject to deprecation once JIT supports a TorchScript function being a JIT type.
Also, these APIs require the TorchScript function to be defined and annotated by users in Python land; it cannot be a script class/module constructor or class/module methods.
2. This diff also allows the public RPC APIs to accept an annotated TorchScript call and execute the same code path that the above private APIs run on. Therefore, if users invoke an annotated TorchScript call over RPC, this call can be serialized, deserialized and executed in C++ without the GIL as well.
3. The above private APIs call a newly defined C++ function so that the RPC TorchScript call can be serialized, deserialized and executed in C++ land. This C++ function returns an ivalue::Future, so that in a follow-up diff it can be called when these private APIs are bound to JIT.
4. The script_call.cpp/.h and request_callback_impl.cpp files are refactored accordingly so that TorchScript calls and builtin calls can share the same message type and code.
5. Refactored deserializeResponse() and added a new utility to deserialize a response to an IValue.
ghstack-source-id: 96638829
Test Plan: unit test
Differential Revision: D18482934
fbshipit-source-id: bd82a0d820c47a8e45b2e7c616eca06573f7d7ea
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31830
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19330312
Pulled By: ezyang
fbshipit-source-id: fe2e53e732e946088e983ec45fed2393436f0517
Summary:
While ONNX does not currently support the Dim operation on a
tensor directly, we can provide the same functionality with two ONNX operations.
This allows us to support Dim for all opsets. It may be advantageous to
add support for Dim to a future ONNX opset, and use that for more
efficient code.
While testing the dim op, we found that there is an issue with empty blocks
within if statements. Graph generation was modified to prevent the generation
of empty if blocks.
Fixes https://github.com/pytorch/pytorch/issues/27569
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31928
Reviewed By: hl475
Differential Revision: D19376602
Pulled By: houseroad
fbshipit-source-id: 111682b058a5341f5cca6c1a950c83ae412a4c6c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31674
The motivation of this PR was to fix the problem where we would see
"Address already in use" issues for TCPStoreTest due to port conflicts. To
resolve this:
1. We can now pass in port 0 for TCPStore and retrieve the port it actually
bound to using a new getPort() API.
2. Added a `wait` flag to TCPStore constructor indicating whether or not it
should wait for workers (defaults to true).
3. Made `waitForWorkers` a public API to ensure that we can construct TCPStore
without waiting and wait for workers separately. This helps in TCPStoreTest to
ensure we can retrieve the port and pass it to the client stores.
ghstack-source-id: 96486845
Test Plan: waitforbuildbot
Differential Revision: D19240947
fbshipit-source-id: 7b1d1cb2730209fac788764845f1dbbe73d75d9b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32102
Previously, the docs CI depended on our CUDA xenial py3 build. This
meant that the turnaround time to get signal for docs was very slow
(I've seen builds that go as much as 3 hours).
Fortunately, the docs CI do not (and should not!) rely on CUDA. This
PR changes it so that the docs CI runs on a CPU-only machine.
Fixes#29995
Test Plan:
- Check CI status on this PR by reading logs for the python and cpp docs
builds.
- I built the docs locally, once for CPU, and once for CUDA, and
verified (via diff) that the pages were exactly the same)
Differential Revision: D19374078
Pulled By: zou3519
fbshipit-source-id: 3eb36f692c3c0632d2543d3439c822d51a87b809
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31978
Currently we keep a `mangleIndex_` that's internal to the compilation unit and
just increment the index when we find the original name is mangled; this doesn't
guarantee the new name is not already defined.
This PR fixes the problem by querying whether the new name is defined or not.
fixes: https://github.com/pytorch/pytorch/issues/31268
Test Plan:
fixes the issue
Imported from OSS
Differential Revision: D19350535
fbshipit-source-id: fe3262b2838d4208ab72e2cd4a5970b3a792ae86
Summary:
Currently, libtorch build and test are not running in macOS CI. This PR fixes the issue.
**Test Plan:**
Check that libtorch build and test are running again in macOS CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32072
Differential Revision: D19373615
Pulled By: yf225
fbshipit-source-id: 28686ef5895358a2b60db46b1946f21c58c6a18e
Summary:
Currently, cumprod crashes for tensors with non-empty dimensions but zero elements, which can happen when some dimension is zero. This commit fixes the error by checking both dim() and numel() in cumprod backward.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32070
Differential Revision: D19373200
Pulled By: ezyang
fbshipit-source-id: d8ecde33f3330b40a7c611f6faa3b1d707ef2a9a
Summary:
This PR adds a more complete list of PyTorch header files to be installed at build time. It also fixes one instance of including a header from the local src directory instead of the installed directory.
A more complete set of headers enables other modules to work correctly with PyTorch built for ROCm.
cc: ezyang bddppq iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32076
Differential Revision: D19372933
Pulled By: ezyang
fbshipit-source-id: 3b5f3241c001fa05ea448c359a706ce9a8214aa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30734
What are specialized lists?
The IValues that hold List[int], List[Tensor], and List[AnythingElse] are different C++ types.
e.g. List[int] has a std::vector<int> while List[AnythingElse] holds a std::vector<IValue>.
Why do we have specialized lists?
When we first created the JIT we needed to bind the ATen C++ API which has std::vector<int>,
std::vector<Tensor> as inputs. The easiest way to match this API was to make our IValues contain
these same types. Conversion was just unwrapping the IValue, very easy and cheap.
What is the problem with specialized lists?
We end up with significant special-casing throughout the compiler. Other types like Dict are not
specialized. So in the Pickler, for instance, there is a single piece of logic to handle
their serialization. For Lists, we end up with multiple cases. Furthermore, it doesn't
match Python, leading to problems along translation boundaries. Our pickle serialization
is slightly different than python, so it is harder to load objects from our IValue serialization
as Python values.
They also make it harder to provide an easy-to-use user API. We'd like to match pybind11 for C++
bindings to TorchScript. This would entail having a single torch::List class (untemplated)
that can be used to construct inputs. This is made much harder if the underlying ivalue needs
to be different depending on the type inside the list. The ideal case would be to have a constructor like
```
template<typename T>
List(std::vector<T> foo);
```
It would then set up the type tags correctly based on type T, without the need for passing tags.
Do specialized lists improve perf?
Not in a way we have been able to measure. Our major concern initially was having to translate
a std::vector<IValue> to std::vector<int> to call ATen functions. This was especially a concern
for aten::_convolution which takes a number of mostly-constant lists of integers. However,
when we measure the effect of actually having to do this conversion for an aten::_convolution,
it does not take measurable time (benchmark results below).
This is true even if you use a trivial convolution (e.g. 1x1x1), and comment out the actual convolution code.
What are the issues removing them?
This PR removes list specialization but keeps the serialization format, and IValue APIs almost exactly
the same. The only visible change is that toTensorListRef and family have turned into toTensorVector
because they now return by value a copy of the list as a vector.
Further PRs can then clean up the complexity issues that arose from specialization. This will likely
involve removing the isTensorList/isIntList functions, and refactoring the code that used them to
work generically. At some point we will also change serialization to no longer write specialized
lists in the pickle binary. This is forward incompatible, so will go in its own PR.
Benchmark:
```
import torch
import torch.nn as nn
import torch.nn.functional as F
import time
class MnistNet(nn.Module):
    def __init__(self):
        super(MnistNet, self).__init__()
        self.conv1 = nn.Conv2d(1, 1, kernel_size=1)
        self.conv2 = nn.Conv2d(1, 1, kernel_size=1)

    def forward(self, x):
        for i in range(10):
            x = F.relu(self.conv1(x))
            x = F.relu(self.conv2(x))
        return x

model = MnistNet()
x = torch.rand(1, 1, 1, 1)
r = torch.jit.trace(model, x)
r(x)
r(x)
r(x)
r(x)
print(torch.jit.last_executed_optimized_graph())
while True:
    b = time.time()
    for i in range(100):
        r(x)
    e = time.time()
    print(e - b)
```
Results (no observable difference):
```
Before (actual conv)
0.13251137733459473
0.13260436058044434
0.13276338577270508
0.1327497959136963
0.13250041007995605
0.13270330429077148
0.13290190696716309
0.13265132904052734
0.13274288177490234
0.1326758861541748
0.13253355026245117
0.13254785537719727
0.13260746002197266
0.13285017013549805
0.13264012336730957
0.132490873336792
0.13280034065246582
0.13243484497070312
0.1325232982635498
0.1326127052307129
0.13264131546020508
0.13274383544921875
0.13298296928405762
0.1326909065246582
-------------------
After (actual conv)
0.13127517700195312
0.13150334358215332
0.13092470169067383
0.13102364540100098
0.13134360313415527
0.13155555725097656
0.13314104080200195
0.13151955604553223
0.13160037994384766
0.1315293312072754
0.13137340545654297
0.13148093223571777
0.131455659866333
0.1327371597290039
0.13134026527404785
0.13152337074279785
0.13151192665100098
0.13165974617004395
0.13403725624084473
0.13251852989196777
0.13135504722595215
0.1315624713897705
0.1317615509033203
0.1314380168914795
0.13157200813293457
--------------------
The following replace the convolution operator with a no-op, to show
that even if the conv op was made faster, then we still would not see
a difference:
Before (fake conv)
0.0069539546966552734
0.0069522857666015625
0.007120847702026367
0.007344722747802734
0.007689952850341797
0.007932662963867188
0.00761723518371582
0.007501363754272461
0.007532835006713867
0.007141828536987305
0.007174253463745117
0.007114410400390625
0.007071495056152344
------------------
After (fake conv)
0.007458209991455078
0.007337093353271484
0.007268190383911133
0.007313251495361328
0.007306575775146484
0.007468700408935547
0.0073091983795166016
0.007308483123779297
0.007538318634033203
0.007356882095336914
0.007464170455932617
0.007372140884399414
```
Test Plan: Imported from OSS
Differential Revision: D18814702
Pulled By: zdevito
fbshipit-source-id: 0371c73b63068fdc12f24b801371ea90f23531a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31381
This PR adds support for being able to profile both sync and async RPCs, so that users can use the autograd profiler and be able to view metrics such as RPC latency and number of calls in the profiler output.
The way this is implemented is by using the existing `RecordFunction` class provided by the autograd profiler. We create a `RecordFunction` instance when sending an RPC, if autograd profiling is enabled. We also invoke the starting callbacks on this `RecordFunction` instance, this does things such as start the CPU timer. This instance is then persisted across the lifetime of the RPC by attaching it to the `Future` created by the RPC. When the RPC is finished (i.e. when `future->markComplete()` is called), we run the `RecordFunction` instance's end callbacks, which among other things, stops the timer so that we get the correct RPC latency.
The `RecordFunction` and relevant callbacks in `profiler.cpp` are modified slightly to support running end callbacks from a different thread (which is needed since futures are marked as completed by a different thread than the main RPC thread). By default, the autograd profiler uses a `thread_local` list of `Events` and `thread_id`. However, since we'd like to run the `RecordFunction`'s callbacks from a different thread, we would like to access the list of `Events` created by the original thread. This is done by attaching the `thread_id` for the event to the `RecordFunction`, and then looking up the event with that thread in `all_event_lists` (see the changes in `profiler.cpp`). To ensure that the original behavior does not change in the profiler, this described behavior is only run when a user calls `setOverrideThreadId()` on the `RecordFunction` object.
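A hedged usage sketch; it assumes `rpc.init_rpc` has already been called on this worker and that a peer named "worker1" exists:
```
import torch
import torch.distributed.rpc as rpc

with torch.autograd.profiler.profile() as prof:
    fut = rpc.rpc_async("worker1", torch.add, args=(torch.ones(2), torch.ones(2)))
    fut.wait()
print(prof.key_averages().table(sort_by="cpu_time_total"))
```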
ghstack-source-id: 96527291
Test Plan: Added a unit test.
Differential Revision: D19053322
fbshipit-source-id: 9a27a60c809fc4fdb16fa5d85085f3b6b21abfbb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32016
The previous logic raised an exception when the URL contained a query and rank or world_size was specified.
The fix parses the URL, stitches rank and world_size into url.query, and regenerates the URL.
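A rough sketch of the stitching approach in plain Python (the helper name is hypothetical; the actual fix lives in the rendezvous code):
```
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse

def add_rank_and_world_size(url, rank, world_size):
    # Merge rank/world_size into any existing query string instead of
    # blindly appending, so an existing query does not cause an error.
    parts = urlparse(url)
    query = parse_qs(parts.query)
    query["rank"] = [str(rank)]
    query["world_size"] = [str(world_size)]
    return urlunparse(parts._replace(query=urlencode(query, doseq=True)))

print(add_rank_and_world_size("tcp://127.0.0.1:23456?timeout=60", 0, 2))
# tcp://127.0.0.1:23456?timeout=60&rank=0&world_size=2
```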
Test Plan: f161291877
Differential Revision: D19337929
fbshipit-source-id: 6bb3a07716dda5233553804000b706052ff18db8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30424
`at::indexing::TensorIndex` is used for converting C++ tensor indices such as `{None, "...", Ellipsis, 0, true, {1, None, 2}, torch::tensor({1, 2})}` into its equivalent `std::vector<TensorIndex>`, so that further tensor indexing operations can be performed using the supplied indices.
Test Plan: Imported from OSS
Differential Revision: D18695902
Pulled By: yf225
fbshipit-source-id: d73e14a411cdbec815866b02e75ffd71a9186e89
Summary:
Per discussion with Fei Tian, we need to add a `scale_init_value` to scale down the output of normalization such as batch-norm and layer-norm.
Currently we have `sparse_normalization_options` to normalize embedding pooling output. By default scale = 1.0; we found it's better to set scale between 0.025 and 0.1 https://fb.quip.com/MiKUAibEaYhH
Besides, I am removing the tags from normalizers because it makes more sense to calculate norm ops in distributed trainers, not ps.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31983
Test Plan:
Testing LN and BN after sum-pooling --
baseline f160348514
LN: f160348609
BN: f160348710
{F226106518}
Layer norm after sum-pooling fwd_net https://fburl.com/sa4j207n
Layer norm after dot-prod fwd_net https://fburl.com/twggwyvb
## Unit Tests
Testing normalization after pooling
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_batch_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_sparse_pooling_layer_normalization
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test_4 -- test_dense_sparse_pooling_layer_normalization
```
Testing normalization after dot-prod
```
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_batch_norm
buck test caffe2/caffe2/fb/dper/layer_models/tests/split_1:sparse_nn_test -- test_last_layer_use_layer_norm
```
Differential Revision: D19277618
Pulled By: SilunWang
fbshipit-source-id: ea323e33e3647ba55d2e808ef09d94ad7b45b934
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31023
Adds support to catch exceptions in ProcessGroupAgent::enqueueSend and
report them in the future by marking the future as completed with an exception
indicating the error. An example of when this could happen is if the receiving
side aborts when the sender is sending the message, previously, we would hang
until the timeout is hit, and the original exception would be lost.
ghstack-source-id: 96498386
Test Plan: Added a relevant unit test: `test_sender_exceptions` in rpc_test.py
Differential Revision: D18901981
fbshipit-source-id: 08de26936c4ad45b837219a247088cbea644c04c
Summary:
Custom build and internal build will depend on the analysis result so
let's make sure it doesn't break.
Tested locally with LLVM-5.0, LLVM-7 and LLVM-8.
Test Plan: - check CI result
Differential Revision: D18894637
Pulled By: ljk53
fbshipit-source-id: 657854e4bed85a84907e3b6638d158823a56ec80
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32027
The test was added in #30985 for #28313. It seems the fix only works for
Python 3 but not for Python 2. The current Python 2 CI docker image
doesn't have the `dill` module installed at all, so the failure wasn't caught.
I'm trying to build and push a new CI docker image which has `dill` installed
(verified to be the latest version, 0.3.1.1), but the fix doesn't seem
to work and blocks me from upgrading the image version. It does work for the
Python 3 docker image, though...
Here is a succeeded job with old image (no dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192688
Here is a failed job with new image (dill installed):
https://app.circleci.com/jobs/github/pytorch/pytorch/4192679
This PR bypasses the test for Py2 to unblock docker image change. We
can figure out a proper fix for Py2 later.
Test Plan: Imported from OSS
Differential Revision: D19341451
Pulled By: ljk53
fbshipit-source-id: d5768de8cbaf1beba8911da76f4942b8f210f2d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/32011
Run into build problem with Ninja + code analysis build as follows:
```
The install of the torch_global_deps target requires changing an RPATH from
the build tree, but this is not supported with the Ninja generator unless
on an ELF-based platform.
```
It seems we don't need to build this target in static build mode.
Verified code analyzer works with the patch.
Test Plan: Imported from OSS
Differential Revision: D19336818
Pulled By: ljk53
fbshipit-source-id: 37f45a9392c45ce92c1df40d739b23954e50a13a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31972
Since eager mode quantization requires many user modifications, we can't
consistently quantize a given model by just changing qconfig_dict, therefore
the top level `qconfig_dict` is not that useful.
fixes: https://github.com/pytorch/pytorch/issues/31549
Test Plan:
.
Imported from OSS
Differential Revision: D19330691
fbshipit-source-id: 8aee6e5249e0c14e8a363ac1a83836e88887cd7d
Summary:
Instead of a mixture of direct calls to library provided atomicAdd calls, such as float atomicAdd(float*, float) and calls provided internally, such as void atomicAdd(long*, long), abstract to one API void gpuAtomicAdd(T*, T) in THCAtomics.cuh for the PyTorch backend.
The advantage of this approach is that it allows us to more easily distinguish between the capabilities of different platforms (and their versions). Additionally, the abstraction of void returning atomicAdds allows us to, in the future, support fast HW instructions on some platforms that will not return the previous value.
Call sites that do not satisfy above conditions and are either highly platform specific (__half2 atomicAdd fast path in one operator) or require the return explicitly (some int atomicAdd invocations) are left untouched. The Caffe2 backend also remains untouched.
While here, add a bunch of includes of THCAtomics.cuh that were missing before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31992
Differential Revision: D19330220
Pulled By: ezyang
fbshipit-source-id: d6ab73ec5168c77e328faeef6c6f48eefba00861
Summary:
This was missing and resulted in the incorrect `name` passed into `_to_worker_info` not being printed out in the error message.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31969
Differential Revision: D19331927
Pulled By: rohan-varma
fbshipit-source-id: e74d47daec3224c2d9b9da3c0a6404cfa67baf65
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31858
Trying to upgrade docker image but ran into the following error:
```
Running test_nn ... [2020-01-04 18:05:12.537860]
Traceback (most recent call last):
File "test_nn.py", line 45, in <module>
from common_cuda import TEST_CUDA, TEST_MULTIGPU, TEST_CUDNN, TEST_CUDNN_VERSION
File "/var/lib/jenkins/workspace/test/common_cuda.py", line 16, in <module>
import numba.cuda
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 178, in <module>
_ensure_llvm()
File "/opt/conda/lib/python3.6/site-packages/numba/__init__.py", line 100, in _ensure_llvm
raise ImportError(msg)
ImportError: Numba requires at least version 0.30.0 of llvmlite.
Installed version is 0.28.0.
```
Test Plan: Imported from OSS
Differential Revision: D19282923
Pulled By: ljk53
fbshipit-source-id: bdeefbf4f6c0c97df622282f76e77eb1eadba436
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D19334280
Pulled By: z-a-f
fbshipit-source-id: ae14399765a47afdf9b1e072d3967c24ff473e8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31857
According to mingbowan, we will change to using a string docker image
version because the tag is no longer an integer since we moved the docker
image build job to Circle CI:
http://ossci-docker.s3-website.us-east-1.amazonaws.com/pytorch.html
Test Plan: - with stacked PR
Differential Revision: D19282726
Pulled By: ljk53
fbshipit-source-id: 7a12ae89a11cf15163b905734d50fed6dc98cb07
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31995
Fixes #31906.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19331259
Pulled By: ezyang
fbshipit-source-id: 5d24bf3555e632211a9b6f8e50ff241603c18b3d
Summary:
Fix for https://github.com/pytorch/pytorch/issues/19420
So after actually writing a C++ JSON dumping class, I figured that
a faster and cleaner way would be to simply rewrite the Python without
the JSON module, since the JSON we need to output is so simple.
For now I decided not to touch the `parse_cpu_trace` function, since
changing only `export_chrome_trace` already shows a 4x speedup.
Here's the script I used for benchmarking:
``` python
import time
import torch
x = torch.ones(2, 2)
start = time.time()
with torch.autograd.profiler.profile() as prof:
    for _ in range(10000):
        x * x
for i in range(50):
    prof.export_chrome_trace("trace.json")
stop = time.time()
print(stop-start)
```
master branch (using json dump) -> 8.07515025138855
new branch (without json dump) -> 2.0943689346313477
I checked the trace file generated in the [test](https://github.com/pytorch/pytorch/blob/master/test/test_autograd.py#L2659)
and it does work fine.
Please let me know what you think.
If you still insist on the C++ version I can send a new patch soon enough.
CC ezyang rgommers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30724
Differential Revision: D19298955
Pulled By: ezyang
fbshipit-source-id: b0d7324ea5f90884ab8a00dd272f3aa3d9bc0427
Summary:
Fix https://github.com/pytorch/pytorch/issues/24704.
Benchmark script :
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()
device = "cpu"
#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.geometric_(0.5)
for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.geometric_(0.5)
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0092 (ms).
input size(128, 10) forward time is 0.0802 (ms).
input size(128, 100) forward time is 0.7994 (ms).
input size(128, 1000) forward time is 7.8403 (ms).
```
After:
```
input size(128, 1) forward time is 0.0088 (ms).
input size(128, 10) forward time is 0.0781 (ms).
input size(128, 100) forward time is 0.7815 (ms).
input size(128, 1000) forward time is 7.7163 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31878
Differential Revision: D19314510
Pulled By: ezyang
fbshipit-source-id: 2d95bf9938c8becf280890acf9e37223ddd08a39
Summary:
VitalyFedyunin, this PR ports the LogSigmoid activation to ATen:
Test script:
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()
device = "cpu"
m = nn.LogSigmoid()
#warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backwad avg time is 0.03 (ms).
input size(128, 100) forward time is 0.90 (ms); backwad avg time is 0.09 (ms).
input size(128, 1000) forward time is 9.04 (ms); backwad avg time is 0.87 (ms).
```
**After:**
```
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.28 (ms); backwad avg time is 0.07 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
input size(128, 1) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.10 (ms); backwad avg time is 0.03 (ms).
input size(128, 100) forward time is 0.88 (ms); backwad avg time is 0.10 (ms).
input size(128, 1000) forward time is 8.72 (ms); backwad avg time is 0.81 (ms).
After:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.07 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 0.63 (ms); backwad avg time is 0.15 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24724, https://github.com/pytorch/pytorch/issues/24725.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30958
Differential Revision: D19275111
Pulled By: ezyang
fbshipit-source-id: bbfe82e58fb27a4fb21c1914c6547a9050072e5c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31962
I added precision tests for CUDA half, float, and double.
The precision for CUDA half seems bad, but I checked the numbers against
previous versions of PyTorch. The output of CUDA half linspace+logspace
is exactly the same when compared with 1.2.0.
Test Plan: - Run CI
Differential Revision: D19320182
Pulled By: zou3519
fbshipit-source-id: 38d3d4dea2807875ed0b0ec2b93b19c10a289988
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31162
This should help us resolve a multitude of weird segfaults and crashes
when PyTorch is imported along with other packages. Those would often
happen because libtorch symbols were exposed globally and could be used
as a source of relocations in shared libraries loaded after libtorch.
Fixes #3059.
Some of the subtleties in preparing this patch:
* Getting ASAN to play ball was a pain in the ass. The basic problem is that when we load with `RTLD_LOCAL`, we now may load a library multiple times into the address space; this happens when we have custom C++ extensions. Since the libraries are usually identical, this is usually benign, but it is technically undefined behavior and UBSAN hates it. I sprayed a few ways of getting things to "work" correctly: I preload libstdc++ (so that it is seen consistently over all library loads) and turned off vptr checks entirely. Another possibility is that we should have a mode where we use RTLD_GLOBAL to load _C, which would be acceptable in environments where you're sure C++ lines up correctly. There's a long comment in the test script going into more detail about this.
* Making some of our shared library dependencies load with `RTLD_LOCAL` breaks them. OpenMPI and MKL don't work; they play linker shenanigans to look up their symbols which doesn't work when loaded locally, and if we load a library with `RLTD_LOCAL` we aren't able to subsequently see it with `ctypes`. To solve this problem, we employ a clever device invented by apaszke: we create a dummy library `torch_global_deps` with dependencies on all of the libraries which need to be loaded globally, and then load that with `RTLD_GLOBAL`. As long as none of these libraries have C++ symbols, we can avoid confusion about C++ standard library.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Differential Revision: D19262579
Test Plan: Imported from OSS
Pulled By: ezyang
fbshipit-source-id: 06a48a5d2c9036aacd535f7e8a4de0e8fe1639f2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31161
Previously, it wasn't necessary to specify `DT_NEEDED` in C++ extensions on Linux (aka pass `-l` flags) because all of the symbols would have already been loaded with `RTLD_GLOBAL`, so there wouldn't be any undefined symbols. But when we switch to loading `_C` with `RTLD_LOCAL`, it's now necessary for all the C++ extensions to know what libraries to link with. The resulting code is clearer and more uniform, so it's wins all around.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262578
Pulled By: ezyang
fbshipit-source-id: a893cc96f2e9aad1c064a6de4f7ccf79257dec3f
Summary:
Special-case `norm` out where p == 2. Instead of calling `pow`,
we use multiplication as a faster code path.
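As a quick sanity check of the equivalence being exploited (a sketch, not the kernel code itself):
```
import torch

x = torch.randn(1000)
# For p == 2, squaring via a plain multiply gives the same result as pow(2).
print(torch.allclose(torch.norm(x, p=2), x.mul(x).sum().sqrt()))  # True
```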
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31903
Differential Revision: D19312749
Pulled By: ngimel
fbshipit-source-id: 73732b7b37a243a14438609784795b920271a0b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31800
If we know that two constants are the same object, we can ignore other constraints and pool them together. This fixes an issue introduced by the other PR where quantization relied on constant pooling happening for correctness.
Test Plan: Imported from OSS
Differential Revision: D19269499
Pulled By: eellison
fbshipit-source-id: 9d4396125aa6899cb081863d463d4f024135cbf4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31501
We have a number of places in our code base where we should be checking if it's safe to change the alias relationship between two sets of values. This PR adds an api to Alias Db to consolidate the logic, and refactors Constant Pooling and `CSE` to use the new api. Next steps: add api usage in peephole.cpp where applicable.
Happy to bikeshed `AliasDb::safeToChangeAliasingRelationship`. Previously I suggested `AliasDb::safeToIntroduceAliasing`, however that's not quite accurate, because this API also handles when it is unsafe to remove aliasing.
Alternate suggestions: `safeToChangeAliasing`, `validToChangeAliasing`, `validToChangeAliasingRelationship`
Related: https://github.com/pytorch/pytorch/issues/28360
Test Plan: Imported from OSS
Differential Revision: D19254413
Pulled By: eellison
fbshipit-source-id: 17f7f52ad2d1526d303132767cbbb32f8189ae15
Summary:
This is a first pass attempt at documenting `IValue` to help with problems like in #17165. Most users are probably concerned with
* how to make an `IValue` that matches the input type to their graph (most of the constructors are pretty self explanatory, so as long as they are in the docs I think its enough)
* how to extract the results after running their graph (there is a small note on the behavior of `.toX()` based on confusions we've had in the past)
Preview:
https://driazati.github.io/pytorch_doc_previews/31904/api/structc10_1_1_i_value.html#exhale-struct-structc10-1-1-i-value
There are also some random CSS fixes to clean up the style.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31904
Pulled By: driazati
Differential Revision: D19318733
fbshipit-source-id: b29dae3349d5a7ea5a3b8e09cd23f7ff8434edb4
Summary:
This hooks up `inspect` so that Python functions get their parameters
names attached instead of naming them `0, 1, 2, ...`. This also fixes
issue #28537 where `ignore` functions were improperly typing `self`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29300
Pulled By: driazati
Differential Revision: D19256434
fbshipit-source-id: 6a1fe7bd0afab708b8439517798955d0abfeb44c
Summary:
Stacked PRs
* **#31908 - Remove C++ docs contributing page**
* #31905 - Add doc previewing instructions
We should have 1 source of truth for contribution instructions (CONTRIBUTING.md).
This PR moves the instructions from the C++ doc pages there instead of having its
own separate page.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31908
Pulled By: driazati
Differential Revision: D19296366
fbshipit-source-id: c1daf004259342bd09e09dea3b80e34db47066ec
Summary:
Stacked PRs
* #31908 - Remove C++ docs contributing page
* **#31905 - Add doc previewing instructions**
This adds some instructions on how to get started with Github pages you can show reviewers your documentation changes. Hopefully we can delete this eventually and build docs automatically on relevant PRs in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31905
Pulled By: driazati
Differential Revision: D19296364
fbshipit-source-id: df47fa1a8d7be029c3efcf6521298583ad9f7a95
Summary:
Fix https://github.com/pytorch/pytorch/issues/24684.
Benchmark script :
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()
device = "cpu"
#warm up
for n in [10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(1000):
        input.cauchy_()
for n in [1, 10, 100, 1000]:
    fwd_t = 0
    input = torch.randn(128, n, requires_grad=False, device=device)
    for i in range(10000):
        t1 = _time()
        input.cauchy_()
        t2 = _time()
        fwd_t = fwd_t + (t2 -t1)
    fwd_avg = fwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.4f (ms)." % (n, fwd_avg))
```
Test device: **skx-8180**.
Before:
```
input size(128, 1) forward time is 0.0071 (ms).
input size(128, 10) forward time is 0.0596 (ms).
input size(128, 100) forward time is 0.5798 (ms).
input size(128, 1000) forward time is 5.8395 (ms).
```
After:
```
input size(128, 1) forward time is 0.0070 (ms).
input size(128, 10) forward time is 0.0583 (ms).
input size(128, 100) forward time is 0.5714 (ms).
input size(128, 1000) forward time is 5.7674 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31824
Differential Revision: D19314411
Pulled By: ezyang
fbshipit-source-id: 58098546face3e5971b023f702cfe44ff1cccfbc
Summary:
VitalyFedyunin, this PR ports the Softplus activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()
device = "cpu"
m = nn.Softplus()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()
#warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 1.16 (ms); backwad avg time is 0.69 (ms).
input size(128, 10000) forward time is 60.19 (ms); backwad avg time is 31.86 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.43 (ms); backwad avg time is 0.16 (ms).
input size(128, 10000) forward time is 1.65 (ms); backwad avg time is 0.83 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.53 (ms); backwad avg time is 0.28 (ms).
input size(128, 10000) forward time is 51.33 (ms); backwad avg time is 25.48 (ms).
After:
input size(128, 100) forward time is 0.44 (ms); backwad avg time is 0.16 (ms).
input size(128, 10000) forward time is 42.05 (ms); backwad avg time is 13.97 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24633, https://github.com/pytorch/pytorch/issues/24634, https://github.com/pytorch/pytorch/issues/24766, https://github.com/pytorch/pytorch/issues/24767.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30504
Differential Revision: D19274913
Pulled By: ezyang
fbshipit-source-id: 21b29e8459dcba5a040cc68333887b45a858328e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31897
The previous version only used AVX2. The _simd version uses AVX-512 if the CPU is capable of it.
Test Plan: Unit test
Reviewed By: tracelogfb
Differential Revision: D19291499
fbshipit-source-id: 3b1ee0ba756e5c9defbd5caf7f68982d9b2ca06c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31031
This activation will be needed for the LSTM implementation.
Also includes the QNNPack implementation.
Test Plan: Imported from OSS
Differential Revision: D18903453
Pulled By: z-a-f
fbshipit-source-id: 0050b1cebb1ddb179b7ecbcb114fe70705070f67
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31412
The root cause is `plan_caches` being resized in one thread while another holds a reference to an existing `CuFFTParamsLRUCache`, which then becomes invalidated.
Without this fix applied I was able to reproduce the crash very reliably, and with it I no longer see it. Being a race condition, it's hard to say for sure, though.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31861
Differential Revision: D19312314
Pulled By: ezyang
fbshipit-source-id: 06e4561128d503f2d70cdfe1982be0f3db2a8cf8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31313
This is a bugfix. The reason we couldn't enable the constexpr-ness for it before is that it was buggy,
and without constexpr it crashed at runtime rather than at compile time, which unfortunately seems to have slipped past our CI...
ghstack-source-id: 96380160
Test Plan: Now it works even when enabling constexpr for it
Differential Revision: D19087471
fbshipit-source-id: 28be107389f4507d35d08eab4b089a405690529b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31026
This is error prone and probably wrong. Since we don't use LeftRight on the hot path anymore, let's remove this.
ghstack-source-id: 96369644
Test Plan: none
Differential Revision: D18902165
fbshipit-source-id: 7b9478cd7cc071f403d75da20c7c889c27248b5c
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/31911
Test Plan:
* CI builds including GPU and OSS-build tests
* The `defined(__HIP_DEVICE_COMPILE__) ` instance a few lines below is proof that this is a define/undef flag, not a define01 flag
Reviewed By: hlu1
Differential Revision: D19296560
fbshipit-source-id: 1c45069aec534b0bf4a87751a74680675c985e06
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31147
The goal here is to add more tests of the current behavior of the autograd to make sure no regressions are introduced when modifying it.
Do let me know if you think of other corner cases I missed.
Test Plan: Imported from OSS
Differential Revision: D19301082
Pulled By: albanD
fbshipit-source-id: 2cb07dcf99e56eb1f2c56a179796f2e6042d5a2d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31888
We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent.
- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- If all workers report their intent to proceed, the leader sends the command to everyone to proceed (see the sketch right after this list).
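A minimal single-process sketch of that barrier logic; the helper below is a stand-in, and the real implementation sends these reports and commands over RPC:
```
def make_barrier(worker_names):
    leader = sorted(worker_names)[0]
    reported = set()

    def report(name):
        # Followers (and the leader itself) report their intent to the leader,
        # which releases everyone once all reports have arrived.
        reported.add(name)
        if reported == set(worker_names):
            for w in sorted(worker_names):
                print("%s -> %s: proceed" % (leader, w))

    return report

report = make_barrier(["worker2", "worker0", "worker1"])
for name in ["worker1", "worker2", "worker0"]:
    report(name)
```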
ghstack-source-id: 96386210
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_worker_id
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
Differential Revision: D19290954
fbshipit-source-id: cdb22203c2f27b5e0d0ad5b2d3b279d438c22dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31917
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19301480
Pulled By: ezyang
fbshipit-source-id: fcce8868733965b9fbd326b4ec273135759df377
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31351
Clang 4 needs the c10:: namespace specifier on fully_qualified_type_name_impl() to work correctly.
Also, let's add an error message for people using clang 3 and earlier, we don't support those compilers anymore but before this PR, they got a crappy message.
ghstack-source-id: 96380163
Test Plan: testinprod
Differential Revision: D19135587
fbshipit-source-id: c206b56240b36e5c207fb2b69c389bb39f1e62aa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30916
These macros said "make it constexpr if we're in C++14". Since we're now always C++14, we can just say "constexpr" instead.
ghstack-source-id: 96369584
Test Plan: waitforsandcastle
Differential Revision: D18869635
fbshipit-source-id: f41751e4e26fad6214ec3a98db2d961315fd73ff
Summary: I think this was wrong before?
Test Plan: Not sure.
Reviewed By: IvanKobzarev
Differential Revision: D19221358
fbshipit-source-id: 27e675cac15dde29e026305f4b4e6cc774e15767
Summary:
These were returning incorrect data before. Now we make a contiguous copy
before converting to Java. Exposing raw data to the user might be faster in
some cases, but it's not clear that it's worth the complexity and code size.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221361
fbshipit-source-id: 22ecdad252c8fd968f833a2be5897c5ae483700c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31584
These were returning incorrect data before.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221360
fbshipit-source-id: b3f01de086857027f8e952a1c739f60814a57acd
Summary: These are valid tensors.
Test Plan: New unit test.
Reviewed By: IvanKobzarev
Differential Revision: D19221362
fbshipit-source-id: fa9af2fc539eb7381627b3d473241a89859ef2ba
Summary:
As in the title, this PR disables the `--quiet` flag used in the CI as a workaround for a timeout hitting the macOS CI. Circle CI times out when no text has been printed for 10 min.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31900
Differential Revision: D19302899
Pulled By: bwasti
fbshipit-source-id: 145647da983ee06f40794bda1abd580ea45a0019
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31222
- When constructing torch::from_blob() in the case where the deleter is a nop, switch to using a nullptr context in the DataPtr (with a nop deleter)
- No real extra memory/cpu requirements here, actually saves a minor alloc.
Why? Trying to get a signal that a Tensor might contain non-owned memory from
torch::from_blob(), by detecting the nullptr context.
ghstack-source-id: 96336078
Test Plan:
buck test mode/dev caffe2/test/cpp/api/...
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18992119
fbshipit-source-id: 4eea642f82d0858b57fdfc6995364a760c10567d
Summary:
For now I'm just removing the decorators from all of the currently overridable functions in `torch.functional`. This means they are no longer overridable, however this should fix the benchmark regressions reported in https://github.com/pytorch/pytorch/issues/30831. Moving forward we'll be looking at reducing the overhead of the python-level override mechanism and failing that, re-implementing all of these operators in C++.
cc hl475
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30839
Differential Revision: D18838848
Pulled By: ezyang
fbshipit-source-id: 22b8015d7b2f7a947f1ebc9632c998e081b48ad8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31343
Fix an issue in TorchScript tracing for modules with `c10::List<at::Tensor>` as an output. TensorList was not supported properly.
Test Plan: unit tests
Reviewed By: wanchaol
Differential Revision: D18850722
fbshipit-source-id: 87a223104d1361fe754d55deceeb1e8bbcad629b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31508
This PR builds on top of https://github.com/pytorch/pytorch/pull/31230
to ensure that distributed autograd doesn't block an RPC thread anymore during
the backward pass.
I've also added a unit test where all ranks hammer rank 0 with about 60
backward calls (which would cause a deadlock earlier), but now such a test
passes without any issues.
ghstack-source-id: 96345097
Test Plan: waitforbuildbot
Differential Revision: D19188749
fbshipit-source-id: b21381b38175699afd0f9dce1ddc8ea6a220f589
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28430
The unpythonic signatures for functions such as `torch.addcdiv` are already separated in [`deprecated.yaml`] and the signatures marked as deprecated in `PythonArgParser`. However, nothing was done with this information previously. So, this now emits a warning when the deprecated signatures are used.
One minor complication is that if all arguments are passed as keyword args then there is nothing to differentiate the deprecated overload. This can lead to false warnings being emitted. So, I've also modified `PythonArgParser` to prefer non-deprecated signatures.
[`deprecated.yaml`]: https://github.com/pytorch/pytorch/blob/master/tools/autograd/deprecated.yaml
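For example, a sketch of the kind of call the parser now warns about (the exact warning text is not shown here):
```
import torch

t = torch.zeros(3)
a = torch.ones(3)
b = torch.full((3,), 2.0)

# Deprecated overload from deprecated.yaml: the scalar `value` passed
# positionally before the tensors now triggers a deprecation warning.
out_old = torch.addcdiv(t, 0.5, a, b)

# Preferred signature: `value` is passed as a keyword argument.
out_new = torch.addcdiv(t, a, b, value=0.5)

print(torch.equal(out_old, out_new))  # True
```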
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31514
Differential Revision: D19298735
Pulled By: ezyang
fbshipit-source-id: 03cb78af17658eaab9d577cd2497c6f413f07647
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31909
https://github.com/pytorch/pytorch/pull/31230 introduced a bug where
we would end up calling `graph_task_post_processing` twice for reentrant
backward calls (once when we mark the future completed and then when we called
graph_task_post_processing in execute_with_graph_task).
This PR fixes the issues by verifying the future we return in that case is
completed and we remove the call to graph_task_post_processing.
In addition to that I added a test that reproduced the problem and verified it
is fixed by this PR.
ghstack-source-id: 96349102
Test Plan: waitforbuildbot
Differential Revision: D19296363
fbshipit-source-id: dc01a4e95989709ad163bb0357b1d191ef5a4fb2
Summary:
In order to support Ubuntu18.04, some changes to the scripts are required.
* install dependencies with -y flag
* mark install noninteractive
* install some required dependencies (gpg-agent, python3-distutils, libidn11)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31886
Differential Revision: D19300586
Pulled By: bddppq
fbshipit-source-id: d7fb815a3845697ce63af191a5bc449d661ff1de
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31236
It is not compiled on Windows
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262581
Pulled By: ezyang
fbshipit-source-id: 80bfa553333a946f00291aaca6ad26313caaa9e6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31707
Change the initialization values for FC weight init and sparse embedding lookup init.
The previous default initialization was uniform(-\sqrt(1/input_dim), \sqrt(1/input_dim)); now we pass in a flexible hyperparameter \alpha to change it to uniform(-\sqrt(\alpha/input_dim), \sqrt(\alpha/input_dim)).
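A small sketch of the described initialization in plain PyTorch (the helper name is hypothetical; the real change lives in the dper layer code):
```
import math
import torch

def scaled_uniform_init_(weight, alpha=1.0):
    # uniform(-sqrt(alpha / input_dim), sqrt(alpha / input_dim))
    input_dim = weight.shape[1]
    bound = math.sqrt(alpha / input_dim)
    with torch.no_grad():
        return weight.uniform_(-bound, bound)

w = torch.empty(16, 64)
scaled_uniform_init_(w, alpha=0.5)  # alpha=1.0 reproduces the old default
```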
Reviewed By: chonglinsun
Differential Revision: D18825615
fbshipit-source-id: 4c5f2e07f2b3f5d642fd96d64dbf68892ebeb30b
Summary:
The error message produced by AT_ASSERT() in gather() encouraged users to file a bug report ("please report a bug to PyTorch..."). The assertion should be a regular argument check since it can be triggered by passing tensors with different dimensionality, e.g. `torch.cuda.comm.gather([torch.rand(1, device='cuda'), torch.rand(1, 1, device='cuda')])`.
See: https://github.com/pytorch/pytorch/issues/26400
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27456
Differential Revision: D19300270
Pulled By: ezyang
fbshipit-source-id: ec87d225e23445020b377521e0daccceb4748215
Summary:
This PR adds bfloat16 support for convolutions on ROCm.
- Integrates MIOpen bfloat16 convolution support into PyTorch
- Enables bfloat16 convolution for non-miopen paths, i.e. THCUNN, native hip kernels
- Enables bfloat16 type for probability distribution functions (this is included in this PR since conv unit tests use bfloat16 random number generators)
Native cuda kernels for convolution and random functions will be compiled for CUDA as well.
iotamudelta bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30948
Differential Revision: D19274164
Pulled By: ezyang
fbshipit-source-id: c0888a6ac72a2c5749b1ebb2195ac6f2209996be
Summary:
Compared to cuDNN bias, PyTorch add has the following advantage:
- faster, especially for backward (see: https://github.com/zasdfgbnm/things/blob/master/2019/conv-backward-profile.md)
- handles 64bit indexing automatically
- has less code, less maintenance effort
ngimel I submit this PR early so the CI could start building it. But I have not tested it locally yet (still waiting for compiling).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31524
Differential Revision: D19264244
Pulled By: ngimel
fbshipit-source-id: cb483d378a6d8bce0a05c3643a796e544bd8e8f0
Summary:
Closes https://github.com/pytorch/pytorch/issues/31497
This allows `torch.no_grad` and `torch.enable_grad` to be used as decorators for generator functions, in which case they disable/enable grad only inside the body of the generator and restore the context outside of the generator.
https://github.com/pytorch/pytorch/issues/31497 doesn't include a complete reproducer, but the included test with `torch.is_grad_enabled` shows this is working where it failed before.
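A minimal sketch of the new behavior:
```
import torch

@torch.no_grad()
def gen():
    # With this change, grad is disabled only while the generator body runs.
    yield torch.is_grad_enabled()

print(torch.is_grad_enabled())  # True outside the generator
print(next(gen()))              # False inside the decorated generator body
print(torch.is_grad_enabled())  # True again once the generator has yielded
```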
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31792
Differential Revision: D19274971
Pulled By: albanD
fbshipit-source-id: fde6d3fd95d76c8d324ad02db577213a4b68ccbe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31157
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262583
Pulled By: ezyang
fbshipit-source-id: 8fb87b41ab53770329b38e1e2fe679fb868fee12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31155
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262584
Pulled By: ezyang
fbshipit-source-id: 147ac5a9c36e813ea9a2f68b498880942d661be5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31152
Per apaszke: I can't find any reasonable references to libIRC online, so
I decided to remove this.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D19262582
Pulled By: ezyang
fbshipit-source-id: a1d47462427a3e0ca469062321d608e0badf8548
Summary:
This change is required for cases like:
`x[1:] = data` or `x[:3] = data`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31552
Reviewed By: hl475
Differential Revision: D19238815
Pulled By: houseroad
fbshipit-source-id: 56c9837d86b341ea92b0a71d55034ce189d12e6c
Summary:
For backend integration, backend (e.g. Glow) needs to check the content of the tensor to determine whether it is a legit byte tensor or some special packed format. This provides a convenient interface for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31290
Reviewed By: jackm321, qizzzh
Differential Revision: D19069684
Pulled By: yinghai
fbshipit-source-id: 63360fa2c4d32695fe9767a40027d446d63efdd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31803
Refactored the following fairly similar functions:
1. `test_context_cleanup_tensor_with_grad`
2. `test_context_cleanup_tensor_no_grad`
3. `test_context_cleanup_no_tensors`
by creating a helper function `context_cleanup_test_helper` that can be invoked with the appropriate arguments.
Test Plan: Verified by running tests.
Differential Revision: D19269246
fbshipit-source-id: bfb42b078ad56b97ceeecf0d68b4169768c2c453
Summary:
When calling the add_images() method on the tensorboard SummaryWriter with a uint8 NCHW tensor, the tensor is incorrectly scaled, resulting in overflow behavior. This leads to incorrect images being displayed in tensorboard.
Issue: https://github.com/pytorch/pytorch/issues/31459
Local Testing (ran this code with and without the PR changes and printed scale_factor):
```
import torch
import torchvision
from torch.utils.tensorboard import SummaryWriter
writer = SummaryWriter()
x = torch.tensor([[[[1, 2, 3], [4, 5, 6]]]], dtype=torch.uint8)
writer.add_images("images", x)
```
Before: scale_factor = 255. After: scale_factor = 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31778
Differential Revision: D19289189
Pulled By: anjali411
fbshipit-source-id: 350a1650337244deae4fd8f8b7fb0e354ae6986b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31230
A major issue with distributed autograd currently is that we block an
RPC thread when we call Engine::execute_with_graph_task.
To resolve this issue, I've made modifications to the local autograd engine
such that `execute_with_graph_task` returns a Future instead. The `execute()`
methods for Engine::execute() and DistEngine::execute() still wait() on this
Future which ensures there is no change in behavior yet.
In follow up PRs we can modify the distributed autograd engine to take
advantage of this Future.
Closes #26359
ghstack-source-id: 96298057
Test Plan: waitforbuildbot
Differential Revision: D18999709
fbshipit-source-id: 388f54467fd2415a0acb7df17bd063aedc105229
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30710
We need a backend-agnostic mechanism to do barrier-like operation before locally destroy RRef context and shutdown RPC Agent.
- Sort worker names.
- Elect the first name as the leader in the ordered worker names.
- Followers report their intent to synchronize to the leader.
- The leader also reports to itself when `_wait_all_workers()` is called.
- If all workers report their intent to proceed, the leader sends the command to everyone to proceed.
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_wait_all_workers$
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_leak
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rref_forward_chain
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_wait_all_workers
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_wait_all_workers$
```
# Stress runs
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_light_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_stress_heavy_rpc --stress-runs 10
```
```
buck test mode/dev-nosan //caffe2/test:rpc_spawn_thrift -- test_stress_heavy_rpc --stress-runs 10
```
# Debug
```
buck test mode/dev-nosan caffe2/test:rpc_fork -- test_shutdown
```
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_clean_context_during_backward
buck build mode/dev-nosan //caffe2/test:dist_autograd_fork
buck-out/gen/caffe2/test/dist_autograd_fork\#binary.par -r test_clean_context_during_backward
```
https://our.intern.facebook.com/intern/testinfra/diagnostics/281475127895800.844424945328750.1575664368/
```
I1206 12:27:47.491420 185619 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.493880 185630 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.494526 185625 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
I1206 12:27:47.495390 185636 process_group_agent.cpp:211] Shutting down ProcessGroupAgent.
E1206 12:27:47.544198 185627 pair.cc:642] 1 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544203 185633 pair.cc:642] 2 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
E1206 12:27:47.544210 185639 pair.cc:642] 3 --->>> 0, read ERROR: AsyncSocketException: Network error, type = Network error, errno = 104 (Connection reset by peer)
```
This should mean the UDF in the request has been run, so Python proceeded and ran to `_agent.shutdown()`.
While the RpcAgents on the followers wanted to send back the response, the leader had already closed RPC.
Need to re-trigger "pytorch_rpc-buck" to reproduce the rare-seen issue.
Differential Revision: D18643137
fbshipit-source-id: d669d4fc9ad65ed48bed1329a4eb1c32ba51323c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30612
This is the first version moving prim ops to c10 registration. Once the reviewers are fine with the initial changes, more operators will be moved in the same style.
Test Plan: Imported from OSS
Differential Revision: D19237648
Pulled By: iseeyuan
fbshipit-source-id: c5a519604efffb80564a556536f17d829f71d9f9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29220
Support for accessing constants was added in previous
PRs; this PR re-enables the foldbn tests.
Test Plan:
test_jit.py
Imported from OSS
Differential Revision: D18846848
fbshipit-source-id: 90ceaf42539ffee80b984e0d8b2420da66c263c3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29219
We added class constants in previous PRs; this PR allows access to
class constants in the object API.
Test Plan:
build/bin/test_jit
python test/test_jit.py
Imported from OSS
Differential Revision: D18846851
fbshipit-source-id: 888a6517d5f747d1f8ced283c0c2c30b2f6c72c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30787
This is needed when we fuse conv-bn modules,
where we need to rewrite a constant bias (None) of conv into an attribute
bias of type Tensor.
Test Plan:
build/bin/test_jit
Imported from OSS
Differential Revision: D18846850
fbshipit-source-id: 9fd5fe85d93d07226e180b75d2e068fe00ca25fe
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31012
- getConstant should throw when the item is not found
- add another getConstant which takes slot index as argument
Test Plan:
test_class_type.cpp
Imported from OSS
Differential Revision: D18898418
fbshipit-source-id: d3a23a4896fdbf5fa98e1c55c9c4d6205840014b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31845
ArrayRef is trivially copyable and should be passed by value. Removing
unnecessary `&`s.
Test Plan: Imported from OSS
Differential Revision: D19278523
Pulled By: suo
fbshipit-source-id: 026db693ea98d19246b02c48d49d1929ecb6478e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29218
We need to be able to access constants in a module.
Test Plan:
tbd
Imported from OSS
Differential Revision: D18846847
fbshipit-source-id: 22d2c485c3c449bc14ad798f6e1a0c64fc8fb346
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31255
This test had 2 issues. A timeout would occasionally happen due to the 50ms limit, and CUDA code would get compiled and run on CPU, leading to errors. This PR fixes those issues.
Differential Revision: D19028231
fbshipit-source-id: e50752228affe0021e7c0caa83bce78d76473759
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31575
We need a new exception class specifically for the enforce_finite operator, because we need to map it to a specific python exception ExitException, not the RuntimeError type that all c10::Errors get mapped to by default. This diff includes:
- Define c10::EnforceFiniteNotMet
- API CAFFE_ENFORCE_FINITE to throw c10::EnforceFiniteNotMet
- Map from c10::EnforceFiniteNotMet to python ExitException
- Apply CAFFE_ENFORCE_FINITE in caffe2 op
Test Plan:
- integration test pass: https://fburl.com/fblearner/xwkzbqyo
- integration test with D19213617: https://fburl.com/fblearner/479y4jrj Generate error message as desired
- Example:
- Original error message f157597803
{F225477055}
- Updated error message (with D19213617 to generate the error): f158571327
{F225477071}
Reviewed By: zheng-xq
Differential Revision: D19206240
fbshipit-source-id: bd256862801d5957a26b76d738edf4e531f03827
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31583
But rather use `float *`, which is already registered.
Test Plan: CI
Reviewed By: xianjiec
Differential Revision: D19221405
fbshipit-source-id: eb8eabcf828745022bc1e4185a0e65abd19a8f04
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31813
Closes https://github.com/pytorch/pytorch/issues/31804. We were using
an `std::vector` for the key for a map that keeps track of futures to mark them
if they timeout, but we can instead use an `unordered_set`. This results in a
faster lookup in the code block where we remove futureIDs from this set when
they complete successfully. Previously we were finding them via a linear
`std::find`. Switching it to a constant time find will help performance in the
case where a large number of futures are scheduled to time out at the same
time, or if there is no timeout enforced.
To benchmark a rough perf improvement, I created 50k futures with the same
timeout. Before this PR, the lookup `std::find(futuresAtTime.begin(),
futuresAtTime.end(), id)` took ~200us, now it takes 1us.
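The effect is the same as the difference between list and set membership in Python; a rough analogue (timings will vary by machine, and this is only an illustration of the asymptotics):
```
import time

ids = list(range(50000))
as_list = ids
as_set = set(ids)
target = ids[-1]  # worst case for the linear scan

t0 = time.time()
_ = target in as_list   # O(n), like the old std::find over a vector
t_list = time.time() - t0

t0 = time.time()
_ = target in as_set    # O(1) expected, like the new unordered_set lookup
t_set = time.time() - t0

print("list lookup: %.6fs, set lookup: %.6fs" % (t_list, t_set))
```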
ghstack-source-id: 96251355
Test Plan: Unit tests pass.
Differential Revision: D19269798
fbshipit-source-id: 1a0fa84a478ee27a16ab0b9fa6f5413b065a663e
Summary:
This PR aims at improving `index_select` performance on CPU with `TensorIterator`.
The code has equally effective optimizations for both contiguous and non-contiguous tensors.
The code tries to parallelize the inner loop when the copied slice is large enough; otherwise it parallelizes the outer loop.
Thus both the user scenarios from DLRM (from `Embedding`) and the Fairseq transformer are covered (a rough micro-benchmark sketch follows the results below).
1. for contiguous input, single socket: **1.25x** performance speedup
2. for non-contiguous input, single socket: **799x** performance speedup
3. for contiguous input, single core: same performance
4. for non-contiguous input, single core: **31x** performance speedup
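A rough micro-benchmark sketch of the contiguous vs. non-contiguous cases (not the exact script used for the numbers above; sizes are arbitrary):
```
import time
import torch

torch.manual_seed(0)

def bench(x, index, iters=100):
    start = time.time()
    for _ in range(iters):
        torch.index_select(x, 0, index)
    return (time.time() - start) / iters * 1000  # ms per call

index = torch.randint(0, 10000, (10000,))
contig = torch.randn(10000, 128)
noncontig = torch.randn(128, 10000).t()  # transposed view, non-contiguous

print("contiguous:     %.3f ms" % bench(contig, index))
print("non-contiguous: %.3f ms" % bench(noncontig, index))
```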
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30598
Differential Revision: D19266892
Pulled By: VitalyFedyunin
fbshipit-source-id: 7aaf8e2c861b4a96250c968c4dd95c8d2c5b92d7
Summary:
VitalyFedyunin, this PR ports the rrelu activation to ATen:
Test script:
```
import torch
import torch.nn as nn
import time
torch.manual_seed(0)
def _time():
    return time.time()
device = "cpu"
m = nn.RReLU(0.1, 0.3).train()
# for inference
#m = nn.RReLU(0.1, 0.3).eval()
#warm up
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)
for n in [1, 10, 100, 1000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.randn(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 -t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
**Before:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backwad avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backwad avg time is 0.06 (ms).
input size(128, 1000) forward time is 1.45 (ms); backwad avg time is 0.07 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.15 (ms).
```
**After:**
```
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.03 (ms).
input size(128, 10) forward time is 0.03 (ms); backwad avg time is 0.04 (ms).
input size(128, 100) forward time is 0.17 (ms); backwad avg time is 0.07 (ms).
input size(128, 1000) forward time is 1.43 (ms); backwad avg time is 0.08 (ms).
inference:
input size(128, 1) forward time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.03 (ms).
```
**OMP_NUM_THREADS=1:**
```
Before:
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.45 (ms); backwad avg time is 0.14 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.01 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.20 (ms).
After:
Training:
input size(128, 1) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10) forward time is 0.02 (ms); backwad avg time is 0.02 (ms).
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.03 (ms).
input size(128, 1000) forward time is 1.43 (ms); backwad avg time is 0.15 (ms).
inference:
input size(128, 1) forward time is 0.01 (ms).
input size(128, 10) forward time is 0.02 (ms).
input size(128, 100) forward time is 0.02 (ms).
input size(128, 1000) forward time is 0.06 (ms).
```
Fix https://github.com/pytorch/pytorch/issues/24755, https://github.com/pytorch/pytorch/issues/24756.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31094
Differential Revision: D19270936
Pulled By: VitalyFedyunin
fbshipit-source-id: 11bb3236b1037a558022d3777d1f9a429af2bffe
Summary:
Currently `cumsum` crashes for tensors with non-empty dimensions but with zero elements, which could happen when some dimension is zero. This commit fixes the error by checking both `dim()` and `numel()` in cumsum backward
Fixes https://github.com/pytorch/pytorch/issues/31515
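A minimal reproducer sketch based on the description (not necessarily the exact case from the issue):
```
import torch

# Non-empty dim() but zero elements: one dimension has size 0.
x = torch.randn(0, 3, requires_grad=True)
y = x.cumsum(dim=0)
y.sum().backward()       # previously crashed in cumsum backward
print(x.grad.shape)      # torch.Size([0, 3])
```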
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31694
Reviewed By: mrshenli
Differential Revision: D19266613
Pulled By: leedtan
fbshipit-source-id: 9407e0aa55440fed911c01a3580bb6c5eab62a16
Summary:
The original `check-and-act` style can raise `FileExistsError` when multiple processes are jit-compiling the extension on the same node.
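The race and the fix in miniature (the path below is made up for the example):
```
import os

build_dir = "/tmp/torch_extensions/my_ext"  # hypothetical build directory

# Racy check-and-act: another process may create the directory between the
# exists() check and makedirs(), raising FileExistsError.
if not os.path.exists(build_dir):
    os.makedirs(build_dir)

# Race-free: let makedirs tolerate a directory that already exists.
os.makedirs(build_dir, exist_ok=True)
```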
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30956
Differential Revision: D19262570
Pulled By: ezyang
fbshipit-source-id: bb18c72e42648770b47f9378ac7c3929c3c03efc
Summary:
This dramatically reduces the number of instantiations and eliminates
~900KB of code from my local build of libtorch_cpu.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31683
Differential Revision: D19258364
Pulled By: resistor
fbshipit-source-id: addb921a26289978ffd14c203325ca7e35a4515b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31539
Adding this metric primarily because it is needed to unblock unit
tests for https://github.com/pytorch/pytorch/pull/31381. It also may be useful
to look at this metric to see the number of pending RRef forks that currently
exist.
ghstack-source-id: 96230360
Test Plan: Modified the relevant unit test.
Differential Revision: D19204158
fbshipit-source-id: 016345e52cd02cc5f46837bffd8d589ba8575f29
Summary:
Add support for printing op dependencies as Python code so that both the custom build script and BUCK can import them without a YAML parser.
Test Plan:
- generate the file:
```
ANALYZE_TORCH=1 FORMAT=py DEPLOY=1 tools/code_analyzer/build.sh -closure=false
```
- load the file in python:
```
python
>>> from tools.code_analyzer.generated.torch import TORCH_DEPS
>>> print(TORCH_DEPS)
```
Differential Revision: D18894639
Pulled By: ljk53
fbshipit-source-id: e304d0525a07a13cf6e8a9317cd22637200d044c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31215
Install LLVM-dev package for code analysis CI job: #30937
The LLVM-dev package is not related to the Android NDK, but the whole code analysis flow is for the mobile custom build, so this docker image was chosen.
Test Plan: - wait for the docker image to build?
Differential Revision: D19193223
Pulled By: ljk53
fbshipit-source-id: 54a79daf8d98fa7c8b9eed11f519e1c7b1614be8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31588
Per title. This test can sometimes fail with a different error regex
than the one that is currently tested, so add this error regex to make the test
pass consistently.
Differential Revision: D19222275
fbshipit-source-id: 89c95276d4d9beccf9e0961f970493750d78a96b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31668
This also removes an annoying warning about change of sign conversion
Test Plan: Run unit tests
Reviewed By: ezyang
Differential Revision: D19238631
fbshipit-source-id: 29b50abac635e530d5b0453c3a0f36a4573fbf5b
Summary:
For a long string, a format string with named fields is clearer.
When using a dict, a literal is more readable and faster than the dict constructor.
I always appreciate your efforts in creating the world's best frameworks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31352
Differential Revision: D19191967
Pulled By: ngimel
fbshipit-source-id: 21f063b163b67de8cf9761a4db5991f74318e991
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31676
Facebook:
Previously we assumed the mask is passed in as a tensor, which is not feasible for sparse parameters.
Here we allow passing in the mask through a db path, which requires the masks to be stored in some db first.
Test Plan: unit tests
Reviewed By: ellie-wen
Differential Revision: D18928753
fbshipit-source-id: 75ca894de0f0dcd64ce17b13652484b3550cbdac
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31675
This test could be flaky since there could be in-flight RPC requests from startup which might not have finished. If they finish between the different calls to retrieve debug_info, there could be a problem since we would report different information. We therefore wait for the metrics to stabilize to avoid flakiness.
ghstack-source-id: 96188488
Test Plan: waitforbuildbot
Differential Revision: D19242588
fbshipit-source-id: 8f3db7e7365acbd3742e6ec0c2ddcca68f27db9e
Summary:
- Fixes https://github.com/pytorch/pytorch/issues/31672
- Adds Bfloat16 dispatch to the indexing operations that were missing it
- index_put on cuda does not have bfloat16 dispatch, because I'm not sure bfloat16 math ops work on cuda
Note: `index_put_` with `accum=True` is enabled for `bool`, which does not make much sense, but I'm not the one who started it, so this behavior is preserved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31692
Differential Revision: D19249561
Pulled By: ngimel
fbshipit-source-id: 1269196194f7b9f611b32be198c001704731a78f
Summary:
Change log:
- [x] Change the order of the argument positions of torch.std and torch.std_mean in the docs.
- [x] Correct a spelling mistake for torch.std_mean in the docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31677
Differential Revision: D19247372
Pulled By: ngimel
fbshipit-source-id: 8685f5207c39be524cdc81250430beac9d75f330
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28942
The new abstract RRef class contains only user-facing RRef APIs.
It will be later moved to a common folder so that it can be shared
by jit and distributed packages to provide TorchScript support.
Test Plan: Imported from OSS
Differential Revision: D18240590
Pulled By: mrshenli
fbshipit-source-id: ac28cfc2c8039ab7131b537b2971ed4738710acb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31641
Assuming mask is provided as a tensor
Test Plan: unit test
Reviewed By: ellie-wen
Differential Revision: D18928737
fbshipit-source-id: a4f3dd51769c2b56e5890043e91c18e6128be082
Summary:
7zip and cmake are part of the base image, so there is no need to re-install them. Removing the install step can make build/test more stable.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30897
Differential Revision: D19232961
Pulled By: mingbowan
fbshipit-source-id: fa3bbd1325839a2a977bf13fdbd97fda43793b8d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31612
Count the number of recent updates on rows. Exponential decay is applied to the counter with decay rate r, such that
r^{counter_halflife} = 0.5;
If counter_halflife is nonpositive, this operator is turned off.
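A small numeric sketch of the decay rule described above; the function and argument names are illustrative, not the operator's actual schema:
```
def decayed_counter_update(counter, steps_since_update, counter_halflife):
    # Turned off when the halflife is nonpositive.
    if counter_halflife <= 0:
        return counter
    # Solve r^{counter_halflife} = 0.5 for the per-step decay rate r.
    r = 0.5 ** (1.0 / counter_halflife)
    # Decay the stale counter for the steps it was not touched,
    # then count the current update.
    return counter * (r ** steps_since_update) + 1.0

print(decayed_counter_update(counter=10.0, steps_since_update=5, counter_halflife=5))  # 6.0
```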
Test Plan: added unittest
Reviewed By: chocjy
Differential Revision: D19217921
fbshipit-source-id: 96d850123e339212cc0e0ef352ea8a1b1bf61dfa
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31602
Pull Request resolved: https://github.com/pytorch/glow/pull/3943
Zero length input is something we hit fairly frequently in practice. Previous handling of global TensorPool involves two locks per input (acquire and reclaim). Here we use a specialized anchor tensor to host zero length input. Note that it is only padded to max sequence length. If necessary, an easy extension can be added to pad to max `InputPlaceholder.getType().size()`.
Reviewed By: jfix71
Differential Revision: D19192467
fbshipit-source-id: cafdc1eb7bf9b9d6ead04a0243b0be838f6b71cd
Summary:
Earlier cuDNN versions don't support grouped convolution in NHWC well, and legitimate configurations on later cuDNN versions might still return CUDNN_STATUS_NOT_SUPPORTED. We fall back to NCHW when the runtime cuDNN version check reports < 7.6.0, to keep the logic simple.
Note:
We might update the heuristic; 7.6.0 is very conservative.
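The runtime heuristic, sketched in Python for illustration only (torch.backends.cudnn.version() reports the version as an integer such as 7600; the helper name is hypothetical and this is not the actual dispatch code):
```
import torch

def grouped_conv_nhwc_ok():
    # Grouped convolution in NHWC is only trusted on cuDNN >= 7.6.0; on older
    # versions legitimate configurations may return CUDNN_STATUS_NOT_SUPPORTED,
    # so callers should fall back to NCHW.
    v = torch.backends.cudnn.version()  # e.g. 7600 for cuDNN 7.6.0
    return v is not None and v >= 7600
```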
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31444
Differential Revision: D19232414
Pulled By: VitalyFedyunin
fbshipit-source-id: 4c2d79ed347c49cd388bbe5b2684dbfa233eb2a3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31260
1. Update the LiteLM dataset conversion script (fbcode/pytext/fb/tools/lite_lm_dataset_to_tensorproto.py)
2. Created a benchmark json file for byte-aware lstm word model (xplat/aibench/specifications/models/caffe2/assistant/lite_lm_len5.json)
3. In order to run the model -- created an int64 Tensor for the model, added batch gather ops to the BUCK file
Test Plan:
```
1. Create tensorproto of the model input
buck run mode/opt //pytext/fb/tools:byte_lm_dataset_to_tensorproto -- --in-path /mnt/vol/pytext/smart_keyboard/aibench/test_5.txt --out-path /mnt/vol/pytext/smart_keyboard/aibench/byteAwareWordLM/ --hidden_dim 203 --layers_num 2 --max_seq_len 64 --max_byte_len 15
2. Run the aibench command
buck run fbsource//xplat/aibench:run_bench -- -b aibench/specifications/models/caffe2/assistant/lm_byte_lstm_len5.json --remote --devices SM-G960U-8.0.0-26
```
Reviewed By: gardenia22
Differential Revision: D17785682
fbshipit-source-id: 351c3c8bae16449e72ac641522803b23a83349be
Summary:
Originally, we only print one broken schema. With this changeset, all the broken schemas are printed out.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31628
Reviewed By: hl475
Differential Revision: D19231444
Pulled By: houseroad
fbshipit-source-id: 3dd5b4609a6a9a9046e95f2f30deb9beeb5dcd56
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31490
When this happens, a dense tensor is constructed from a sparse constructor.
Fixes: https://github.com/pytorch/pytorch/issues/16154
Test Plan: Imported from OSS
Reviewed By: cpuhrsch, mrshenli
Differential Revision: D19196498
Pulled By: gchanan
fbshipit-source-id: 57a6324833e35f3e62318587ac74267077675b93
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30784
Instead of putting experimental Masked*Adagrad to OSS, we decided to change D18805278 .
Test Plan: CI
Reviewed By: chocjy
Differential Revision: D18824265
fbshipit-source-id: 3d893fe6c441f2ff7af4c497cf81b9c49363e7a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31582
D19124934 removed a dummy pointer passed to strtod_c() that's used only for Android (https://fburl.com/diffusion/zkv34jf1). Without it, jit parsing on Android started throwing SIGSEGV due to null pointer dereferencing. This diff adds the dummy pointer back.
Test Plan: Tests
Reviewed By: driazati, shoumikhin
Differential Revision: D19221071
fbshipit-source-id: 2e230c3fbfa873c3f7b92f73c87ee766ac182115
Summary:
Basically the same as https://github.com/pytorch/pytorch/pull/31379, except that I wrote a separate function `split_batch_dim_to_32bit_out` for the logic. This function could also be used for the convolution forward, and I will rebase this PR after https://github.com/pytorch/pytorch/issues/31379 gets merged and then change `raw_cudnn_convolution_forward_out` to use `split_batch_dim_to_32bit_out` here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31510
Differential Revision: D19210563
Pulled By: ngimel
fbshipit-source-id: e20bb82b6360aa2c0e449e127188c93f44e1e9b4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31517
This is going to be used by upsample (which currently uses magic values to represent optionals).
For now, we just introduce a fake function for testing (torch._test_optional_float(x)).
Test Plan: Imported from OSS
Differential Revision: D19198721
Pulled By: gchanan
fbshipit-source-id: 0a1382fde0927c5d277d02d62bfb31fb574b8c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31533
Fixes this test that was flaky and has been disabled (see
https://github.com/pytorch/pytorch/issues/31112)
ghstack-source-id: 96038999
Test Plan: Run the test 1000 times and ensure that it passes.
Differential Revision: D19203366
fbshipit-source-id: 7978cbb8ca0989a0a370a36349cdd4db3bb8345b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31380
To be able to profile async RPCs, we attach a `RecordFunction` object to the future that is created during the RPC to persist it across the lifetime of the RPC (this is implemented in the next PR: ). Since we'd only like to do this when profiling is enabled, this PR adds an enabled API to the autograd profiler.
ghstack-source-id: 96053933
Test Plan: Modified unit test.
Differential Revision: D19050391
fbshipit-source-id: aa382110e69d06b4a84c83b31d2bec2d8a81ba10
Summary:
I don't see any reason not to do so, because it is a common error that people forget to set the stream, and I don't think there is a reason not to run on the current stream.
This is just for cuBLAS; cuSPARSE and cuDNN should be modified as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31537
Differential Revision: D19206908
Pulled By: ngimel
fbshipit-source-id: ba2b2b74e9847f0495c76dbc778751a9f23f8b36
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/22496
This is just a first step towards supporting 64-bit convolution on CUDA. In the convolution forward, if the total tensor size is larger than 2^31, we split it on the batch dimension. I want to get some review feedback before moving forward with the same splitting approach for backward.
There are real-world use cases where, even when N=1, the input is still larger than 2^31. For this case the splitting would be complicated, so I plan to modify `use_cudnn` to just dispatch to the slow fallback kernel in PyTorch in a later PR.
Update: `later PR` is https://github.com/pytorch/pytorch/pull/31383
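A rough Python sketch of the batch-dimension splitting idea, assuming the 2^31 element limit applies to a single cuDNN call and that one sample always fits (chunk sizing is simplified; this is not the actual `split_batch_dim_to_32bit_out` implementation):
```
import torch
import torch.nn.functional as F

INT_MAX = 2 ** 31

def conv2d_maybe_split_batch(x, weight, bias=None, **kwargs):
    # Small enough: hand the whole input to the backend directly.
    if x.numel() < INT_MAX:
        return F.conv2d(x, weight, bias, **kwargs)
    # Otherwise split along the batch dimension so every chunk stays under
    # the 32-bit indexing limit, run the chunks separately, and concatenate.
    per_sample = x.numel() // x.shape[0]
    samples_per_chunk = max(1, (INT_MAX - 1) // per_sample)
    outs = [F.conv2d(chunk, weight, bias, **kwargs)
            for chunk in torch.split(x, samples_per_chunk, dim=0)]
    return torch.cat(outs, dim=0)
```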
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31379
Differential Revision: D19192018
Pulled By: ngimel
fbshipit-source-id: c26ecc56319ac67c4d5302ffed246b8d9b5eb972
Summary:
Get rid of f-strings; somehow we still have Python 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31536
Differential Revision: D19204187
Pulled By: mingbowan
fbshipit-source-id: da8e17e4dccdd6fd1b0e92eb4740f5a09a8a4209
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30195
1. Added flavorDimensions 'build' local/nightly
to be able to test the latest nightlies
```
cls && gradle clean test_app:installMobNet2QuantNightlyDebug -PABI_FILTERS=x86 --refresh-dependencies && adb shell am start -n org.pytorch.testapp.mobNet2Quant/org.pytorch.testapp.MainActivity
```
2. To be able to change the whole model setup by editing only `test_app/build.gradle`:
Inlined model asset file names into `build.gradle`
Extracted the input tensor shape to `build.gradle` (BuildConfig)
Test Plan: Imported from OSS
Differential Revision: D18893394
Pulled By: IvanKobzarev
fbshipit-source-id: 1fae9989d6f4b02afb42f8e26d0f3261d7ca929b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31407
Remove observers at the end instead of before quantizing tensors, since we still need them to find the quantization parameters for each module instance.
Test Plan:
.
Imported from OSS
Differential Revision: D19162367
fbshipit-source-id: f817af87183f6c42dc97becea85ddeb7e050e2b1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31406
Previously we recorded quantization parameters for a given value when we collected the observer nodes, but the quantization parameters can actually vary per module instance. To achieve that, we need to delay the call to a later stage and only record the `Value*` that's needed in the `collectObserverNodesAndValueToQuantize` function.
Test Plan:
.
Imported from OSS
Differential Revision: D19162369
fbshipit-source-id: e0f97e322d18a281bf15b6c7bbb04c3dfacb512f
Summary:
The Python C API documentation states "Access to the [PyObject]
members must be done by using the macros Py_REFCNT and Py_TYPE."
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31388
Differential Revision: D19161790
Pulled By: colesbury
fbshipit-source-id: ac9a3738c913ad290a6d3460d0d657ec5c13b711
Summary:
This is the first stab at running profile-insensitive optimizations on pre-profiled graphs. Running those optimizations has the potential to simplify graphs greatly before GuardElimination, and GuardElimination should then be able to remove more guards.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31392
Differential Revision: D19173639
Pulled By: Krovatkin
fbshipit-source-id: 2485a2a598c10f9b5445efb30b16439ad4551b3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31470
Optimize performance of these two operators.
Additionally use nearbyint instead of round to be consistent with 4-bit embedding table quantization.
Reviewed By: hyuen
Differential Revision: D19072103
fbshipit-source-id: efe96f14aeff7958cceb453ed625d3fd693891ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31455
In 15.9, __FUNCSIG__ unwraps using-definitions and also preserves noexcept qualifiers
Test Plan: Build caffe2 on Windows using VS2017
Differential Revision: D19166204
fbshipit-source-id: b6c5f70e5262d13adf585f77b92223cf5f1e78dd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30429
Also fix a bug in uncoalesced division.
The general approach here is that we:
* compute the common dtype based on input tensors
* error if the output tensor is specified and the common type can't be cast back to the output type (e.g. for inplace ops)
* convert input tensor (values) to the common dtype
* perform the op as normal (computing at the common dtype instead of the result type).
* convert/copy the result values back to that of the result tensor (for in-place ops).
For uncoalesced division we need to coalesce, because an integral tensor with values=[1,1] at the same index divided by 2 would give 1/2 + 1/2 = 0 instead of 2/2 = 1.
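A small illustration of the uncoalesced case (values [1, 1] at the same index divided by 2 should give 1, not 0), using integer division explicitly:
```
import torch

i = torch.tensor([[0, 0]])               # the same index, twice
v = torch.tensor([1, 1])                  # integral values
s = torch.sparse_coo_tensor(i, v, (3,))

# Dividing per-value before coalescing computes 1 // 2 + 1 // 2 = 0.
wrong = (v // 2).sum()

# Coalescing first sums the duplicates, so the result is (1 + 1) // 2 = 1.
right = s.coalesce().values() // 2

print(wrong.item(), right.tolist())       # 0 [1]
```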
Test Plan: Imported from OSS
Differential Revision: D19143223
Pulled By: nairbv
fbshipit-source-id: 480fa334c0b2b3df046818f2342cfd4e2d9d892a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31507
This script is used to generate a model with bound shape inference and
blob reorder, which are requirements for big model loading on T17.
1. Load existing model.
2. Do bound shape inference and blob reorder (put embedding blobs at the end).
3. Save the modified model.
Test Plan:
Generated a new model and tested it on NNPI.
P124181047 (mismatch is AA variance)
Reviewed By: ipiszy
Differential Revision: D19165467
fbshipit-source-id: c3522fc5dc53b7ec652420558e9e8bf65a1ccfae
Summary:
https://github.com/pytorch/pytorch/pull/30330 got rid of the need to send a `MessageType::SHUTDOWN` message, so we can now remove the logic/utils for this type of message.
I think we can also delete the enum entry in the `enum MessageType`, but we may want to keep it in case the logic in https://github.com/pytorch/pytorch/pull/30710 is ever moved to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31270
Test Plan: All existing unit tests pass
Differential Revision: D19146983
Pulled By: rohan-varma
fbshipit-source-id: 35b185411f9446d7d4dfc37a6cb5477cf041e647
Summary:
Fixes a bad merge that is breaking distributed tests on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31492
Pulled By: driazati
Differential Revision: D19180978
fbshipit-source-id: f69f525e2c7f61194686f07cf75db00eb642882f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31293
Previously we checked the number of elements in scale to determine whether we are using per-channel quantization, but we should get the qscheme information from the observer module directly; we'll expose this information to the caller as well.
Test Plan:
.
Imported from OSS
Differential Revision: D19146669
fbshipit-source-id: ea430eeae0ef8f441be39aa6dcc1bb530b065554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31473
Mitigates #6313
A common use case for the autograd profiler is to use it to run over an
entire model, including dataloading. The following will crash:
- run the autograd profiler in CUDA mode
- use a multi-worker DataLoader (presumably with the 'fork' start method)
This crashes because the autograd profiler initializes CUDA, and forking after CUDA is initialized is bad.
This PR puts in a nice error message when this happens so that users
aren't too confused. The new error message looks like:
https://gist.github.com/zou3519/903f15c3e86bad4585b7e5ce14cc1b70
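A minimal sketch of the pattern that used to crash and should now hit the descriptive error instead (assumes a CUDA build and the default 'fork' start method on Linux):
```
import torch
from torch.utils.data import DataLoader, TensorDataset

dataset = TensorDataset(torch.randn(64, 3))
loader = DataLoader(dataset, batch_size=8, num_workers=2)  # forked workers

# use_cuda=True initializes CUDA; forking DataLoader workers afterwards is
# the combination this PR now reports clearly.
with torch.autograd.profiler.profile(use_cuda=True):
    for (batch,) in loader:
        batch.cuda().sum()
```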
Test Plan:
- Tested locally.
- I didn't add a test case for this because it's hard to write a test
case that doesn't completely stop the rest of our test suite from
running.
Differential Revision: D19178080
Pulled By: zou3519
fbshipit-source-id: c632525ba1f7b168324f1aa55416e5250f56a086
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31484
See https://github.com/pytorch/pytorch/issues/26123 for context.
Previously, when someone googles for `pytorch "adaptive_max_pool2d"`,
https://pytorch.org/docs/stable/_modules/torch/nn/modules/pooling.html
is the first result. This PR changes the docs build script to exclude
all such generated source docs under `_modules/` from Google.
It does this by doing a search for `<head>` and then appending
`<meta name="robots" content="noindex">`.
The [google developer
docs](https://support.google.com/webmasters/answer/93710?hl=en) suggest
that this is the right way to prevent google from indexing the page.
In the future, when the CI
builds documentation (both master and stable docs), the newly created
docs under _modules will have the meta noindex tag.
Test Plan:
- I ran `find "$install_path/_modules" -name "*.html" -print0 | xargs -0
sed -i '/<head>/a \ \ <meta name="robots" content="noindex">'` on a docs
build locally and checked that it does indeed append the meta noindex
tag after `<head>`.
- In a few days we should rerun the search to see if these pages are
still being indexed.
Differential Revision: D19180300
Pulled By: zou3519
fbshipit-source-id: 5f5aa95a85dd9f065607c2a16f4cdd24ed699a83
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31436
Tensor::has_names is slower than it should be for unnamed tensors
because of the following:
- it always tries to access the TLS for NamesMode. Unnamed tensors don't
need to peek at NamesMode to determine if they have names or not.
- There is some virtual function being called because TensorImpl is in
c10 and NamedTensorMeta is in libtorch.
This PR short-circuits Tensor::has_names for unnamed tensors by checking whether the underlying TensorImpl holds a pointer to NamedTensorMeta. If the NamedTensorMeta is nullptr, then the tensor is definitely unnamed.
Benchmarks:
- I have a dedicated benchmarking machine where I isolate a single CPU
and make sure it runs at a fixed frequency.
- I benchmarked torch.add, which calls `tensor::has_names` three times.
- The TL;DR is that torch.add between size-1 unnamed tensors gets sped up by ~200ns after this change, which is a 9% improvement.
- Before, on my machine:
https://gist.github.com/zou3519/dfd648a1941d584711d850754e0694bc
- After on my machine:
https://gist.github.com/zou3519/e78f0d8980b43d0d9c3e3e78ecd0d4d5
Test Plan: - run tests
Differential Revision: D19166510
Pulled By: zou3519
fbshipit-source-id: 1888a4e92d29152a5e3b778a95e531087e532f53
Summary:
Reference: https://github.com/pytorch/pytorch/issues/23159
Currently we don't support reduction operations for dim >= 64, and we should give a descriptive RuntimeError indicating this.
Diff: D19179039
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31476
Differential Revision: D19179039
Pulled By: anjali411
fbshipit-source-id: 58568f64627bf3df6b3e00a1498544c030e74a0e
Summary:
Reference: https://github.com/pytorch/pytorch/issues/31385
In the current documentation for NLLLoss, it's unclear what `y` refers to in the math section of the loss description. There was an issue (https://github.com/pytorch/pytorch/issues/31295) filed earlier expressing confusion about whether the loss returned for reduction='mean' is correct, perhaps because of the lack of clarity in the formula symbol descriptions in the current documentation.
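For concreteness, a small check of the 'mean' reduction with per-class weights, where y below stands for the target class indices (a sketch, not the docs' exact notation):
```
import torch
import torch.nn.functional as F

log_probs = torch.log_softmax(torch.randn(4, 3), dim=1)
y = torch.tensor([0, 2, 1, 2])       # target class indices
w = torch.tensor([1.0, 2.0, 0.5])    # per-class weights

loss = F.nll_loss(log_probs, y, weight=w, reduction='mean')

# reduction='mean' divides by the sum of the weights of the targets,
# not by the batch size.
manual = -(w[y] * log_probs[torch.arange(4), y]).sum() / w[y].sum()
print(torch.allclose(loss, manual))  # True
```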
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31488
Differential Revision: D19181391
Pulled By: anjali411
fbshipit-source-id: 8b75f97aef93c92c26ecbce55b3faf2cd01d3e74
Summary:
The current numba version doesn't appear to actually work with our numba-cuda tests (numba.cuda.is_available() fails).
Previous attempts to upgrade were blocked by https://github.com/numba/numba/issues/4368.
It's a bit unclear to me, but I believe 0.46.0 fixes the above issue. I'm verifying that we catch that issue in CI via https://github.com/pytorch/pytorch/pull/31434.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31435
Differential Revision: D19166865
Pulled By: gchanan
fbshipit-source-id: e01fa48c577e35de178423db7a7f79ac3dd3894d
Summary:
Previously we would only catch `py::cast_error` which led to incomprehensible error messages like: `TypeError: 'NoneType' object is not iterable`. We are running arbitrary pybind code here, and not doing anything with the error message, so we should be less restrictive with the types of errors we catch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31398
Differential Revision: D19166655
Pulled By: eellison
fbshipit-source-id: 84db8b3714c718b475913f2f4bb6f19e62f2d9ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31011
`getAttribute` is supposed to throw when the attribute is not found rather than return a `nullptr`.
Test Plan:
.
Imported from OSS
Differential Revision: D18898417
fbshipit-source-id: 0fe7d824b978ad19bb5ef094d3aa560e9fc57f87
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31357
If a user selects a subset of a Tensor and sends it in an RPC, we were sending
the whole original Tensor Storage over the network.
While this sounds reasonable, in practice, we observed view-like Tensors being sent
over rpc, where only 1% of the data in the provided Tensor's Storage was
actually used/needed.
The simple solution here is to just force a clone in the serializer code if we see that
less than (arbitrary) half the bits are used, and the tensor is more than a nominal few KB.
Add related tests to ensure this doesn't break.
An alternate approach would be to modify the Pickler. That said, since Pickler is shared by more
components, the logic might be harder to tailor appropriately at that layer (particularly
given that the Pickler has explicit logic to share a single Storage* among several Tensors
that commonly point to the same Storage*).
It's possible that we might want to further refine the basic thresholds in this change.
In practice, we've seen a mostly bimodal distribution thus far for the percent of Tensor
Storage referred by a Tensor in observed rpcs (i.e. either 90%+ or sub-10% of the Storage
referenced), hence the existing 50% threshold here is probably not an unreasonable
starting point.
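The heuristic, sketched in Python with illustrative thresholds (the helper name and the exact byte cutoff are not from the actual serializer code):
```
import torch

def maybe_clone_for_rpc(t, min_storage_bytes=4096, usage_threshold=0.5):
    elem_size = t.element_size()
    storage_bytes = t.storage().size() * elem_size  # size() counts elements
    used_bytes = t.numel() * elem_size
    # Clone only when the backing Storage is non-trivial and the view
    # references less than half of it; the clone ships a compact copy
    # instead of the whole original Storage.
    if storage_bytes > min_storage_bytes and used_bytes < usage_threshold * storage_bytes:
        return t.clone()
    return t

big = torch.randn(1000, 1000)
view = big[:10, :10]                               # uses ~0.01% of big's storage
print(maybe_clone_for_rpc(view).storage().size())  # 100 elements after cloning
```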
ghstack-source-id: 95925474
Test Plan: buck test mode/dev caffe2/test/cpp/rpc/...
Differential Revision: D19137056
fbshipit-source-id: e2b3a4dd0cc6e1de820fd0740aa1d59883dbf8d4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31393
pytorch build was set up with the include paths (-I) relative to fbcode/. This works well for fbcode builds, but doesn't work for the new fbcode_deps args for xplat build targets that work across xplat and fbcode. When these targets are built, the include paths need to be relative to fbsource, so fbcode/ suffix needs to be added to those paths.
Longer term, to properly fix this, we need to use raw_headers with public_include_directories specified for all of these targets.
Test Plan: buck test mode/dev //papaya/integration/service/local/test:mnist_federated_system_test -- 'MnistFederatedSystemTest\.test' --run-disabled
Reviewed By: mzlee
Differential Revision: D19148465
fbshipit-source-id: a610e84bf4cad5838e54e94bae71b957c4b6d4b5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31346
This makes it so that if profiling is enabled/disabled from a different thread while a RecordFunction span is active via an op, it doesn't crash the process.
We currently see this when using torch.distributed.rpc to enable/disable profiling on other nodes while other things are running.
Test Plan: buck test //caffe2/test:autograd -- test_record_function
Reviewed By: albanD
Differential Revision: D19133258
fbshipit-source-id: 30712b06c6aa051789948de2918dcfb9b78967ba
Summary:
Fixes #27495
This adds builtins as another piece of a concrete type. They're separate from normal functions since they represent the `BuiltinFunction` sugared value (which is a direct call to a builtin op). It also moves the builtins related logic from `jit/__init__.py` to `jit/_builtins.py` so it can be used from `jit/_recursive.py` to look up functions in the builtins table.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31269
Pulled By: driazati
Differential Revision: D19149779
fbshipit-source-id: d4e5e5d7d7d528b75a2f503e6004394251a4e82d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24341
ConvTransposeOp doesn't crash for zero-batch, but it doesn't modify the output blob. This leads to buggy behaviour especially when running the same network twice using different input, or backprop during training.
It seems `ConvTransposeUnpoolBase<Context>::GetOutputSize` works for zero-batch, so I removed the check for `input.numel() > 0` and reshape the output blob before returning.
For CudnnConvTransposeGradientOp, it's a bit verbose to set `dfilter` and `dbias`, and it seems cuDNN can handle it, so simply remove the `X.numel() == 0` branch.
Test Plan: buck test mode/dev-nosan caffe2/caffe2/python/operator_test:conv_transpose_test -- --run-disabled
Reviewed By: BIT-silence
Differential Revision: D16807606
fbshipit-source-id: 0d72c5bd8f2e03c34465e7b530cca548d9bdd5e1
Summary:
Stacked PRs
* #29940 - [jit] Fix parsing of big float literals
* **#29935 - [jit] Fix hex literal parsing**
* #29931 - [jit] Throw a better error for int too big for int64_t
Previously these were all parsed as `0`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29935
Pulled By: driazati
Differential Revision: D19124944
fbshipit-source-id: 1ee0c1dee589933363a5efba069a2cfaf94373c5
Summary:
Add a section for unsupported ops and modules. Automatically generate the list of properties and attributes that aren't bound, and for ops that have semantic mismatches set up tests so the docs stay up to date.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31329
Differential Revision: D19164472
Pulled By: eellison
fbshipit-source-id: 46290bb8a64d9de928cfb1eda5ff4558c3799c88
Summary:
Fix: https://github.com/pytorch/pytorch/issues/24631, https://github.com/pytorch/pytorch/issues/24632, https://github.com/pytorch/pytorch/issues/24764, https://github.com/pytorch/pytorch/issues/24765
Port of TH SoftMarginCriterion to ATen using un-fused tensor operators but with custom backward code. This is a follow-up/fix of the reverted PR https://github.com/pytorch/pytorch/issues/27673.
Benchmark results:
CPU became faster, GPU slower. To reach the previous TH performance, manual fusion is probably necessary.
### WITH patch
```
CPU warmup 1000 took 7.997200009413064e-05
CPU warmup 10000 took 0.0008116499957395718
CPU warmup 100000 took 0.0012691459996858612
CPU warmup TOTAL time 0.0021982479956932366
CPU forward 1000 took 7.320100849028677e-05
CPU forward 10000 took 0.00015837099635973573
CPU forward 100000 took 0.0010471990099176764
CPU forward 1000000 took 0.01238470000680536
CPU forward 10000000 took 0.12747182900784537
CPU forward 100000000 took 1.2076255190040683
CPU forward TOTAL time 1.3488940890092636
CPU for- & backward 1000 took 0.00032587299938313663
CPU for- & backward 10000 took 0.0006926299975020811
CPU for- & backward 100000 took 0.002146183993318118
CPU for- & backward 1000000 took 0.019158899012836628
CPU for- & backward 10000000 took 0.2957490350090666
CPU for- & backward 100000000 took 1.7630806300003314
CPU for- & backward TOTAL time 2.081367089995183
GPU warmup 1000 took 0.0004558280052151531
GPU warmup 10000 took 0.0002567449992056936
GPU warmup 100000 took 0.0001593509950907901
GPU warmup TOTAL time 0.0009442300070077181
GPU forward 1000 took 0.00015061900194268674
GPU forward 10000 took 0.00015258099301718175
GPU forward 100000 took 0.00015409699699375778
GPU forward 1000000 took 0.0008183339959941804
GPU forward 10000000 took 0.004424853003001772
GPU forward 100000000 took 0.04356115800328553
GPU forward TOTAL time 0.04938192600093316
GPU for- & backward 1000 took 0.0008062430133577436
GPU for- & backward 10000 took 0.0006074949924368411
GPU for- & backward 100000 took 0.0007091690058587119
GPU for- & backward 1000000 took 0.001022183001623489
GPU for- & backward 10000000 took 0.009945805999450386
GPU for- & backward 100000000 took 0.0944173600000795
GPU for- & backward TOTAL time 0.28060428200114984
```
### WITHOUT patch
```
CPU warmup 1000 took 6.394000956788659e-05
CPU warmup 10000 took 0.00038220599526539445
CPU warmup 100000 took 0.0034939230099553242
CPU warmup TOTAL time 0.003981974994530901
CPU forward 1000 took 4.7855006414465606e-05
CPU forward 10000 took 0.000347569992300123
CPU forward 100000 took 0.003367935001733713
CPU forward 1000000 took 0.03605044000141788
CPU forward 10000000 took 0.35935167300340254
CPU forward 100000000 took 3.630371332008508
CPU forward TOTAL time 4.029640004009707
CPU for- & backward 1000 took 0.00028494100843090564
CPU for- & backward 10000 took 0.0006738200027029961
CPU for- & backward 100000 took 0.0051178760040784255
CPU for- & backward 1000000 took 0.04925115800870117
CPU for- & backward 10000000 took 0.7172313440096332
CPU for- & backward 100000000 took 5.441953932997421
CPU for- & backward TOTAL time 6.21466830400459
GPU warmup 1000 took 0.001803738996386528
GPU warmup 10000 took 0.00041877900366671383
GPU warmup 100000 took 0.0003870719956466928
GPU warmup TOTAL time 0.0026561370032140985
GPU forward 1000 took 0.00037833399255760014
GPU forward 10000 took 0.00038825398951303214
GPU forward 100000 took 0.0003841099969577044
GPU forward 1000000 took 0.0007090550061548129
GPU forward 10000000 took 0.0016171559982467443
GPU forward 100000000 took 0.013463679002597928
GPU forward TOTAL time 0.017010531009873375
GPU for- & backward 1000 took 0.0007374050037469715
GPU for- & backward 10000 took 0.0006343529967125505
GPU for- & backward 100000 took 0.0006375070079229772
GPU for- & backward 1000000 took 0.0007550300069851801
GPU for- & backward 10000000 took 0.002672752001672052
GPU for- & backward 100000000 took 0.023170708998804912
GPU for- & backward TOTAL time 0.20251446698966902
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28135
Differential Revision: D18001447
Pulled By: VitalyFedyunin
fbshipit-source-id: ad90dc1cca42dcaf3ea9e17e4f8fd79cee0a293e
Summary:
VitalyFedyunin, this PR ports the LeakyReLU activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.LeakyReLU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backward avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.14 (ms).
input size(128, 10000) forward time is 4.21 (ms); backward avg time is 8.02 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.02 (ms); backward avg time is 0.07 (ms).
input size(128, 10000) forward time is 1.98 (ms); backward avg time is 6.21 (ms)
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backward avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backward avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.02 (ms); backward avg time is 0.04 (ms).
input size(128, 10000) forward time is 0.03 (ms); backward avg time is 0.09 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.01 (ms); backward avg time is 0.02 (ms).
input size(128, 10000) forward time is 0.47 (ms); backward avg time is 1.02 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run **./run.sh num_threads test.py**.
Fixes https://github.com/pytorch/pytorch/issues/24583, https://github.com/pytorch/pytorch/issues/24584, https://github.com/pytorch/pytorch/issues/24720, https://github.com/pytorch/pytorch/issues/24721
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29899
Differential Revision: D18816231
Pulled By: VitalyFedyunin
fbshipit-source-id: afb1e43a99317d17f50cff1b593cd8f7a0a83da2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31335
When an error occurs in a net we end up cancelling all the async ops. If one error occurs it's highly likely other errors will occur as well.
Typically we see:
1. SendOp failed due to a network error
2. async scheduling cancels all other ops via `SetFinished("Cancelled");`
3. Another SendOp fails due to a network error and crashes the process when the exception is thrown.
This changes caffe2 ops to allow failing twice.
Test Plan: buck test //caffe2/caffe2:caffe2_test_cpu
Reviewed By: andrewwdye
Differential Revision: D19106548
fbshipit-source-id: 4b7882258a240894cc16d061a563c83a3214d3d9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31404
Multiple "trainers" could each create different instances of DistributedOptimizer, which means we can still have a race condition unless we do a trully global per worker lock.
ghstack-source-id: 95874624
Test Plan: run unit tests -- unfortunately, due to the non-deterministic behavior, it's not clear how to unit test this properly.
Differential Revision: D19154248
fbshipit-source-id: fab6286c17212f534f1bd1cbdf9f0de002d48c74
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31292
As titled. Also, we need to do this check after we call `insertObservers` on invoked modules as well, since qconfig can be None for the parent module while being valid for invoked modules.
Test Plan:
.
Imported from OSS
Differential Revision: D19146668
fbshipit-source-id: be6811353d359ed3edd5415ced29a4999d86650b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31364
clang-cl defines both `_MSC_VER` and `__clang__`. Names are mangled clang-style, though. Calling `extract` with the wrong name mangling pattern will throw `std::logic_error`. This crashes on Windows when `get_fully_qualified_type_name` is called because it is marked `noexcept`.
Test Plan: Windows builds no longer crash on startup.
Reviewed By: mattjgalloway
Differential Revision: D19142064
fbshipit-source-id: 516b9b63daeff30f5c097d192b0971c7a42db57e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31020
Before, the recursive scripting process re-did the concrete type
inference process for every submodule call. This changes things so that
the concrete type inference process only occurs once (at the top level),
and we re-use all the inferred concrete types while recursively
compiling submodules.
This is both more efficient (we don't do n^2 work inferring concrete
types) and less bug-prone (since we infer the concrete type only once,
there is no possibility of a mismatch).
Test Plan: Imported from OSS
Differential Revision: D18904110
Pulled By: suo
fbshipit-source-id: 6560b85ae29fe5e9db1ee982dbf8bc222614b8d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31019
No more `recursive_script`, just direct calls to `create_script_module`.
This reduces the number of pathways through the frontend, and the
uniformity is useful for a future PR.
Test Plan: Imported from OSS
Differential Revision: D18904113
Pulled By: suo
fbshipit-source-id: 7de061dfef0cbdfc9376408fc6c1167b81803f01
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31018
Properties are now disallowed so this hack is no longer necessary
Test Plan: Imported from OSS
Differential Revision: D18904112
Pulled By: suo
fbshipit-source-id: 83448da677082d59355729bb72d9f9f4c31ea756
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31017
This arg is now derivable from another one, so we don't need to pass both.
Test Plan: Imported from OSS
Differential Revision: D18904111
Pulled By: suo
fbshipit-source-id: ea74ea9c2ae83d9e0e6977b0eb6629f53545e2e4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31401
As title, just a mechanical change
Test Plan: Imported from OSS
Differential Revision: D19152965
Pulled By: suo
fbshipit-source-id: 6bb27df7c8f542c55110286c156358ba0936269f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31373
Just some housekeeping
Test Plan: Imported from OSS
Differential Revision: D19145987
Pulled By: suo
fbshipit-source-id: ae8142dab2bddcf0b628c27c426ca26334c48238
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31372
Keeping it current with the latest changes.
Test Plan: Imported from OSS
Differential Revision: D19145986
Pulled By: suo
fbshipit-source-id: 88122e66fa87a354ef8e87faffe58551074e3f03
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31214
This sets up the basic infrastructure for distributed autograd and rpc to bind their operators to TorchScript. Since the whole distributed package is built behind the `USE_DISTRIBUTED` flag, we separate the registration and build it only when the flag is on.
Test Plan: Imported from OSS
Differential Revision: D19137160
fbshipit-source-id: ff47dc4c380ebe273fe0eea9e5e3fccfbd6466d7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30918
This is a C++14 feature we can use now
ghstack-source-id: 95811482
Test Plan: waitforsandcastle
Differential Revision: D18869636
fbshipit-source-id: b5b3d78b61b6ceb2deda509131f8502e95b1d057
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30530
Switch some mentions of "C++11" in the docs to "C++14"
ghstack-source-id: 95812049
Test Plan: testinprod
Differential Revision: D18733733
fbshipit-source-id: b9d0490eb3f72bad974d134bbe9eb563f6bc8775
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31071
Previously the profiler would think Tensors would require grad, even
when the no_grad flag is enabled during execution. This makes the profiling
and guards respect the no_grad flag, which eliminates extra differentiable
graphs that appear in the backward graph (where no_grad is typically enabled).
Test Plan: Imported from OSS
Differential Revision: D18915468
Pulled By: zdevito
fbshipit-source-id: 1ae816a16ab78ae5352825cc6b4a68ed7681a089
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30978
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
It then skips the Python tests for them. There is a 1-second timeout, so if the internet flakes we still run the test suite without disabling any tests.
This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.
Test Plan: Imported from OSS
Pulled By: zdevito
Differential Revision: D18890532
fbshipit-source-id: fe9447e59a6d5c9ad345f7c3ff15d63b6d2a09e2
Summary:
Upgrade the IR version from 4 to 6; below is the change doc from ONNX. The upgrade should be backward compatible.
```
// IR VERSION 5 published on March 18, 2019
// - Add message TensorAnnotation.
// - Add quantization annotation in GraphProto to map tensor with its scale and zero point quantization parameters.
IR_VERSION_2019_3_18 = 0x0000000000000005;
// IR VERSION 6 published on Sep 19, 2019
// - Add support for sparse tensor constants stored in model.
// - Add message SparseTensorProto
// - Add sparse initializers
IR_VERSION = 0x0000000000000006;
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31025
Reviewed By: hl475
Differential Revision: D18935444
Pulled By: houseroad
fbshipit-source-id: 9ba47f9657fa1a668db291cf04af07d5e8d73c21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31334
The wipe cache logic was introduced hoping to reduce the variation in the benchmark results. Based on our experimental results, it didn't actually help with that. In addition, several engineers had encountered a missing cpuinfo.h, which was used in the wipe cache logic. So this diff removes that feature to ensure smooth installation and running of the op bench.
Test Plan:
```
buck run caffe2/benchmarks/operator_benchmark/pt:add_test -- --iterations 1
# ----------------------------------------
# PyTorch/Caffe2 Operator Micro-benchmarks
# ----------------------------------------
# Tag : short
# Benchmarking PyTorch: add
# Mode: Eager
# Name: add_M1_N1_K1_cpu
# Input: M: 1, N: 1, K: 1, device: cpu
Forward Execution Time (us) : 111.192
```
A/B test also passes: Benchmark Run #2476535015
Reviewed By: hl475
Differential Revision: D19126970
fbshipit-source-id: 9b1ab48c121838836ba6e0ae664a48fe2d18efdd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31206
Improvement on #25525.
- DistAutogradContext::getKnownWorkerIds() returns an unordered_map as a temp value. No need to copy this temp value A into another temp value B.
ghstack-source-id: 95736296
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_worker_ids_recorded
```
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork_thrift -- test_context_cleanup_tensor_with_grad
```
Differential Revision: D5707771
fbshipit-source-id: 9fea83dc69b02047aef8b02a73028a260ac0be40
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30915
Since we now have C++14, we don't need these c10::guts helpers anymore
ghstack-source-id: 95777609
Test Plan: waitforsandcastle
Differential Revision: D18869639
fbshipit-source-id: 97716f932297c64c6e814410ac47b444c33d4e2e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31217
It doesn't seem to be used.
Test Plan: Imported from OSS
Differential Revision: D18986642
Pulled By: gchanan
fbshipit-source-id: 96d615df82731d2224d403ab6e2cad6d4c6674fd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30917
This is a C++14 feature, we can use this now.
ghstack-source-id: 95255753
Test Plan: waitforsandcastle
Differential Revision: D18869637
fbshipit-source-id: dd02036b9faeaffa64b2d2d305725443054da31b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30501
**Motivation**:
In the current state, the output of libtorch Module forward/runMethod is mem-copied to a Java ByteBuffer, which is allocated (at least in some versions of Android) on the Java heap. That could lead to intensive garbage collection.
**Change**:
The output Java tensor becomes the owner of the output at::Tensor and keeps it alive (as the `pytorch_jni::TensorHybrid::tensor_` field) until the Java part is destroyed by GC. For that, org.pytorch.Tensor becomes a 'Hybrid' class in fbjni naming and starts holding the member field `HybridData mHybridData;`.
If construction starts from the Java side, the Java constructors of subclasses call `this.mHybridData = super.initHybrid();` to initialize the cpp part (`at::Tensor tensor_`) (we need all the fields initialized; because of this, `mHybridData` is not declared final, but works as final).
If construction starts from the cpp side, the cpp side is initialized from the provided at::Tensor with `makeCxxInstance(std::move(tensor))` and passed to the Java method `org.pytorch.Tensor#nativeNewTensor` as the parameter `HybridData hybridData`, which holds a native pointer to the cpp side.
In that case the `initHybrid()` method is not called; instead a parallel set of subclass ctors is used, which stores `hybridData` in `mHybridData`.
Renaming:
`JTensor` -> `TensorHybrid`
Removed method:
`JTensor::newAtTensorFromJTensor(JTensor)` becomes trivial `TensorHybrid->cthis()->tensor()`
Test Plan: Imported from OSS
Differential Revision: D18893320
Pulled By: IvanKobzarev
fbshipit-source-id: df94775d2a010a1ad945b339101c89e2b79e0f83
Summary:
Fixes https://github.com/pytorch/pytorch/issues/31271
This fixes the copy kernel speed regression introduced in https://github.com/pytorch/pytorch/issues/29631.
The previous implementation forces the compiler to instantiate `static_cast_with_inter_type` because it is passed as an argument to a function. This makes it impossible for compilers to do optimizations like automatic vectorization, and the function call itself is expensive compared to a single casting instruction.
To check the change, run
```
readelf -Ws /home/xgao/anaconda3/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so | grep static_cast_with_inter_type
```
On nightly build, we have output
```
168217: 0000000001852bf0 5 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsdE5applyEd
168816: 0000000001852d30 33 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEaE5applyEa
168843: 00000000018531f0 7 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIblE5applyEl
168930: 0000000001852c20 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIslE5applyEl
168935: 00000000018528d0 124 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIfNS_4HalfEE5applyES1_
169023: 0000000001852f30 17 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEhE5applyEh
169713: 00000000018525c0 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIahE5applyEh
170033: 0000000001852c10 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsiE5applyEi
170105: 0000000001852bd0 5 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIshE5applyEh
170980: 0000000001852fc0 27 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdES1_IfEE5applyES3_
171398: 0000000001852810 13 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIdbE5applyEb
171574: 00000000018532e0 35 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIbNS_8BFloat16EE5applyES1_
171734: 0000000001852b20 6 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIlSt7complexIdEE5applyES2_
172422: 0000000001853350 54 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EaE5applyEa
172704: 00000000018533c0 38 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_8BFloat16EfE5applyEf
172976: 0000000001852890 10 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIflE5applyEl
173038: 0000000001852f80 9 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIdEfE5applyEf
173329: 00000000018531c0 20 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIbfE5applyEf
173779: 00000000018524d0 3 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIhiE5applyEi
174032: 0000000001852960 14 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIfNS_8BFloat16EE5applyES1_
174334: 0000000001852d60 29 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeISt7complexIfEdE5applyEd
174470: 0000000001852c60 124 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIsNS_4HalfEE5applyES1_
174770: 0000000001852bc0 15 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIlNS_8BFloat16EE5applyES1_
176408: 0000000001853980 144 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeINS_4HalfEbE5applyEb
176475: 0000000001852790 128 FUNC LOCAL DEFAULT 9 _ZN3c1027static_cast_with_inter_typeIdNS_4HalfEE5applyES1_
....
```
And after this PR, we get empty output
```
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31279
Differential Revision: D19075587
Pulled By: ngimel
fbshipit-source-id: c20088241f39fa40c1d055f0a46eb5b9ece52e71
Summary:
Closes https://github.com/pytorch/pytorch/issues/31198, see the issue for more details. We throw an error when `local_value()` is called on a non-owned rref, but the incorrect node name is printed in the error message. This PR fixes that and adds a relevant unit test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31199
Differential Revision: D19072014
Pulled By: rohan-varma
fbshipit-source-id: 760c20bfd2fbf286eaaca19500469509a575cfec
Summary:
Make the following changes:
- When there are more than 10k errors, cuda-memcheck only shows 10k errors, in this case we shouldn't raise an Exception
- Add UNDER_CUDA_MEMCHECK environment to allow disabling `pin_memory` tests when running cuda-memcheck.
- Add a `--ci` command option, when turned on, then this script would run output to stdout instead of writing a file, and exit with an error if cuda-memcheck fails
- Add a `--nohang` command option. When turned on, then hang would be treated as pass instead of error
- Do simple filtering on the tests to run: run a test if `'cpu'` is in the test name but `'cuda'` is not
- Add `--split` and `--rank` to allowing splitting the work (NVIDIA CI has a limitation of 3 hours, we have to split the work to satisfy this limitation)
- The error summary could be `ERROR SUMMARY: 1 error` or `ERROR SUMMARY: 2 errors`; the tail could be `error` or `errors`, so it is not always the same length. The script is fixed to handle this case.
- Ignore errors from `cufft`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29243
Differential Revision: D18941701
Pulled By: mruberry
fbshipit-source-id: 2048428f32b66ef50c67444c03ce4dd9491179d2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31276
Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()
This is similar to https://github.com/pytorch/pytorch/pull/13902 in caffe2 land.
Test Plan: wait for CI to clear
Reviewed By: bddppq
Differential Revision: D19047582
fbshipit-source-id: 34703b03786c8eee9c78d2459eb54bde8dc21a57
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30391
A Type parser to parse the python string of a Type. For example,
"Tuple[str, Optional[float], Dict[str, List[Tensor]], int]".
Please refer to test_type_parser.cpp for the usage.
One of the use cases is in the lite interpreter, where types need to be serialized (directly calling the python_str() of the Type) and deserialized (calling parseType(str)).
Test Plan: Imported from OSS
Differential Revision: D18924268
Pulled By: iseeyuan
fbshipit-source-id: 830d411563abfbeec023f01e7f8f4a1796f9a59a
Summary:
https://github.com/pytorch/pytorch/issues/28294 DDP should not set grad for globally unused parameters
DDP currently computes the param-to-bucket mapping upfront and allreduces grads for all params in every iteration. Even if params are unused, it will just set their grad to zero. With such behavior, the optimizer cannot tell whether a param indeed has a zero grad or is simply not used in the current iteration. This could trigger convergence problems for optimizers with weight decay and momentum, such as SGD. However, DDP cannot simply set grad to None for locally unused parameters, as locally unused parameters might be used in other processes, and hence we still need to allreduce their grads. Instead, DDP should figure out the globally unused parameters and skip touching their grads at the end of backward.
Implementation summary:
* Add a locally used parameter map for each model replica.
* Mark the locally unused parameters at the end of forward and then reduce to get the globally unused parameters.
* At the end of backward, skip touching the grad for those globally unused parameters.
* Add a unit test test_global_local_unused_params_grad
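A standalone illustration (no process group needed) of why a zero grad and a None grad behave differently for SGD with momentum and weight decay, which is the convergence issue described above:
```
import torch

p_zero = torch.nn.Parameter(torch.ones(3))
p_none = torch.nn.Parameter(torch.ones(3))
opt = torch.optim.SGD([p_zero, p_none], lr=0.1, momentum=0.9, weight_decay=0.1)

# Simulate an iteration in which neither parameter was used in the loss:
p_zero.grad = torch.zeros(3)   # what DDP used to do for unused params
p_none.grad = None             # leaving the grad unset instead

opt.step()
print(p_zero.data)  # moved by weight decay/momentum despite being "unused"
print(p_none.data)  # untouched, as expected
```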
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28883
Differential Revision: D18491530
Pulled By: mrshenli
fbshipit-source-id: 24e9b5f20df86c34ddbf9c7106250fd6ce186699
Summary:
Fixes https://github.com/pytorch/pytorch/pull/28378#issuecomment-562597033
To reproduce the failure I had to downgrade to `cmake 3.9` (Ubuntu 18 uses 3.10 apparently). These older `cmake` versions unfortunately don't seem to allow `target_link_libraries(INTERFACE)` to be used with imported libraries. Switching back to `set_property(TARGET)` fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30935
Differential Revision: D18956912
Pulled By: albanD
fbshipit-source-id: a2b728ee3268599a428b7878c988e1edef5d9dda
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26618
Implement a mechanism to get type names at compile time
In a future diff, I'm planning to introduce this to caffe2::TypeMeta and a few other places.
ghstack-source-id: 95337871
Test Plan: unit tests
Differential Revision: D17519253
fbshipit-source-id: e14017f962fd181d147accb3f53fa8d6ee42a3f8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31200
We do not hipify these files when doing out of place.
Test Plan: wait for CI to clear.
Differential Revision: D18963683
fbshipit-source-id: eeba8597143f26417d0a8181a4c746139afefa24
Summary:
Tests for unique_dim will be refactored in a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31211
Differential Revision: D19034968
Pulled By: ngimel
fbshipit-source-id: 855d326b37638b5944f11fbbce03394cf000daf9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31207
Cleanup after #30914.
In #30914, `autogradContext->addKnownWorkerId(dst);` was moved out of `addSendRpcBackward()`.
So `addSendRpcBackward()` does not need `dstId` as its argument anymore.
ghstack-source-id: 95509218
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:dist_autograd_fork -- test_context_cleanup_tensor_no_grad
```
Differential Revision: D5742365
fbshipit-source-id: accd041a594ec18d369231f5590289828d87baa7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31224
If a future coming back to an rpc_agent server is satisfied with an
exception, ensure this information is propagated back over the wire.
ghstack-source-id: 95522418
Test Plan: buck test mode/dev-nosan caffe2/torch/fb/distributed/thriftRpcBackend/...
Differential Revision: D18979185
fbshipit-source-id: 99848ae805cc2d48948809a238f61a2e0ef234c9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31128
When an operation times out due to errors that are not detected by the NCCL communicators, ncclCommWatchdog cannot detect the timeout and thus cannot abort the ncclComms accordingly. So we explicitly abort the ncclComms here before throwing the timeout exception to users; after this, ncclCommWatchdog can detect that the NCCL communicators are aborted and clean up devNCCLCommMap_ accordingly. If we threw the timeout exception without aborting the NCCL communicators here, it was observed that the CUDA GPU stays at 100% utilization and cannot run new events successfully.
ghstack-source-id: 95528488
Test Plan: the newly revised test _test_nccl_errors_blocking passed with the changes in this diff; the revised test failed without the changes in this diff
Reviewed By: isunjin
Differential Revision: D18928607
fbshipit-source-id: be65a05ce4ff005f0c7fed36ae8e28903e8ffe2b
Summary:
This started as a casual coding exercise, so I wasn't putting much effort into it; but I wondered whether the current intrusive_ptr implementation is optimized enough, so I compared it with shared_ptr (using std::enable_shared_from_this).
My benchmark results show that intrusive_ptr is actually slower. On my MacBook the speeds are:
```
---------------------------------------------------------------
Benchmark Time CPU Iterations
---------------------------------------------------------------
BM_IntrusivePtrCtorDtor 14 ns 14 ns 52541902
BM_SharedPtrCtorDtor 10 ns 10 ns 71898849
BM_IntrusivePtrArray 14285 ns 14112 ns 49775
BM_SharedPtrArray 13821 ns 13384 ns 51602
```
Wanted to share the results so someone could probably take a look if interested.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30810
Reviewed By: yinghai
Differential Revision: D18828785
Pulled By: bddppq
fbshipit-source-id: 202e9849c9d8a3da17edbe568572a74bb70cb6c5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30175
fbjni was opensourced and java part is published as 'com.facebook.fbjni:fbjni-java-only:0.0.3'
switching to it.
We still need submodule fbjni inside the repo (which is already pointing to https://github.com/facebookincubator/fbjni) for so linking.
**Packaging changes**:
before that `libfbjni.so` came from pytorch_android_fbjni dependency, as we also linked fbjni in `pytorch_android/CMakeLists.txt` - it was built in pytorch_android, but excluded for publishing. As we had 2 libfbjni.so there was a hack to exclude it for publishing and resolve duplication locally.
```
if (rootProject.isPublishing()) {
exclude '**/libfbjni.so'
} else {
pickFirst '**/libfbjni.so'
}
```
After this change fbjni.so will be packaged inside pytorch_android.aar artefact and we do not need this gradle logic.
I will update README in separate PR after landing previous PR to readme(https://github.com/pytorch/pytorch/pull/30128) to avoid conflicts
Test Plan: Imported from OSS
Differential Revision: D18982235
Pulled By: IvanKobzarev
fbshipit-source-id: 5097df2557858e623fa480625819a24a7e8ad840
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29579
Per #28923, this diff moves Future<Message> to torch::utils and extends it to Future<T>; most of the implementation is copied from FutureMessage and ivalue::Future. Merging ivalue::Future with Future<T> will be done separately.
The main difference between Future<T> and FutureMessage is error handling: instead of checking the message type inside the Future to handle errors, Future<T> owns has_error_ and error_ states.
This future also passes the value_, has_error_, and error_ states to callbacks so they can easily read the future's state.
In the next diff, a TorchScript RPC async API will be created. Before the API returns, it will create an ivalue::Future and pass it to Future<T>'s callback, where the state of the ivalue::Future will be set. In this way, the TorchScript RPC async API can still return an ivalue::Future and call wait() to get its state appropriately afterwards.
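The pattern described above (a future owning value/error state and handing both to its callbacks) can be sketched in Python; the names below are illustrative, not the torch::utils API:
```python
# Illustrative sketch of a future that owns value/error state and passes
# both to callbacks; not the actual torch::utils Future<T> API.
import threading

class SimpleFuture:
    def __init__(self):
        self._cv = threading.Condition()
        self._done = False
        self._value = None
        self._error = None
        self._callbacks = []

    def set_value(self, value):
        self._complete(value=value)

    def set_error(self, error):
        self._complete(error=error)

    def _complete(self, value=None, error=None):
        with self._cv:
            self._value, self._error, self._done = value, error, True
            callbacks = list(self._callbacks)
            self._cv.notify_all()
        for cb in callbacks:
            cb(self._value, self._error)  # callbacks see both value and error state

    def add_callback(self, cb):
        with self._cv:
            if not self._done:
                self._callbacks.append(cb)
                return
        cb(self._value, self._error)

    def wait(self):
        with self._cv:
            self._cv.wait_for(lambda: self._done)
        if self._error is not None:
            raise self._error
        return self._value
```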
ghstack-source-id: 95479525
Test Plan: unit tests
Differential Revision: D18263023
fbshipit-source-id: 48a65712656a72c2feb0bb3ec8b308c0528986a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31212
To be able to use this function more broadly.
Test Plan: unit tests
Reviewed By: jackm321
Differential Revision: D18978913
fbshipit-source-id: d998dc7c7f9540f491a8a4bc5d6d25d9c3bf8764
Summary:
Update ONNX Flatten to accept negative indices in opset 11.
With this change, some cases of flatten do not rely on the input rank being available.
Fixes : https://github.com/pytorch/pytorch/issues/30512 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30751
Reviewed By: hl475
Differential Revision: D18946904
Pulled By: houseroad
fbshipit-source-id: a6fa30a9182fff92211e505a19325525c6112f19
Summary:
all jobs are currently running with "--dry-run", so you can verify whether the jobs are doing the right thing. I'll remove the flag and make it run every hour, same as on Jenkins, once this PR is approved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30996
Differential Revision: D18971001
Pulled By: mingbowan
fbshipit-source-id: 2384bdb50ebdf47aad265395f26be3843f0ce05e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31163
The purpose is to unblock integration with TorchScript. Currently,
an OwnerRRef will be created by either a remote call or a to_here
call, whichever arrives first. However, when making RRef an IValue,
we need to know the type of value held by the RRef, which is
retrieved by checking the return type of the TorchScript function.
The TorchScript function is only available during the remote call
but not in the to_here() call. Hence, an OwnerRRef can only be
created when processing a remote call. This commit implements this
behavior by introducing a condition variable for every OwnerRRef
in the RRefContext, and letting the to_here() call and PyRRef::unpickle
block on the CV until the value is ready.
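The blocking behavior can be sketched with a condition variable (illustrative only; the real logic lives in the C++ RRefContext):
```python
# Illustrative only: to_here()/unpickle block on a condition variable until
# the owner RRef's value has been set by the remote call.
import threading

class OwnerRRefSketch:
    def __init__(self):
        self._cv = threading.Condition()
        self._has_value = False
        self._value = None

    def set_value(self, value):        # called when the remote call completes
        with self._cv:
            self._value = value
            self._has_value = True
            self._cv.notify_all()

    def to_here(self):                 # blocks until the value is ready
        with self._cv:
            self._cv.wait_for(lambda: self._has_value)
            return self._value
```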
Test Plan: Imported from OSS
Differential Revision: D18949591
Pulled By: mrshenli
fbshipit-source-id: 17513c6f1fd766885ea8e1cd38f672a403fa4222
Summary:
Remove most of the testing for `weak_script`, since we removed it. Refactor a few of the existing tests to use recursive scripting api.
Fix for https://github.com/pytorch/pytorch/issues/23965
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31193
Differential Revision: D18966291
Pulled By: eellison
fbshipit-source-id: 6b1e18c293f55017868a14610d87b69be42bde12
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31127
Original commit changeset: d22448b90843
On Skylake T6:
Single Core:
(Note that our benchmark generates batch_size=47 for the first case and batch_size=56 for the second case. In spite of that, the vectorized version is still faster than the original reference C version without vectorization.)
- Before the PR:
```
native_layer_norm 0.81% 5.884ms 0.81% 5.884ms 122.580us NaN 0.000us 0.000us 48 [[47, 1, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 0.68% 5.053ms 0.68% 5.053ms 105.272us NaN 0.000us 0.000us 48 [[56, 1, 1024], [1024], [1024]]
```
20 Cores:
- Before the PR:
```
native_layer_norm 1.65% 41.682ms 1.65% 41.682ms 868.365us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
- After the PR:
```
native_layer_norm 1.34% 33.829ms 1.34% 33.829ms 704.771us NaN 0.000us 0.000us 48 [[61, 64, 1024], [1024], [1024]]
```
ghstack-source-id: 95420889
Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval
Differential Revision: D18936428
fbshipit-source-id: 8cae33d35fb338b5ac49b1597c2709152612d6e5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31088
Original issue:
https://github.com/pytorch/pytorch/issues/31027
The problem is that for stacks of PRs, CircleCI does not set the environment variable `CIRCLE_PULL_REQUEST` for non-leaf PRs; that variable is used to filter out some jobs that should run only on `master`.
(The Android job for master includes all 4 ABIs (x86, x86_64, armeabi-v7a, arm64-v8a) and the gradle build tries to get results from all 4 ABIs; for PRs we run only the x86 build to save resources. That's why the unfiltered master Android job fails, as the ABIs other than x86 were not scheduled.)
The env variable `CIRCLE_BRANCH` is set correctly and can be used as a workaround to determine that this is a PR (published with ghstack).
Test Plan: Imported from OSS
Differential Revision: D18966385
Pulled By: IvanKobzarev
fbshipit-source-id: 644c5ef07fcf2d718b72695da2cc303da8b94ef4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31117
After this diff, we will have completely removed the named tensor
feature flagging. This means that named tensors are always on and that
there is no mechanism to turn them off. There should be no more follow-up
diffs.
I performed the deletion of the header with
```
find . -type f -print0 | xargs -0 sed -i '/#include <ATen\/core\/EnableNamedTensor.h>/d'
```
Test Plan: - wait for CI
Differential Revision: D18934952
Pulled By: zou3519
fbshipit-source-id: 253d059074b910fef15bdf885ebf71e0edf5bea5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31116
Changelist:
- remove BUILD_NAMEDTENSOR macro
- remove torch._C._BUILD_NAMEDTENSOR
- remove all python behavior that relies on torch._C._BUILD_NAMEDTENSOR
Future:
- In the next diff, I will remove all usages of
ATen/core/EnableNamedTensor.h since that header doesn't do anything
anymore
- After that, we'll be done with the BUILD_NAMEDTENSOR removal.
Test Plan: - run CI
Differential Revision: D18934951
Pulled By: zou3519
fbshipit-source-id: 0a0df0f1f0470d0a01c495579333a2835aac9f5d
Summary:
Resubmit of https://github.com/pytorch/pytorch/pull/30356 and https://github.com/pytorch/pytorch/pull/31014 :'(
The last commit contains the fix. There was an internal fbcode error that failed to compile the previous `impl_default->second.equal(default_val.second))` line. I tried various fixes in C++ internally but couldn't figure anything out. This is a good example of the programming cost of going from Python to C++ for different types of objects, because the conceptual overhead has expanded in scope from (python) to (python, c++, pybind).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31123
Differential Revision: D18936128
Pulled By: eellison
fbshipit-source-id: 7d8fd66a6dd4a3e9838f3a0b68c219b6565a9462
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30909
`fold_prepack` doesn't work anymore after we change `scale`, `zero_point`
to be attributes, but since the freeze API is coming up, I don't want to
spend time to make this work since this will be thrown away later.
Test Plan:
.
Imported from OSS
Differential Revision: D18864537
fbshipit-source-id: 649e6b91f2b04b8babacc0afb6bc1530ed7259d3
Summary:
**Patch Description**
Round out the optimizer types in torch.optim by creating stubs for the rest of them.
**Testing**:
I ran mypy looking just for errors in the optim folder. There are no *new* mypy errors created.
```
$ mypy torch/optim | grep optim
$ git checkout master; mypy torch/optim | wc -l
968
$ git checkout typeoptims; mypy torch/optim | wc -l
968
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31130
Reviewed By: stephenroller
Differential Revision: D18947145
Pulled By: vincentqb
fbshipit-source-id: 5b8582223833b1d9123d829acc1ed8243df87561
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30355
- Make processTimedOutFutures hold lock.
- Reduce unnecessary scan on future and future timeout maps.
- Reduce the scope of lock at a spot.
- Avoid repeatedly wake up if user set timeout = 0.
ghstack-source-id: 95409528
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_rpc_timeouts
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_rpc_timeouts
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_rpc_timeouts
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_rpc_timeouts
```
Differential Revision: D5516149
fbshipit-source-id: 4bb0bd59fa31d9bfaef9f07ac0126782da17f762
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31164
We have a small number of internal projects that still are on Python 2.
Until we can figure out how to get rid of them, we need to continue
supporting Python 2 for PyTorch.
Test Plan: Imported from OSS
Differential Revision: D18949698
Pulled By: suo
fbshipit-source-id: 4a9d7e4306ed81576e05f243de472937a2bb1176
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31151
same as title. I am not sure why this was not added in the first place.
Test Plan: wait for build to succeed.
Reviewed By: bddppq, xw285cornell
Differential Revision: D18880216
fbshipit-source-id: 8b17d4fbd5dd08c28c52df8b1da77b69d56d65dc
Summary:
Currently, both `Conv{1,2,3}dOptions` and `ConvTranspose{1,2,3}dOptions` are aliases of the `ConvOptions<{1,2,3}>` class, which causes confusion because the `ConvOptions` class has parameters such as `transposed` that shouldn't be exposed to the end user. (This has caused issues such as https://github.com/pytorch/pytorch/issues/30931.) This PR makes the following improvements:
1. Rename the original `torch::nn::ConvOptions<N>` class to `torch::nn::detail::ConvNdOptions<N>` class, to signify that it's an implementation detail and should not be used publicly.
2. Create new classes `torch::nn::ConvOptions<N>` and `torch::nn::ConvTransposeOptions<N>`, which have parameters that exactly match the constructor of `torch.nn.Conv{1,2,3}d` and `torch.nn.ConvTranspose{1,2,3}d` in Python API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31005
Differential Revision: D18898048
Pulled By: yf225
fbshipit-source-id: 7663d646304c8cb004ca7f4aa4e70d3612c7bc75
Summary:
Fix for https://github.com/pytorch/pytorch/issues/30015
We had a model that failed in shape propagation because we could not unify `Tensor` and `Optional[BoolTensor]`. Tensor not subtyping Optional[BoolTensor] was correct, but we should have unified those two types to `Optional[Tensor]`.
The fix here is that for immutable type containers (Optional, Tuple), we should first attempt to unify with complete shape information, and if that fails, then try to unify the types with unshaped types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31076
Differential Revision: D18921802
Pulled By: eellison
fbshipit-source-id: aa6890277470c60b349ed1da4d81cc5d71d377f6
Summary:
Adding support for the new ATen op floor_divide which was introduced in https://github.com/pytorch/pytorch/pull/30493/files.
This operation is used in Torchvision/FasterRCNN-MaskRCNN, which are now failing after the new op was introduced.
This PR fixes the failure.
cc: neginraoof
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31081
Reviewed By: houseroad
Differential Revision: D18945316
Pulled By: eellison
fbshipit-source-id: 09919c237d618ce7db293c7770f48f7304949dcf
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31086
This change leverages the new future response framework so that server
threads don't block until setValue is called. Particulurly, we add a
getFuture() method to OwnerRRef so that we get a future that is satisfied
once setValue is called.
ghstack-source-id: 95402273
Test Plan: buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18925272
fbshipit-source-id: 2caf51019e5b5fd7ec45539544780067deb28610
Summary:
Previously list elements were only unified for tensor lists.
This improves error messages and expands the unification logic
to include all types.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30777
Pulled By: driazati
Differential Revision: D18837726
fbshipit-source-id: c4d275562a8429700987569426d694faa8f6002e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31137
Our Test CI is broken because:
- hypothesis recently did a new release that reorganized their internal
modules
- we were importing something from their internal module structure.
This PR fixes the CI by doing the following:
- import SearchStrategy from the correct (public) location
- Pin the hypothesis version to avoid future surprises.
In the long term, we should stop installing hypothesis every time the CI
runs and instead install it as part of our docker build process. See
https://github.com/pytorch/pytorch/issues/31136 for details.
Test Plan:
- I tested this locally; before this PR test/test_nn.py fails to run but
after it does run.
- Wait for CI
Differential Revision: D18940817
Pulled By: zou3519
fbshipit-source-id: c1ef78faa5a33ddf4d923f947c03cf075a590bb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31069
Just to clarify that they are still experimental.
Test Plan: Imported from OSS
Differential Revision: D18920496
Pulled By: suo
fbshipit-source-id: d2f3014592a01a21f7fc60a4ce46dd0bfe5e19e9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30994
The flakiness we saw was due to missing barriers(), which caused
state to leak into previous or subsequent checks. This commit
attempts to fix the problem by adding barriers before and after each
check.
Test Plan: Imported from OSS
Differential Revision: D18893457
Pulled By: mrshenli
fbshipit-source-id: 42bcc12efa7e6e43e2841ef23e4bc2543b0236c6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/19705
Optimize for the case where a run of consecutive non-broadcast dims is followed by a run of consecutive broadcast dims.
For example, MulGradient(["dC", "A", "B"], ["dA", "dB"], broadcast=True, axis=0) where A.shape == dC.shape == [9508, 80] and B.shape == [80] .
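For shape intuition only, here is the equivalent autograd behavior expressed in PyTorch (the optimized kernel itself is Caffe2 C++, and this sketch does not assert anything about Caffe2's axis convention):
```python
# Shape intuition for a broadcast mul gradient:
# dA has A's shape; dB is dC * A reduced over the broadcast dimension.
import torch

A = torch.randn(9508, 80, requires_grad=True)
B = torch.randn(80, requires_grad=True)
dC = torch.ones(9508, 80)

C = A * B            # B is broadcast over the first dimension
C.backward(dC)

print(A.grad.shape)  # torch.Size([9508, 80])  -> dA = dC * B
print(B.grad.shape)  # torch.Size([80])        -> dB = sum over dim 0 of dC * A
```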
Test Plan:
In SKL T6,
Running mul_gradient_benchmark without this optimization
Operator #0 (dA, MulGradient) 11.9119 ms/iter
After this optimization,
Operator #0 (dA, MulGradient) 0.672759 ms/iter
Need to land D15291800 before to fix the unit test error
Reviewed By: dmudiger
Differential Revision: D15075415
fbshipit-source-id: 0f97be17cf8f1dacbafa34cd637fb8bc1c5e5387
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30979
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
--------------
In this PR:
Add tracing support for optional Device and Layout types.
--------------
Test Plan: Imported from OSS
Differential Revision: D18912685
Pulled By: izdeby
fbshipit-source-id: 4a9514ce2eee0041f9bc96636d3ddb4f077675e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30980
This stack is a first step toward an effort to fix, clean up and simplify code generation logic. Please see the master [task](https://github.com/pytorch/pytorch/issues/30405) to see related discussions and all the known issues.
Main focus of these changes is TensorOptions in code generation.
Goals:
- Remove TensorOptions from generated code wherever it's possible. Leave it only in python/C++ API layers.
- Refactor TensorOptions logic to a single place.
- Log all discovered issues.
Non goals:
- Fix Everything!
- Remove all the hacks in code generation scripts.
- Clean up and refactor all code generation scripts.
--------------
In this PR:
Add a test to check that C++ API behavior stays the same after all the changes.
While working on it a bug related to `requires_grad` was found and logged in the master task.
--------------
Test Plan: Imported from OSS
Differential Revision: D18912681
Pulled By: izdeby
fbshipit-source-id: 19772a37c92dde820839b79055f348689b99fa77
Summary:
This makes `nn.Transformer` usable from TorchScript. It preserves backwards compatibility via `__setstate__` on the encoder/decoder.
Fixes https://github.com/pytorch/pytorch/issues/24173
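A minimal check of the new behavior might look like this (shapes and hyperparameters chosen arbitrarily for the example):
```python
import torch
import torch.nn as nn

# After this change, nn.Transformer can be compiled with TorchScript.
model = nn.Transformer(d_model=32, nhead=4, num_encoder_layers=2, num_decoder_layers=2)
scripted = torch.jit.script(model)

src = torch.randn(10, 8, 32)   # (source seq len, batch, d_model)
tgt = torch.randn(7, 8, 32)    # (target seq len, batch, d_model)
out = scripted(src, tgt)
print(out.shape)               # torch.Size([7, 8, 32])
```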
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28561
Differential Revision: D18124753
Pulled By: driazati
fbshipit-source-id: 7314843e5aa9c9bf974c4672e4edb24ed8ef4a6f
Summary:
VitalyFedyunin, this PR ports the ELU activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.ELU()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    fwd_t = 0
    bwd_t = 0
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.04 (ms); backwad avg time is 0.09 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.28 (ms); backwad avg time is 0.18 (ms).
input size(128, 10000) forward time is 23.53 (ms); backwad avg time is 14.46 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.16 (ms); backwad avg time is 0.08 (ms).
input size(128, 10000) forward time is 15.53 (ms); backwad avg time is 6.60 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
OMP_NUM_THREADS=56
input size(128, 100) forward time is 0.24 (ms); backwad avg time is 0.17 (ms).
input size(128, 10000) forward time is 0.73 (ms); backwad avg time is 1.11 (ms).
OMP_NUM_THREADS=1
input size(128, 100) forward time is 0.15 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 14.40 (ms); backwad avg time is 6.00 (ms).
```
How to set the number of threads? Use the following script:
```
num_threads=$1
script=$2
last_core=`expr $num_threads - 1`
echo "using $num_threads OMP threads"
echo "bind cores to 0~$last_core"
export OMP_NUM_THREADS=$num_threads
export KMP_AFFINITY=granularity=fine,compact,1,0
numactl --physcpubind=0-$last_core --membind=0 python $script
```
and run .**/run.sh num_threads test.py**.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29275
Differential Revision: D18587389
Pulled By: VitalyFedyunin
fbshipit-source-id: bea8f3f006c6893090f863d047c01886d195437a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31047
Changelist:
- remove BUILD_NAMEDTENSOR from .cu files
- remove BUILD_NAMEDTENSOR special handling in function_wrapper.py
- remove BUILD_NAMEDTENSOR from cpp_extension.py. This code actually
did nothing because we always compile with BUILD_NAMEDTENSOR.
Test Plan: - run tests
Differential Revision: D18908442
Pulled By: zou3519
fbshipit-source-id: b239e24de58580adaf3cef573350773a38b1e4f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29104
We would like to provide the vectorized implementation for layer norm. This PR reuses https://github.com/pytorch/pytorch/pull/23349.
Test Plan:
buck test mode/dev-nosan //caffe2/test:nn -- "LayerNorm"
buck test mode/dev-nosan //caffe2/test:nn -- "test_LayerNorm_1d_no_elementwise_affine_eval"
python run_test.py -i nn -- TestNN.test_LayerNorm_1d_no_elementwise_affine_eval
Differential Revision: D18293522
fbshipit-source-id: f4cfed6e62bac1b43ee00c32b495ecc836bd9ec5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31100
This appears to not work right now. Disabling pending an investigation.
Test Plan: Imported from OSS
Differential Revision: D18928777
Pulled By: suo
fbshipit-source-id: 63089131bad98902979e5cf4373732c85badef9d
Summary:
The exported weight_norm incorrectly reduces over axis 0 as well when dim is set to 0.
The previous test case only covered a weight with size(0) == 1, which yields the same result whether or not it is reduced over.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31015
Reviewed By: hl475
Differential Revision: D18900894
Pulled By: houseroad
fbshipit-source-id: 19004f51933b37f848dbe4138e617a7a8e35a9ec
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30912
Add a new data type ZERO_COLLISION_HASH .
Test Plan: ci
Reviewed By: boryiingsu
Differential Revision: D18843626
fbshipit-source-id: b2d8280f13c78b4a656cf95822198df59de7b64c
Summary:
Peephole optimize out type refinements when they are no longer refining the type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31024
Differential Revision: D18920958
Pulled By: eellison
fbshipit-source-id: 6d05d9812b9f9dcf001de760a78a2042fb832773
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31068
Let's get it out of the early parts now that the recursive API has been
around for a while
Test Plan: Imported from OSS
Differential Revision: D18920498
Pulled By: suo
fbshipit-source-id: 6f4389739dd9e7e5f3014811b452249cc21d88e7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30637
The RequestCallback api currently forces work to always be synchronous, which,
as we scale, means we're going to need to throw a large number of (mostly
blocked) threads at the rpc problem. For some activities, like dependent
autograd rpcs, there's no real need to block in these threads.
In this change, the RequestCallback api is updated to return a
shared_ptr<FutureMessage> rather than a Message:
std::shared_ptr<FutureMessage> operator()(Message& request) const;
With a futures-style api, RPC ops that wish to be async can then be async,
while short-lived blocking functions (or Python UDFs) can just block.
In this change, we keep all of the current ops as synchronous (i.e. we block
and then return a completed FutureMessage). We also update the rpc_agents in
a manner compatible with this sort of parallelism.
Here, we only want to incur overhead when we use the async behavior.
Some modest extra cost seems unavoidable here (e.g. the allocation for the
std::make_shared<>), but we can trivially detect the synchronous/completed
case in the rpc_agent and avoid the extra thread-switches/etc. in that case.
ghstack-source-id: 95287026
Test Plan:
- Basic: buck test mode/dev-nosan caffe2/test/...
- Additional testcase in ThriftRpcAgentTest for deferred work.
Differential Revision: D18774322
fbshipit-source-id: cf49922a71707cfb1726de16f93af23b160385d8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30887
Support to convert quantized concat from pytorch to caffe2
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_cat
Imported from OSS
Differential Revision: D18855676
fbshipit-source-id: 5d0cf3f03c61819e168b080afa368b1255d0419c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30683
Assume that a node can work with autograd only if it is not a fusion
group and in prim or aten namespaces.
Test Plan: CI
Reviewed By: lly-zero-one
Differential Revision: D18795171
Pulled By: ilia-cher
fbshipit-source-id: 301090557e330b58be70e956784f7f0dc343c684
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29357
As title
Test Plan: Imported from OSS
Reviewed By: pritamdamania87
Differential Revision: D18920562
Pulled By: suo
fbshipit-source-id: b5dd559cfb0ba6c64b9ccf3655417afb56a7b472
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29353
First step to killing Python 2 everywhere. I don't really know that much
about the caffe2 circle jobs so I left them alone for now.
Test Plan: Imported from OSS
Differential Revision: D18920563
Pulled By: suo
fbshipit-source-id: b37d8427a6ecd4b8a7e16c1ff948e0ce13b5798f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31057
The current signature basically will always fail to type check, because
mypy enforces that the subclass method's input types must be "wider"
than their superclass method's input types (i.e. they can vary
contravariantly). And nothing is wider than `Any`.
This change makes it so that any input params are allowed in
`forward()`. Fixes #29099
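A small hypothetical illustration of the mypy rule in question (these classes are not PyTorch's actual stubs):
```python
# Hypothetical illustration of mypy's override rule; not PyTorch's real stubs.
from typing import Any

class Base:
    def forward(self, *input: Any) -> Any: ...

class Narrow(Base):
    # mypy rejects this override: the base promises to accept any number of
    # positional args, so a fixed two-arg signature is "narrower" and violates
    # contravariance (Liskov substitution).
    def forward(self, x: int, y: int) -> int:
        return x + y

class Wide(Base):
    # Accepting *args/**kwargs is at least as wide as the base, so this passes.
    def forward(self, *args: Any, **kwargs: Any) -> Any:
        return args[0]
```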
Test Plan: Imported from OSS
Differential Revision: D18918034
Pulled By: suo
fbshipit-source-id: 9940e9f769b55d580d9d7f23abf6f88edb92627f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/31030
DistAutogradContext held a shared_ptr reference to RecvRpcBackward and
RecvRpcBackward held a shared_ptr reference to the context. This circular
dependency caused significant memory leaks. As a result, I'm changing the
reference in RecvRpcBackward to be a weak_ptr.
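As an analogy only (the real fix is in the C++ shared_ptr/weak_ptr usage), the shape of the problem and fix looks like this in Python with weak references:
```python
# Analogy in Python: break a strong reference cycle by holding one side weakly.
import weakref

class Context:
    def __init__(self):
        self.recv_functions = []            # context -> strong refs to recv nodes

class RecvBackward:
    def __init__(self, context):
        # Holding the context strongly here would complete a cycle
        # (context -> recv node -> context) and keep both alive.
        self._context = weakref.ref(context)   # weak reference instead

    def context(self):
        ctx = self._context()
        if ctx is None:
            raise RuntimeError("autograd context already released")
        return ctx

ctx = Context()
node = RecvBackward(ctx)
ctx.recv_functions.append(node)
```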
Test Plan: waitforbuildbot
Differential Revision: D18896389
fbshipit-source-id: e5bc588b6f998885854e3a67de1e82452e8475ce
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30874
These have all been disabled at this point, so there is no difference in the generated code.
Test Plan: Imported from OSS
Differential Revision: D18855990
Pulled By: gchanan
fbshipit-source-id: 03796b2978e23ef9060063f33241a1cbb39f1cf3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30926
Calling the JITed FBGEMM kernel for Fused 8 Bit Sparse Length Sum (Fused8BitRowwiseEmbeddingLookup)
Test Plan:
buck test mode/dbg //caffe2/caffe2/python:lengths_reducer_fused_8bit_rowwise_ops_test
All tests pass.
Reviewed By: jspark1105
Differential Revision: D18058128
fbshipit-source-id: 0dfa936eb503712c39e53748e015fc156afde86f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29766
Add FbgemmPackTranspose op to support the packing on FCTransposed weights
Add FCTransposed to FbFCPacked transformation to Dper fp16 exporter
Test Plan:
```
buck test mode/opt caffe2/caffe2/fb/fbgemm:fb_fc_packed_op_test
```
```
buck test mode/opt caffe2/caffe2/python:layers_test
```
Differential Revision: D18482306
fbshipit-source-id: e8f1947b3d0d04892293509ebf88742f5f0f5997
Summary:
After several discussions, we agreed not to put any extra safety check in recordStream, as the check would either cause failures in certain scenarios or there is no need to throw for user errors.
In summary, it simply does what is described in https://github.com/pytorch/pytorch/issues/27405: check whether a tensor was indeed allocated by a CUDACachingAllocator instance, and if it was, throw an internal error if its block cannot be retrieved.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30870
Differential Revision: D18851669
Pulled By: yxia11
fbshipit-source-id: c2f01798cd24f1fd0f35db8764057d5d333dab95
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30894
This PR begins the process of removing BUILD_NAMEDTENSOR macros. There
will be followups.
Reasons for removing the macros:
- BUILD_NAMEDTENSOR is always on and has been on since pytorch 1.3.0.
- Since we don't test building without it, it is useless to keep around.
- Code becomes nicer to read without the macros
Reasons for not removing the macros:
- potential for feature flagging
Now, I argue against needing to feature flag. The main reason why we
might want to feature flag is if we need to disable the feature.
We'd need a fast switch to disable the feature if someone discovers
in the future that named tensors caused some regression in some existing workflows.
In https://github.com/pytorch/pytorch/pull/25798, I did a variety of
macro- and micro- benchmarks to determine the performance impact of named
tensors on regular tensors.
[The
microbenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-529014810)
were not very stable, and running the
microbenchmarks for more iterations doesn't actually help because the
noise is not distributed in a nice way. Instead of microbenchmarks I ran
a [profiler
(perf)](https://github.com/pytorch/pytorch/pull/25798#issuecomment-555707645)
to estimate how much overhead named tensors add to unnamed code. I
estimated the overhead to be less than 100ns for `add` and even smaller
for `mm`; there are ways to optimize even further if we find this to be a
problem.
[Initial
macrobenchmarks](https://github.com/pytorch/pytorch/pull/25798#issuecomment-530539104)
were also not very stable. I ran imagenet for some number of epochs. To
make them more stable, I got rid of the data loading (which seemed to
vary between runs). [In some benchmarkers without data
loading](https://github.com/pytorch/pytorch/pull/25798#issuecomment-562214053),
we can see that the results are less noisy now. These results support
no noticeable regressions in speed.
Test Plan: - wait for CI
Differential Revision: D18858543
Pulled By: zou3519
fbshipit-source-id: 08bf3853a9f506c6b084808dc9ddd1e835f48c13
Summary:
Adds `torch.floor_divide`, following numpy's `floor_divide` API. I only implemented the out-of-place version; I can add the in-place version if requested.
Also fixes https://github.com/pytorch/pytorch/issues/27512
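A quick usage example of the new op (out-of-place only, with non-negative inputs):
```python
import torch

a = torch.tensor([5, 9, 14])
print(torch.floor_divide(a, 4))                          # tensor([1, 2, 3])
print(torch.floor_divide(a, torch.tensor([2, 3, 4])))    # tensor([2, 3, 3])
```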
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30493
Differential Revision: D18896211
Pulled By: eellison
fbshipit-source-id: ee401c96ab23a62fc114ed3bb9791b8ec150ecbd
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30802
Change shape_hints from map<string, TensorShape> to ShapeInfoMap to catch dimType info from model file.
Reviewed By: ipiszy
Differential Revision: D18821486
fbshipit-source-id: c5d9ed72e158d3698aba38900aeda00f776745b4
Summary:
Updates to the export API:
When calling this API, a dict containing the custom opsets (domain and version) used to export the model can be provided.
We allow registering one custom opset (domain, version) per ONNX opset. So, when exporting an operator from a custom domain, users need to pass this pair. The default custom opset version is 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29752
Reviewed By: hl475
Differential Revision: D18703662
Pulled By: houseroad
fbshipit-source-id: 84d22557d132b526169051193d730761798fce60
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30356
This finishes up the `torch.jit.overload` api for free-functions.
- defaults now required on the implementation function itself
- fully follows [overload spec](https://mypy.readthedocs.io/en/latest/more_types.html#function-overloading) such that the following is supported
```
@overload
def mouse_event(x1: int, y1: int) -> ClickEvent: ...
def mouse_event(x1: int,
                y1: int,
                x2: Optional[int] = None,
                y2: Optional[int] = None): ...
```
Note: `jit.overload` isn't supported yet for UDTs, but it is supported for modules. This PR doesn't make the same changes for modules; if reviewers think I should include them, I could do so in a follow-up PR or wait to land this. Since that's still an internal api I think it's fine, and the changes here would allow us to expose `torch.jit.overload` on free functions.
Test Plan: Imported from OSS
Differential Revision: D18864774
Pulled By: eellison
fbshipit-source-id: 6c566738bd6f0551a000a9ea8d56e403636b7856
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30749
Add check to schemas that the schema is sane.
I removed the defaults from symbolic_script because they were in some cases wrong and don't actually do anything. At the point they're invoked the forward should already have matched all arguments.
Test Plan: Imported from OSS
Differential Revision: D18864775
Pulled By: eellison
fbshipit-source-id: 273d7e96d65b8a3d3de72e2d7bfcdf2417046c6b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30877
Previously, when the environment tried to reassign variables which had been assigned to "inf" or "nan", it would fail because they are not simple values. Constant prop exposed this; a test was failing internally because of it.
Test Plan: Imported from OSS
Reviewed By: Krovatkin
Differential Revision: D18861016
Pulled By: eellison
fbshipit-source-id: b9b72978a26a0b00b13bf8ea7685825551f5a541
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30544
Run Constant Propagation upon compilation only on ops with non-aliasing inputs and outputs. This speeds up the first run of `torchvision.models.resnet18` by over 50% and speeds up compilation by about 25% (although the effects didn't seem additive with with https://github.com/pytorch/pytorch/pull/30503, so I'm going to land this PR first and then see if caching still has a sizable impact).
Running constant prop only on non-aliasing types does a lot of graph cleanup by removing constant ifs and a bunch of other smaller ops. It also avoids all the jitter problems we had when we previously tried running full constant prop. Because it is idempotent it doesn't jitter, and it doesn't jitter graphs constructed from tracing because tracing doesn't emit any ops that only involve non-aliasing inputs.
Full constant prop isn't idempotent because which ops are run depends on the mutation state in the alias db, which will often change upon successive iterations of constant propagation, and because it affects graphs constructed from tracing.
Edit: if we were okay with running constant propagation on graphs constructed from tracing (potentially making them hard to debug), an alternative would be to run constant propagation until the graph reaches a fixed point.
Test Plan: Imported from OSS
Differential Revision: D18833607
Pulled By: eellison
fbshipit-source-id: 92a0adb4882d67ed5a0db5c279f5e122aeeba54a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30543
`shouldAnnotate` doesn't make a ton of sense as a public api
Test Plan: Imported from OSS
Differential Revision: D18833608
Pulled By: eellison
fbshipit-source-id: 460ee05d0fa91b1edc640c037be2a6ee8eaf50a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30853
Right now we print a one-element tuple as `(val)`, and it will
be interpreted as `val` when parsing; this PR changes it
to `(val,)` so we can recognize the one-element tuple when parsing.
Test Plan:
.
Imported from OSS
Differential Revision: D18846849
fbshipit-source-id: 42959b9190c2567ef021a861497077c550324b7c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30859
We can use a dictionary of quantization parameters to simplify the code
handling these things a bit.
Test Plan:
.
Imported from OSS
Differential Revision: D18849023
fbshipit-source-id: 09e9860b2656a1affa8776016e16794529bcee3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30927
Classes that are used virtually (e.g. have virtual methods) must have a virtual destructor or bad things happen
ghstack-source-id: 95144736
Test Plan: waitforsandcastle
Differential Revision: D18870351
fbshipit-source-id: 333af4e95469fdd9103aa9ef17b40cbc4a343f82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30519
Re-enable them and write a few additional ones
ghstack-source-id: 95143051
Test Plan: unit tests
Differential Revision: D18729561
fbshipit-source-id: 8cefd8320913d72a450a3324bfd7c88faed072d7
Summary:
VitalyFedyunin, this PR ports the Softshrink activation to ATen:
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Softshrink()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.06 (ms); backwad avg time is 0.12 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.18 (ms).
CPU:
input size(128, 100) forward time is 0.19 (ms); backwad avg time is 0.23 (ms).
input size(128, 10000) forward time is 17.23 (ms); backwad avg time is 16.83 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.32 (ms); backwad avg time is 0.08 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.10 (ms).
input size(128, 10000) forward time is 7.58 (ms); backwad avg time is 7.91 (ms).
After:
input size(128, 100) forward time is 0.08 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 7.30 (ms); backwad avg time is 1.02 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30229
Differential Revision: D18810054
Pulled By: VitalyFedyunin
fbshipit-source-id: e19074824396570db45ba488ae4f9fe1b07a5839
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30914
When tensors don't require grad, we don't call `addSendRpcBackward`, where we record known workerIDs to clean up the dist autograd context later. But since https://github.com/pytorch/pytorch/pull/29781, we always include the autograd context ID in RPCs, even if tensors do not require grad. So, it could be possible that we don't release the contexts on some nodes.
This can contribute to OOMs since the contexts will not be cleaned up in this case, which can be checked by running the unit test without this patch. We can fix this issue by moving the `addKnownWorkerIds` call to the `getMessageWithAutograd` function.
ghstack-source-id: 95178561
Test Plan: Added a unit test: `test_context_cleanup_tensor_no_grad`
Differential Revision: D18869191
fbshipit-source-id: b80f66bfd0dd7d01960abe1691d3f44095bb1b2b
Summary:
This simplifies the generated code a bit, saving about 40K off of libtorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30466
Differential Revision: D18836215
Pulled By: resistor
fbshipit-source-id: ad75c9e04783bb29cc06afd2022f73f9625dd52b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30715
Changed the caffe2/caffe2/TARGETS file to define USE_FBGEMM for x86 when USE_SSE_ONLY is not defined.
Test Plan: buck test caffe2/caffe2:caffe2_test_cpu -- Float16
Reviewed By: jianyuh
Differential Revision: D18806067
fbshipit-source-id: 1b44b90a9f6dc3c27f81a46038c0f7542ed2bab3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30642
Adding a couple of basic metrics for distributed autograd which would
help in determining stuckness.
ghstack-source-id: 95156189
Test Plan: waitforbuildbot
Differential Revision: D18776478
fbshipit-source-id: a0556ad6fe2b7c3cd0082ee2350c1c78cafaaec5
Summary:
- [x] Add more comments and refactor the logic of `ReshapeToAdvancedIndexingFormat`
- [x] Add more description here. Cases that are/aren't supported, and how they are supported.
- [x] Need to merge this PR https://github.com/pytorch/pytorch/issues/27186 to enable testing inplace operators.
We are now supporting exporting aten::copy_ and aten::index_put to ONNX.
Here's a breakdown of the different cases in PyTorch code.
```
# Case 1: Scalar Indices
x[0, 1, 2] = data
# Case 2: Slice Indices
x[1:3, :, ::2] = data
# Case 3: Ellipsis Indices
x[..., 0] = data
# Case 4: Tensor Indices
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Case 5: Mixing all the above cases
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[1:3, ind1, ind2, ..., 3] = data
```
Limitations:
Tensor indices must be consecutive, and 1-d tensors.
```
# Supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
x[ind1, ind2] = data
# Not supported
ind1 = torch.tensor([0, 2])
ind2 = torch.tensor([1, 1])
ind3 = torch.tensor([[0], [1]])
x[ind1, :, ind2] = data
x[ind3] = data
```
Negative indices are not supported.
```
# Not supported
x[-1] = data
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/26941
Differential Revision: D17951030
Pulled By: houseroad
fbshipit-source-id: 4357777072f53aa0bc4b297aa1ee53457a7f8dec
Summary:
```python
from torch.autograd.profiler import profile, record_function

@record_function('my_func')
def f(x, y):
    return x + y

with profile() as prof:
    f(1, 2)
print(prof.key_averages().table())
```
```
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
Name Self CPU total % Self CPU total CPU total % CPU total CPU time avg Number of Calls
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
my_func 85.42% 86.796us 87.27% 88.670us 88.670us 1
------------------------------------ --------------- --------------- --------------- --------------- --------------- ---------------
Self CPU time total: 101.606us
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30861
Differential Revision: D18857993
Pulled By: bddppq
fbshipit-source-id: eb6b8e2a8d4f3a7f8e5b4cb3da1ee3320acb1ae7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30904
When we sent tensors over RPC, on the server side we would call
addRecvRpcBackward which would call `set_history` on all tensors. This was
incorrect and set the `requires_grad` flag on tensors that didn't actually need
grad.
To fix this, we only attach autograd edges to tensors that need grads.
ghstack-source-id: 95113672
ghstack-source-id: 95113999
Test Plan: waitforbuildbot
Differential Revision: D18828561
fbshipit-source-id: d8942b76e9e4c567f8f1821f125c00d275ea0f90
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30892
Fixes all outstanding lints and actually installs a properly configured
flake8
Test Plan: Imported from OSS
Differential Revision: D18862825
Pulled By: suo
fbshipit-source-id: 08e9083338a7309272e17bb803feaa42e348aa85
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30906
Add mobile module observer to measure performance of each method run.
ghstack-source-id: 95120194
Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/4fghwp0b and see all the operator runs have been logged:
{F223456981}
Reviewed By: ljk53
Differential Revision: D18702116
fbshipit-source-id: a9f07eee684e3022cef5ba3c5934f30f20192a85
Summary:
Copy-paste comment from code for reasoning:
```
# NOTE [ IterableDataset and __len__ ]
#
# For `IterableDataset`, `__len__` could be inaccurate when one naively
# does multi-processing data loading, since the samples will be duplicated.
# However, no real use case should be actually using that behavior, so
# it should count as a user error. We should generally trust user
# code to do the proper thing (e.g., configure each replica differently
# in `__iter__`), and give us the correct `__len__` if they choose to
# implement it (this will still throw if the dataset does not implement
# a `__len__`).
#
# To provide a further warning, we track if `__len__` was called on the
# `DataLoader`, save the returned value in `self._len_called`, and warn
# if the iterator ends up yielding more than this number of samples.
```
Fixes https://github.com/pytorch/pytorch/issues/30184
Pull Request resolved: https://github.com/pytorch/pytorch/pull/23587
Differential Revision: D18852625
Pulled By: ailzhang
fbshipit-source-id: aea8d4d70c7f21aaa69b35908a6f43026493d826
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30908
Same as title.
Test Plan: Wait for CI to clear.
Reviewed By: bddppq, xw285cornell
Differential Revision: D18862837
fbshipit-source-id: bc34356b85774fc20ba46d321c8a2bb5d5c727f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30890
We've received way too many complaints about this functionality making tests flaky, and it's not providing value to us anyway. Let's cut the shit and kill deadline testing
Test Plan: Imported from OSS
Differential Revision: D18857597
Pulled By: jamesr66a
fbshipit-source-id: 67e3412795ef2fb7b7ee896169651084e434d2f6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30858
This is not needed since we have `values_to_qparams_`
Test Plan:
.
Imported from OSS
Differential Revision: D18848992
fbshipit-source-id: dc81f59967a93abdd5562f1010f02de4f4e60db0
Summary: Add a mobile operator observer to measure the performance of each operator run; the results will also be logged into the QPL event: [MOBILE_OPERATOR_STATS](https://fburl.com/quicklog/8773a00a).
Test Plan:
Run pytext model through BI cloaking flow on lite-interpreter and verify logs are sent:
1. buck install -r fb4a
2. Go to internal setting and find MobileConfig, search for android_bi_infra_cloaking_iab_models and set the following params:
a. sample_rate: 1.0
b. enabled: true
c. use_bytedoc_pytorch_model: true
d. use_bytedoc_caffe2_model: false
e. use_full_jit: false
3. Go back to new feed and scroll down until find an ads which will direct you to offsite webpage;
4. Click on the ads, wait for the offsite ads loads;
5. Click back to news feed;
6. Go to scuba table: https://fburl.com/scuba/er7t4g9u and see all the operator runs have been logged:
{F223250762}
Reviewed By: ljk53
Differential Revision: D18131224
fbshipit-source-id: 23e2f6e2a9851c04b29511b45dc53f3cce03e8a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30709
Intrusive_ptr doesn't provide an explicit incref method. When a user wants to
incref the target, they create an intrusive_ptr to wrap the target, then make
a copy (which does the actual incref), then release both the first intrusive_ptr
and the copy to prevent a decref at destruction time. This is very
inefficient. Instead, do the incref/decref directly.
Differential Revision: D18798505
fbshipit-source-id: 524d4f30d07d733df09d54423b044d80e4651454
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30649
Operators in VariableTypeManual are now no longer registered against the VariableTypeId key, but they are registered as compound ops. See https://github.com/pytorch/pytorch/issues/30102 for background.
This also requires the non-variable codegen to ignore them and requires removal of VariableMethodStubs.cpp.
So, because function_wrapper.py now also needs to know which ops are manual, instead of having a hard-coded list in gen_variable_type.cpp for ops with manual implementation, we now have a `manual_kernel_registration` flag in native_functions.yaml that disables the registration of operator kernels for this operator (the schema is still registered). Then, we manually register the right kernels for the operator.
ghstack-source-id: 95082204
Test Plan: unit tests
Differential Revision: D18778191
fbshipit-source-id: 0af6f9e43ff4fb9800ce19b286dfccd0fd22cc41
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30552
For upcoming changes to support quantizing shared class type
Test Plan:
.
Imported from OSS
Differential Revision: D18818653
fbshipit-source-id: 393a55db69b20a1c00ffa0157ab568cb097915b2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30826
Previously the scalar_check for the reduction None case was:
input.dim() <= 1, but it should be target based, i.e.:
target.dim() == 0. This follows from the "correct cases", i.e.
(N, C) X (N,) -> (N,)
(C,) X () -> ()
Test Plan: Imported from OSS
Differential Revision: D18833660
Pulled By: gchanan
fbshipit-source-id: 26338b842a8311718c4b89da3e2f1b726d5409b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30790
The index_select documentation reads:
"The returned tensor has the same number of dimensions as the original tensor (input)."
But the implementation would return a 0-dimensional tensor iff both the input and index were 0-dimensional.
This change makes it so we return a 0-dimensional tensor iff the input is 0-dimensional.
Restacked version of: https://github.com/pytorch/pytorch/pull/30502
Test Plan: Imported from OSS
Differential Revision: D18825717
Pulled By: gchanan
fbshipit-source-id: aeb10c5107e748af3e264fbdc81fff5dd4833cc4
Summary:
When converting a contiguous CuPy ndarray to a Tensor via `__cuda_array_interface__`, an error occurs due to incorrect handling of default strides. This PR fixes the problem and makes `torch.tensor(cupy_ndarray)` work for contiguous inputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24947
Differential Revision: D18838986
Pulled By: ezyang
fbshipit-source-id: 2d827578f54ea22836037fe9ea8735b99f2efb42
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30821
While investigating why our tests didn't catch #30704 I noticed that none
of our tests in method_tests() were being run on CUDA. This diff moves
those tests into the new device-generic test framework so that we also get
CUDA coverage. For expediency, I blacklisted all tests which didn't work
on CUDA (rather than fix them); that's something we can leave for future PRs.
This is done by way of a new expectedFailure gadget.
Note that all occurences of skipIfNoLapack needed to be replaced with
skipCPUIfNoLapack.
I punted for test_jit; it's possible those tests should also run CUDA but a JIT
expert should take a look here.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18840089
Pulled By: ezyang
fbshipit-source-id: 66b613b5024c91d3e391c456bb642be7e73d4785
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30551
To enable quantizing with shared types, we need to insert GetAttr nodes for
quantization parameters since the code might be shared by multiple module instances
and we'd like quantized module instances to also share the same code but with
different attribute values.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818652
fbshipit-source-id: fc95623cac59dcedd9e3f95397524eae515e7a11
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30837
This test would get very occasional flakes, with an error saying the
RPC timed out. This happened because one worker would still be waiting for the
return value of an RPC, but another worker had already performed its local
shutdown, so it would not have sent the response. This didn't show up in
initial testing since the flakiness is very rare (< 1/100 test runs). This diff
fixes the issue by not erroring if these RPCs timeout. The reason this is okay
is because with a local shutdown, we should not expect for all outstanding RPCs
to be completed, since workers are free to shut down without completing/waiting
on outstanding work.
ghstack-source-id: 95021672
Test Plan: Ran the test 1000 times to ensure that it is not flaky.
Differential Revision: D18775731
fbshipit-source-id: 21074e8b4b4bbab2be7b0a59e80cb31bb471ea46
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30474
There are some common parts in `isBiasOfConvOrLinear` and `isWeightOfConvOrLinear`; we can factor
them out. The refactor will allow for easier extension to new patterns.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18795725
fbshipit-source-id: 446463da5e3fa8464db441ed0d9651930487b3b7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30679
Caffe2 expects quantized ops to be in NHWC format while PyTorch inputs are in NCHW.
Add a JIT pass that inserts an nchw2nhwc permute before each conv op and an nhwc2nchw permute after it.
A graph rewriter is then used to find consecutive redundant permutes and remove them from the graph.
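A minimal sketch of the permutes the pass inserts around each conv (illustrative only; this is not the JIT pass itself):
```
import torch

x = torch.randn(1, 3, 8, 8)       # NCHW, as produced by PyTorch
nhwc = x.permute(0, 2, 3, 1)      # nchw2nhwc permute inserted before the conv
# ... the Caffe2 quantized conv would run in NHWC here ...
back = nhwc.permute(0, 3, 1, 2)   # nhwc2nchw permute inserted after the conv
print(back.shape)                 # torch.Size([1, 3, 8, 8])
```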
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18790518
fbshipit-source-id: 4dd39cf0b31b21f5586c0edfdce2260d4e245112
Summary:
we prefer "_" over "-" in build names, so change checks in test script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30836
Differential Revision: D18840736
Pulled By: mingbowan
fbshipit-source-id: 6fdf736496225c5f8ab44906d8f4681b7bf894a7
Summary:
VitalyFedyunin, this PR ports the Hardtanh activation to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Hardtanh()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    fwd_t = 0
    bwd_t = 0
    for i in range(10000):
        t1 = _time()
        output = m(input)
        t2 = _time()
        output.backward(grad_output)
        t3 = _time()
        fwd_t = fwd_t + (t2 - t1)
        bwd_t = bwd_t + (t3 - t2)
    fwd_avg = fwd_t / 10000 * 1000
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) forward time is %.2f (ms); backwad avg time is %.2f (ms)."
          % (n, fwd_avg, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.06 (ms).
input size(128, 10000) forward time is 0.84 (ms); backwad avg time is 0.44 (ms).
```
After:
```
GPU:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.11 (ms).
input size(128, 10000) forward time is 0.06 (ms); backwad avg time is 0.17 (ms).
CPU
input size(128, 100) forward time is 0.02 (ms); backwad avg time is 0.05 (ms).
input size(128, 10000) forward time is 0.61 (ms); backwad avg time is 0.10 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) forward time is 0.05 (ms); backwad avg time is 0.07 (ms).
input size(128, 10000) forward time is 5.21 (ms); backwad avg time is 5.25 (ms).
After:
input size(128, 100) forward time is 0.01 (ms); backwad avg time is 0.02 (ms).
input size(128, 10000) forward time is 1.09 (ms); backwad avg time is 1.09 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30152
Differential Revision: D18815545
Pulled By: VitalyFedyunin
fbshipit-source-id: d23b6b340a7276457f22dce826bcbe3b341d755f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29944
This particular approach queries our issue tracker for test titles that
match the following format:
```
DISABLED test_async_grad_guard_with_grad (jit.test_async.TestAsync)
```
And then skips the python test for them. There is 1 second timeout so
if the internet flakes we still run the test suite, without disabling any
tests.
This is intended as a quick fix, similar to ninja unland, to get to a green
master. Long term test disables should go into the code.
Test Plan: Imported from OSS
Differential Revision: D18621773
Pulled By: zdevito
fbshipit-source-id: 5532f1d5fa3f83f77fc3597126cbb7dba09a3c33
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30825
It didn't verify in the 1-d case that the targets were size 1.
Test Plan: Imported from OSS
Differential Revision: D18833659
Pulled By: gchanan
fbshipit-source-id: 9b0276e7b0423fdaf2ba7cfa34bde541558c61f9
Summary:
We didn't have ATen/native/*.h in the torch target before, and we would like it to be exposed for external use.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30835
Differential Revision: D18836160
Pulled By: zrphercule
fbshipit-source-id: 7330a9c9d8b65f173cc332b1cfeeb18c7dca20a8
Summary:
This PR adds docs for how we expose declarations in `at::` to `torch::`, to make the semantics more clear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30760
Differential Revision: D18833081
Pulled By: yf225
fbshipit-source-id: eff4d8815c67f681ce3a930ce99771cf2e55dbd9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30800
SparseNN benchmark crashed due to this.
Wrap the warning handler in a function to avoid SIOF (the static initialization order fiasco).
Test Plan: Tested locally, SparseNN benchmark no longer crashes.
Reviewed By: yinghai
Differential Revision: D18826731
fbshipit-source-id: 8fcab8a3f38cc20f775409c0686363af3c27d0a6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30769
TorchConfig.cmake is the public CMake config we produce in the install folder for
3rd party client code to get all libtorch dependencies easily.
Apparently this build flow is not well covered by our CI (which is focused
on 1st party builds / shared libraries?), as the little dummy project used for
code analysis testing was broken by #30315 without failing any CI.
Fixed the problem for the mobile build and added the dummy project build to mobile
CI as well.
Test Plan: - make sure new CI pass;
Differential Revision: D18825054
Pulled By: ljk53
fbshipit-source-id: 80506f3875ffbc1a191154bb9e3621c621e08b12
Summary:
Fixes https://github.com/pytorch/pytorch/issues/29161.
I looked a bit at the code changes related to this and think I have all of the use cases of `DeprecatedTypeProperties` covered in the message, but suggestions from someone with more context on this would be very much appreciated :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30281
Differential Revision: D18830818
Pulled By: ezyang
fbshipit-source-id: 1a7fcee15354ae09e6644577e7fa33bd26acfe20
Summary:
Support for variadic inputs of `checkpoint_sequential` was deprecated in https://github.com/pytorch/pytorch/issues/21006. This case should raise a `DeprecationWarning` in PyTorch 1.2, but should simply fail with a `TypeError` since PyTorch 1.3. This patch removes the `DeprecationWarning` that was added for PyTorch 1.2.
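A small sketch of the calling convention after this change, assuming the current `checkpoint_sequential(functions, segments, input)` signature:
```
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

model = nn.Sequential(nn.Linear(10, 10), nn.ReLU(), nn.Linear(10, 10))
x = torch.randn(4, 10, requires_grad=True)

out = checkpoint_sequential(model, 2, x)    # single input: supported
# checkpoint_sequential(model, 2, x, x)     # variadic inputs: fails with TypeError
```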
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25985
Differential Revision: D18809875
Pulled By: albanD
fbshipit-source-id: e84dd8629c04979c4b2dc63e8ada94292e8cedd0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30768
The behavior didn't match the documentation, because the documentation (for 'none' reduction) reads:
input X target -> output
(N, C) X (N, C) -> (N,)
(C,) X (C,) -> ()
but the latter case would output (1,). This also changes the case to:
() X (C,) -> ()
from:
() X (C,) -> (C,)
which makes more sense with the above formulas.
Restacked version of: https://github.com/pytorch/pytorch/pull/30748
Test Plan: Imported from OSS
Differential Revision: D18821554
Pulled By: gchanan
fbshipit-source-id: 3df77c51cf25648cb5fab62a68b09f49c91dab4e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30765
It is already supported on CPU and is pretty easy to add for consistency.
Restacked version of: https://github.com/pytorch/pytorch/pull/30727
Test Plan: Imported from OSS
Differential Revision: D18821557
Pulled By: gchanan
fbshipit-source-id: e6aa3e91000ff3fd63941defc7d30aef58ae2f82
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30746
This diff should be safe as long as open source build succeeds and should have no impact to cuda.
Differential Revision: D18811302
fbshipit-source-id: a7adab993816cba51842701898fac5019438b664
Summary:
In-tree changes to pytorch to support complex numbers are being submitted here.
Out-of-tree support for CUDA complex numbers is here: [pytorch-cuda-strided-complex extension](https://gitlab.com/pytorch-complex/pytorch-cuda-strided-complex)
Changes so far:
- [x] Added complex support of torch.empty and torch.fill()
- [x] Added complex support of CopyKernels
- The 'static_cast_with_inter_type' template function is specialized for the following cases
- `dest_t = thrust::complex<dest_value_t>`, `src_t = std::complex<src_value_t>`
- `dest_t = std::complex<dest_value_t>`, `src_t = thrust::complex<src_value_t>`
- This handles the compile-time case where `dest_value_t=double` and `src_value_t=float`.
- [x] Added complex support of BinaryOp kernels
- `using thrust_t = typename ztype_cuda<scalar_t>::thrust_t;` converts std::complex<T> ScalarTypes to thrust types and is a no-op for other scalar types.
- The operator is performed using complex number support defined in `thrust/complex.h`
- This could be extended to work with ROCm by using `rocm/complex.h`
- [x] Added complex support of UnaryOp kernels
- Added CUDA support for `angle()`, `real()`, `imag()`, `conj()`
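A minimal sketch of the operations enabled by the list above, assuming a CUDA-capable build:
```
import torch

z = torch.empty(2, 2, dtype=torch.complex64, device='cuda')
z.fill_(1 + 2j)                        # complex fill
print(z.real, z.imag)                  # component views
print(torch.conj(z), torch.angle(z))   # unary ops listed above
print(z + z, z * z)                    # binary ops backed by thrust::complex
```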
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30295
Differential Revision: D18781954
Pulled By: ezyang
fbshipit-source-id: 25d204c0b8143ee27fda345a5d6a82f095da92a7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28443
We're now on C++14, so we don't need the else branch of these ifdef's anymore
ghstack-source-id: 94904074
Test Plan: waitforsandcastle
Differential Revision: D18069136
fbshipit-source-id: f1613cab9a99ee30f99775e4a60a1b06fd0a03ff
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30550
Right now we have an `InsertQuantDeQuantHelper` for each module, but we need
it to be global because we need to know which graphs have been quantized before,
and based on this information we can decide how to handle the module instance.
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18818651
fbshipit-source-id: bfcaf37094ce20a257171a0c99b05b9348ebc13d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30037
Support quantization for modules with reused submodules, e.g. relu (automatically make them unique).
We first do a pass on the graph to find all duplicate uses of the same module and record the `Value`s of the
module instance; for each of these values we create a new module and change the access to that module.
Test Plan:
python test/test_jit.py
Imported from OSS
Differential Revision: D18821483
fbshipit-source-id: 1698b981e9e9f0c728d9f03fcbcfbd260151f679
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30473
Invoked the `ConstantPooling` and `FuseLinear` passes before
`insertObservers`.
`ConstantPooling` cleans up the traced graph: e.g. when we
have two constant nodes with the same value, this pass will merge them,
which allows us to have fewer quantization patterns.
`FuseLinear` merges the exploded linear function into `aten::linear` so
that we can quantize this function properly. We need to fuse it because right now
the way we recognize weight and bias is by matching the argument position in certain function
calls, e.g. the 1st argument of aten::conv2d is the weight. Therefore we have to preserve
the boundary of the linear function to recognize the weight of linear, since in the exploded
linear code the input of addmm is the transposed weight rather than the original weight of linear.
ghstack-source-id: 94887831
Test Plan:
This is needed for quantizing traced model tests to pass
Imported from OSS
Differential Revision: D18795722
fbshipit-source-id: 192d9d1e56307e2e1d90e30dce0502e31cb4f829
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30737
Original commit changeset: 2a8b2a3f5401
Reverting this to be safe until we address test failures in T58528495
Test Plan: CI
Reviewed By: wx1988
Differential Revision: D18812384
fbshipit-source-id: 2a3ac554024773022ec827f259127e4c8cffe6e2
Summary:
For system pybind11 installs this is a system header location that should not get installed, since it might include other unrelated headers. The headers are already available for a system install, so only do the install when we use the bundled pybind11 version.
Closes https://github.com/pytorch/pytorch/issues/29823. Closes https://github.com/pytorch/pytorch/issues/30627.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30758
Differential Revision: D18820189
Pulled By: bddppq
fbshipit-source-id: fcc9fa657897e18c07da090752c912e3be513b17
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29217
We want to preserve constant information in ClassType so that
users can access the constants in the module by name.
This is also used later for freezing some attributes (converting
attributes to constants).
Test Plan:
tbd
Imported from OSS
Differential Revision: D18799955
fbshipit-source-id: fbfbcd5d3f7f560368b96e2a87e270c822a3d03a
Summary:
This is a re-do of https://github.com/pytorch/pytorch/issues/27064, which was reverted (b8792c0438). This was landed at the same time as other work that added new operators to the `torch` namespace so the check for whether the `torch` namespace is exhaustively checked for overridability was triggering test failures.
I've temporarily disabled that check and added an explanatory comment that the check will be re-enabled in a future PR that will be merged during a time when the commit velocity on PyTorch is lower.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30730
Differential Revision: D18813270
Pulled By: ezyang
fbshipit-source-id: 70477c4656dca8fea6e7bc59259555041fcfbf68
Summary:
VitalyFedyunin, this PR ports the Tanh backward to ATen.
**Test script:**
```
import torch
import torch.nn as nn
import time

torch.manual_seed(0)

def _time():
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return time.time()

device = "cpu"
m = nn.Tanh()
if torch.cuda.is_available():
    device = "cuda"
    m = m.cuda()

# warm up
for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    for i in range(1000):
        output = m(input)
        output.backward(grad_output)

for n in [100, 10000]:
    input = torch.randn(128, n, requires_grad=True, device=device)
    grad_output = torch.ones(128, n, device=device)
    bwd_t = 0
    for i in range(10000):
        output = m(input)
        t1 = _time()
        output.backward(grad_output)
        t2 = _time()
        bwd_t = bwd_t + (t2 - t1)
    bwd_avg = bwd_t / 10000 * 1000
    print("input size(128, %d) backwad avg time is %.2f (ms)." % (n, bwd_avg))
```
Test Device: CPU: skx-8180, GPU: Tesla P40.
Performance:
Before:
```
GPU:
input size(128, 100) backwad avg time is 0.12 (ms).
input size(128, 10000) backwad avg time is 0.17 (ms).
CPU
input size(128, 100) backwad avg time is 0.05 (ms).
input size(128, 10000) backwad avg time is 0.35 (ms).
```
After:
```
GPU:
input size(128, 100) backwad avg time is 0.12 (ms).
input size(128, 10000) backwad avg time is 0.17 (ms).
CPU
input size(128, 100) backwad avg time is 0.04 (ms).
input size(128, 10000) backwad avg time is 0.25 (ms).
```
`OMP_NUM_THREADS=1:`
```
Before:
input size(128, 100) backwad avg time is 0.03 (ms).
input size(128, 10000) backwad avg time is 1.85 (ms).
After:
input size(128, 100) backwad avg time is 0.02 (ms).
input size(128, 10000) backwad avg time is 1.16 (ms).
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30224
Differential Revision: D18810045
Pulled By: VitalyFedyunin
fbshipit-source-id: ab37948ab8f76bdaf9f3d1388562eaf29dacc0ea
Summary: As title
Test Plan: buck test caffe2/caffe2/fb/optimizers:masked_adagrad_test
Reviewed By: chocjy
Differential Revision: D18736639
fbshipit-source-id: d0d73f75228604d3448651bff2cf34ecc21f9ba6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30670
Also turn off scalar_check for grad_input: it isn't necessary because the input can't be 0-dimensional.
Test Plan: Imported from OSS
Differential Revision: D18784523
Pulled By: gchanan
fbshipit-source-id: 246d30970457075a0403dd0089317659a2cd2dd4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30669
The inputs can't be 0-d, so we don't need that check in the scalar_check.
Test Plan: Imported from OSS
Differential Revision: D18784524
Pulled By: gchanan
fbshipit-source-id: d44222dffc91880a6e8c7be69e6e146e60040d43
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30665
total_weight is a "hidden" output just for autograd, so it's not user visible. The existing test_nn tests cover this (I verified that the new code is executed) and this matches the CPU behavior.
Test Plan: Imported from OSS
Differential Revision: D18782709
Pulled By: gchanan
fbshipit-source-id: 6d1c20eeaeffa14d06f375b37f11e866587f5fa0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30549
Preparing for later refactoring
Test Plan:
.
Imported from OSS
Differential Revision: D18802464
fbshipit-source-id: 0b5afb143549d93eed4c429125d3d5fd253093a9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30548
ClassTypes can be shared among different module instances, but previously we assumed
they would be unique. This PR enables the insert_observers pass to work with shared class types.
Test Plan:
python test/test_jit.py
python test/test_quantization.py
Imported from OSS
Differential Revision: D18802465
fbshipit-source-id: b782e71e44a043af45577ac2b5c83e695155bb8b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30558
Most c10 op registration/invocation cases are generated by aten codegen
following some fixed pattern, but a handful of them were written
manually, mainly for quantized ops. Added these "irregular" cases to the
test project to verify static code analyzer can handle them as well.
Test:
- build and run the test project;
Test Plan: Imported from OSS
Differential Revision: D18811098
Pulled By: ljk53
fbshipit-source-id: 7bdf17175dfec41c56c0d70f124cc96478135bc4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30315
The new structure is that libtorch_cpu contains the bulk of our
code, and libtorch depends on libtorch_cpu and libtorch_cuda.
This is a reland of https://github.com/pytorch/pytorch/pull/29731 but
I've extracted all of the prep work into separate PRs which can be
landed before this one.
Some things of note:
* torch/csrc/cuda/nccl.cpp was added to the wrong list of SRCS, now fixed (this didn't matter before because previously they were all in the same library)
* The dummy file for libtorch was brought back from the dead; it was previously deleted in #20774
In an initial version of the patch, I forgot to make torch_cuda explicitly depend on torch_cpu. This led to some very odd errors, most notably "bin/blob_test: hidden symbol `_ZNK6google8protobuf5Arena17OnArenaAllocationEPKSt9type_infom' in lib/libprotobuf.a(arena.cc.o) is referenced by DSO"
* A number of places in Android/iOS builds have to add torch_cuda explicitly as a library, as they do not have transitive dependency calculation working correctly
* I had to make torch_cpu/torch_cuda caffe2_interface_library so that they get whole-archived linked into torch when you statically link. And I had to do this in an *exported* fashion because torch needs to depend on torch_cpu_library. In the end I exported everything and removed the redefinition in the Caffe2Config.cmake. However, I am not too sure why the old code did it this way in the first place; it doesn't seem to have broken anything to switch it.
* There are some uses of `__HIP_PLATFORM_HCC__` still in `torch_cpu` code, so I had to apply it to that library too (UGH). This manifests as a failure when trying to run the CUDA fuser. This doesn't really matter substantively right now because we still in-place HIPify, but it would be good to fix eventually. This was a bit difficult to debug because of an unrelated HIP bug, see https://github.com/ROCm-Developer-Tools/HIP/issues/1706
Fixes #27215 (as our libraries are smaller), and executes on
part of the plan in #29235.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18790941
Pulled By: ezyang
fbshipit-source-id: 01296f6089d3de5e8365251b490c51e694f2d6c7
Summary:
[Why static dispatch]
Static dispatch was introduced to allow stripping out unused ops at link
time (with “gc-sections” linker flag) for mobile build.
The alternative approaches to do "non-static" dispatch are:
* virtual methods - old ATen dispatcher, which has already been deprecated;
* registry pattern - used by caffe2, c10 and JIT;
However, none of them are “gc-sections” friendly. Global registrations are
root symbols - the linker cannot strip out any op if we use the registry pattern
for mobile.
[Why static dispatch isn’t great]
* One more code path to maintain;
* Need recompile framework to add new backends/ops;
* Doesn’t support AutoGrad yet thus blocks on-device training;
[Static Code Analysis]
This PR introduces a LLVM analysis pass. It takes LLVM bitcode /
assembly as input and generates a dependency graph among aten ops. From a
set of root ops used by a model, we can calculate transitive closure of
all dependent ops, then we can ask codegen to only register these ops.
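A minimal sketch of the transitive-closure step described above, assuming the dependency graph has already been extracted into an adjacency map (the analyzer itself is an LLVM pass, not Python):
```
from collections import deque

def transitive_closure(graph, roots):
    """graph: {op_name: [ops it depends on]}; roots: ops used by the model."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        op = queue.popleft()
        for dep in graph.get(op, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen

deps = {"aten::conv2d": ["aten::convolution"], "aten::convolution": ["aten::empty"]}
print(sorted(transitive_closure(deps, ["aten::conv2d"])))
# ['aten::conv2d', 'aten::convolution', 'aten::empty']
```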
[Approach]
To generate the dependency graph it searches for 3 types of connections in
LLVM bitcode / assembly:
1) op registration: op name (schema string literal) -> registered function;
2) regular function call: function -> function;
3) op invocation: function -> op name (schema string literal)
For 2) it uses similar algorithm as llvm::LazyCallGraph - not only looks into
call/invoke instructions but also recursively searches for function pointers
in each instruction's operands.
For 1) and 3) it searches for connections between operator name string
literals / function pointers and c10 op registration/invocation API calls in
LLVM IR graph via "use" edges (bi-directional):
1. llvm::Value has "users()" method to get other llvm::Value nodes that use
the value;
2. most of types derive from llvm::User which has "operands()" method to get
other llvm::Value nodes being used by the value;
[Limitation]
For now the search doesn't go beyond the function boundary because the
references to op name string literals and c10 op registration/invocation
APIs are almost always in the same function.
The script uses regular expression to identify c10 API calls:
* op_schema_pattern="^(aten|quantized|profiler|_test)::[^ ]+"
* op_register_pattern="c10::RegisterOperators::(op|checkSchemaAndRegisterOp_)"
* op_invoke_pattern="c10::Dispatcher::findSchema|callOp"
If we create helper function around c10 API (e.g. the "callOp" method
defined in aten/native), we could simply add them to the regular expression
used to identify c10 API.
[Example]
In the following example, it finds out:
1) the registered function for "quantized:add" operator;
2) one possible call path to at::empty() function;
3) the called operator name "aten::empty":
- "quantized::add"
- c10::detail::wrap_kernel_functor_unboxed_<at::native::(anonymous namespace)::QAdd<false>, at::Tensor (at::Tensor, at::Tensor, double, long)>::call(c10::OperatorKernel*, at::Tensor, at::Tensor, double, long)
- at::native::(anonymous namespace)::QAdd<false>::operator()(at::Tensor, at::Tensor, double, long)
- void at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::operator()<at::Tensor&, at::Tensor const&, at::Tensor const&>(c10::DeviceType, at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::native::DispatchStub<void (*)(at::Tensor&, at::Tensor const&, at::Tensor const&), at::native::qadd_stub>::choose_cpu_impl()
- void at::native::(anonymous namespace)::qadd_kernel<false>(at::Tensor&, at::Tensor const&, at::Tensor const&)
- at::TensorIterator::binary_op(at::Tensor&, at::Tensor const&, at::Tensor const&, bool)
- at::TensorIterator::build()
- at::TensorIterator::fast_set_up()
- at::empty(c10::ArrayRef<long>, c10::TensorOptions const&, c10::optional<c10::MemoryFormat>)
- "aten::empty"
[How do we know it’s correct?]
* Built a test project that contains different op registration/invocation
patterns found in pytorch codebase, including both codegen and non-codegen
cases.
* Tried different optimization flags “-O0”, “-O3” - the result seems to
be stable.
* Filtered by common patterns: “aten::”, “at::”, “at::native”,
“at::CPUType”, “at::TypeDefault” - manually checked the relationship
between function schema strings and corresponding implementations were
captured.
* It can print instruction level data flow and show warning message if it
encounters unexpected cases (e.g.: found 0 or multiple op names per
registration/invocation API call, found 0 registered functions, etc).
* Verified consistent results on different linux / macOs hosts. It can
handle different STL library ABI reliably, including rare corner cases
for short string literals
[Known issues]
* Doesn’t handle C code yet;
* Doesn’t handle overload name yet (all variants are collapsed into the
main op name);
Test Plan:
```
LLVM_DIR=... ANALYZE_TEST=1 CHECK_RESULT=1 scripts/build_code_analyzer.sh
```
Differential Revision: D18428118
Pulled By: ljk53
fbshipit-source-id: d505363fa0cbbcdae87492c1f2c29464f6df2fed
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30713
It should use moveToIntrusivePtr.
This function is a very hot one and used a lot in the interpreter loop, e.g.
GET_ATTR, SET_ATTR. Making a copy and doing incref/decref caused significant overhead.
Reviewed By: yinghai
Differential Revision: D18805212
fbshipit-source-id: 3a9368604f71638a21300ad086739c4b50f0644e
Summary:
Move the shell script into this separate PR to make the original PR
smaller and less scary.
Test Plan:
- With stacked PRs:
1. analyze test project and compare with expected results:
```
ANALYZE_TEST=1 CHECK_RESULT=1 tools/code_analyzer/build.sh
```
2. analyze LibTorch:
```
ANALYZE_TORCH=1 tools/code_analyzer/build.sh
```
Differential Revision: D18474749
Pulled By: ljk53
fbshipit-source-id: 55c5cae3636cf2b1c4928fd2dc615d01f287076a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30467
Introduce the function jit.export_opnames(module), which returns a list of all operator names used in the module and its submodules. One use is to have a mobile custom build link only the operators in the returned list, to reduce the mobile binary size.
Example:
```
import torch
m = torch.jit.load("example.pt")
print(torch.jit.export_opnames(m))
```
The outputs are in alphabetical order:
['aten::_convolution', 'aten::add.Tensor', 'aten::add_.Tensor', 'aten::addmm', 'aten::append.Tensor', 'aten::cat', 'aten::dropout', 'aten::embedding', 'aten::matmul', 'aten::max.dim', 'aten::mul.Tensor', 'aten::permute', 'aten::relu', 'aten::t', 'aten::tanh', 'prim::ListConstruct', 'prim::TupleConstruct', 'prim::TupleUnpack']
Test Plan: Imported from OSS
Differential Revision: D18801619
Pulled By: iseeyuan
fbshipit-source-id: f9b198d3e82b095daf704ee595d8026ad889bb13
Summary:
With the CI failure caused in 8bbafa0b32d2899ef6101172d62c6049427c977b fixed (incorrect return type of the lambdas in CUDA kernels)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30521
Differential Revision: D18770151
Pulled By: ailzhang
fbshipit-source-id: 02f0fe1d5718c34d24da6dbb5884ee8b247ce39a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30197
This default constructor was added because std::map's operator[]
requires a default constructor. However, instead of using operator[], we can
use emplace and remove the constructor, to ensure that the FutureInfo struct
doesn't get constructed with garbage values.
ghstack-source-id: 94802453
Test Plan: Unit tests pass.
Differential Revision: D18627675
fbshipit-source-id: c4cb000e60081478c0fd7308e17103ebbc4dc554
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30677
Currently you can only add FunctionEvents to FunctionEventAvg. This makes it so you can add multiple FunctionEventAvg objects together. This is useful for merging multiple profiles together such as when dealing with distributed training.
Test Plan:
added unit test
buck test //caffe2/test:autograd -- test_profiler
Reviewed By: bddppq
Differential Revision: D18785578
fbshipit-source-id: 567a441dec885db7b0bd8f6e0ac9a60b18092278
Summary:
Fixes https://github.com/pytorch/pytorch/issues/28389
Intel's OpenMP implementation sets the thread affinity on the first call to an OpenMP function after a fork. By adding an atfork handler we can force this to happen before a user tries to set the affinity in their own DataLoader `worker_init_fn`.
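An illustrative, Linux-only sketch of the user scenario described above, with a hypothetical `worker_init_fn` that sets per-worker CPU affinity:
```
import os
import torch
from torch.utils.data import DataLoader, TensorDataset

def worker_init_fn(worker_id):
    # pin each worker to one CPU; this only sticks if OpenMP has not already pinned the thread
    os.sched_setaffinity(0, {worker_id % os.cpu_count()})

ds = TensorDataset(torch.arange(100, dtype=torch.float32))
loader = DataLoader(ds, batch_size=10, num_workers=2, worker_init_fn=worker_init_fn)
for batch in loader:
    pass
```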
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29006
Differential Revision: D18782456
Pulled By: ezyang
fbshipit-source-id: ce0b515256da0cf18ceb125e0cdec99a3311bbd3
Summary:
This fixes the second issue reported in https://github.com/pytorch/pytorch/issues/29909 namely, a loop counter is assigned the wrong values after transitioning to a bailout graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30186
Differential Revision: D18646845
Pulled By: Krovatkin
fbshipit-source-id: 1f7c601dd9f35892979385ffa132fb0886a4f203
Summary:
This PR removes `namespace F = torch::nn::functional` from `torch/nn/modules/batchnorm.h`, so that people don't have to define `torch::nn::functional` as `F` if they don't want to.
Fixes https://github.com/pytorch/pytorch/issues/30682.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30684
Differential Revision: D18795717
Pulled By: yf225
fbshipit-source-id: c9feffbeb632cc6b4ce3e6c22c0a78533bab69ad
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30659
I could only find one usage of TupleParser and it doesn't seem worth maintaining just for that one usage.
Test Plan: Imported from OSS
Differential Revision: D18795979
Pulled By: nairbv
fbshipit-source-id: 6e50d65fc8fade0944f36ab20d00f1539a3d4cb8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30498
Updated Int8SliceOp to accept dim, start and end index, similar to PyTorch.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_slice
Imported from OSS
Differential Revision: D18740519
fbshipit-source-id: 2313f37a4936edb150ce04911b241e591e191801
Summary:
To ensure synchronization between the copying of weights into the RNN weight buffer and the operation itself, both the PyTorch operator and the underlying MIOpen call must be on the same HIP stream. This is also consistent with MIOpen calls in other PyTorch operators.
ezyang iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30672
Differential Revision: D18785683
Pulled By: bddppq
fbshipit-source-id: 144611046cb70cfe450680295734203f253ac6e2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30345
Skip ProcessGroupGlooAsyncTest if CUDA is not available; otherwise, on Sandcastle non-GPU hosts the test will abort because it fails to load the CUDA library.
ghstack-source-id: 94771241
Test Plan: test skipped on non GPU host
Differential Revision: D18665322
fbshipit-source-id: 8c7b89aeecc6ec007bee12d864a6058384254e61
Summary:
This improved the multi-d microbenchmark by ~100 ns; empty_tensor_restride used to be 13% of the iteration time and is now about 5%.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30452
Test Plan: Covered by existing tests
Differential Revision: D18704233
Pulled By: ngimel
fbshipit-source-id: be527f09183bc31e9d1f63fd49bfbe0998fe167f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30636
Currently DeQuantStub is still in the whitelist because set union has
lower precedence than set difference.
Fixes issue: https://github.com/pytorch/pytorch/issues/29646
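A standalone sketch of the precedence issue (not the actual whitelist code): in Python, `|` binds less tightly than `-`, so an un-parenthesized difference applies before the union.
```
a, b, c = {1, 2}, {3}, {2}
print(a | b - c)    # parsed as a | (b - c) -> {1, 2, 3}
print((a | b) - c)  # the intended grouping -> {1, 3}
```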
Test Plan:
verified locally that we don't attach qconfig for DeQuantStub
Imported from OSS
Differential Revision: D18775275
fbshipit-source-id: 8da07e40963555671b3d4326c9291706103f858e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30327
### Summary
Seems like starting from macOS 10.15, we can no longer get access to the `Downloads` folder in our macOS machines.
```
permissionError: [Errno 1] Operation not permitted: '/Users/distiller/Downloads'
```
The fix is to change the conda download directory to ${HOME}
### Test Plan
- iOS jobs are back to normal
- Don't break other jobs
Test Plan: Imported from OSS
Differential Revision: D18717380
Pulled By: xta0
fbshipit-source-id: cad754076bf4ae5035741aa57a310ad87c76726e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30314
Somehow we forgot to define it!
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762356
Pulled By: ezyang
fbshipit-source-id: 28afc605ad986266071e3831049ec8a7f71fd695
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30313
See comments in code about the bug.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762360
Pulled By: ezyang
fbshipit-source-id: 406a01f2f0c3722b381428c89afd67b3c3c19142
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30312
It's not necessary because it's already defined in the header.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762363
Pulled By: ezyang
fbshipit-source-id: 418bf355d460dd171ac449559f20bf55415e54ae
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30311
multinomial_stub must be in scope to register against it. Somehow,
this works today, but when I split torch_cpu and torch_cuda it
doesn't.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762358
Pulled By: ezyang
fbshipit-source-id: ef9c111292cd02d816af1c94c8bbaadabffaabe5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30310
- Annotate CUDAGenerator.h with correct TORCH_CUDA_API.
This is actually CUDA related functionality with its implementation living
in the cuda/ folder. For some reason it lives at the top level; it
should be moved (but that should be handled in another PR.)
- Add missing TORCH/CAFFE_API annotations to. All of
these functions are used from CUDA code, which means that
we need to correctly annotate them if we split CPU/CUDA code
into separate libraries.
Test Plan: Imported from OSS
Differential Revision: D18762357
Pulled By: ezyang
fbshipit-source-id: c975a8e4f082fe9f4196c2cca40977623caf4148
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30308
Dispatch is declared in non-anonymous namespace, so it definitely
shouldn't be defined in an anonymous namespace. This doesn't seem
to matter today, but it matters when we split libtorch into two
libraries.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762361
Pulled By: ezyang
fbshipit-source-id: 484f0fab183c385dd889db9dad3e48e92e0a3900
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30307
DispatchStub will stop working when I split CPU/CUDA libraries, because
there are some symbols from the templates in DispatchStub stubs which aren't
properly exported and I couldn't figure out how to make them dispatch properly.
This is the only case where DispatchStub is being used to dispatch to CUDA,
anyway.
This partially addresses #29844 but I need to also just completely delete
the CUDA registration logic from DispatchStub entirely.
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18762362
Pulled By: ezyang
fbshipit-source-id: bdfa8739c0daf23badf3c5af61890a934af00813
Summary:
Convolution nodes are traced as aten:_convolution and are currently supported in ONNX.
Scripting convolution uses aten:conv<1,2,3>d which are currently not supported in ONNX.
This PR adds the symbolics for aten:conv<1,2,3>d and aten:conv_transpose<1,2,3>d
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30618
Reviewed By: hl475
Differential Revision: D18778145
Pulled By: houseroad
fbshipit-source-id: 4af0379f29974a1ce8443024d1d87b3eb8d2dd36
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30546
factor out this function for later support of quantizing shared types
Test Plan:
test_jit.py, test_quantization.py
Imported from OSS
Differential Revision: D18776304
fbshipit-source-id: f5a736b0f69019cefe17ec4517da1ae5462f78e1
Summary:
Improve .view() performance by not calling set_ and instead restriding the returned alias. This improves the performance of the .view() operation from ~500 ns to ~360 ns.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30554
Test Plan: covered by existing tests
Differential Revision: D18759896
Pulled By: ngimel
fbshipit-source-id: 9757c93158bc55e9c87dc30ac3415ba8f8b849e5
Summary:
This test seems to only check that we throw exceptions in the `WorkerInfo` constructor when invalid names are passed in, so I don't think we need to complicate it by initializing RPC and exposing ourselves to potential flakiness.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30620
Differential Revision: D18766955
Pulled By: rohan-varma
fbshipit-source-id: 11643de4d57431e5f46e096c7766de3ab0b9b05a
Summary:
Previous behaviour: a user runs tests from the `TestCppExtension` class so that `/tmp/torch_extensions` is created under their ownership and not removed afterwards;
another user's run of the same tests might then result in a 'Permission denied' exception upon deleting `/tmp/torch_extensions`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30095
Differential Revision: D18770234
Pulled By: ezyang
fbshipit-source-id: 4c6b972e4c4327a94c8b4bf6b0b9998a01c218bb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30527
When we introduced dtype.is_signed we allowed for support of
quantized types, but we're not sure what the correct result should be.
See discussion at https://github.com/pytorch/pytorch/pull/29511
Test Plan: Imported from OSS
Differential Revision: D18765410
Pulled By: nairbv
fbshipit-source-id: c87cfe999b604cfcbbafa561e04d0d5cdbf41e6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30603
The Pickler object needs to be kept in scope until the data is written out to the
final serialized string. tensorData in particular is a reference to memory
owned by the descoped Pickler object.
Noticed this by inspection. In practice, this potential read-after-free
is limited to non-CPU tensors, and any such use was very soon after the free.
ghstack-source-id: 94756036
Test Plan: existing test suite at buck test mode/dev-nosan caffe2/test:rpc_fork
Differential Revision: D18760463
fbshipit-source-id: 9de890d66626aa48f13ca376dd9bd50b92e0cb00
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30354
TCPStoreTest would time out since the TCPStore constructor for the
server would block the main thread waiting for workers. The workers themselves
were spawned later on once the server store is created. As a result, this test
would always time out.
To fix the test, I moved the server store to a thread so that the workers can
register with the server in parallel.
In addition to this, I made a few improvements to tcputils::connect. When
tcputils::connect() encountered an exception, it always looked at `errno` for
the error code. In some cases `errno` could be overwritten and the real error
code would be stored in `std::system_error`. As a result, I've modified the
code to look at the error code in `std::system_error` if we catch an exception
of that type.
ghstack-source-id: 94758939
Test Plan: waitforbuildbot
Differential Revision: D18668454
fbshipit-source-id: d5a3c57b066b094bfecda9a79d9d31bfa32e17f0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30529
We started to see build failures for multiple services with the top-of-trunk LLVM compiler. The failures point to a warning that was treated as an error for implicit conversion from long to double. Per discussion on D18642524, I'm disabling this warning in the containing TARGET file. T58053069 was opened for the code owner to track this - a proper source code fix and more unit tests are needed.
Test Plan: local build, sandcastle
Reviewed By: smessmer
Differential Revision: D18668396
fbshipit-source-id: 28c0ff3258c5ba3afd41a0053f9fe1b356a496a8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30490
Add symbolic mapping to Int8AvgPool2d and Int8Reshape op in C2
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps
Imported from OSS
Differential Revision: D18740520
fbshipit-source-id: 1606125500c4b549fbc984e7929b7fd5204396a0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30624
These tests were flaky since we would end up calling the 'verify'
methods before some of the RPCs were done. The `check_rpc_done` function might
not guarantee this since set_rpc_done sets an appropriate flag in python which
causes `check_rpc_done` to pass. However, there are a few steps after that,
like attaching the send functions for the response of the RPC, that might not
have executed by then.
ghstack-source-id: 94781954
Test Plan: Run the tests 100 times.
Reviewed By: zhaojuanmao
Differential Revision: D18768786
fbshipit-source-id: a14c3f4b27de14fe5ecc6e90854dc52652f769b8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30351
Not sure what the proper fix is; clang is having trouble with the loop pragmas. This at least gets things compiling.
ghstack-source-id: 94458450
Test Plan: CI passes
Differential Revision: D18665812
fbshipit-source-id: b8a899ce4138010cbe308eaa2c0838dd9e15573f
Summary:
This TOC is manually generated, but `CONTRIBUTING.md` seems like it's
stable enough for that to be okay.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29671
Pulled By: driazati
Differential Revision: D18771604
fbshipit-source-id: 0d6c9c6cf1083d3be413219d3cead79c2fe5050b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30434
These are all pointwise ops that are implemented correctly wrt shapes in THC.
Test Plan: Imported from OSS
Differential Revision: D18699087
Pulled By: gchanan
fbshipit-source-id: 82cb91b00c77bfaca75be497c87fc7ae52daf46c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30449
There was an inconsistency in the order of operations between the scalar and SIMD code when we compute Adagrad.
In this diff we first compute effective_lr = lr / (sqrt(moment) + epsilon) and then multiply by the gradient.
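A small sketch of the unified order of operations, using illustrative values (this is not the Caffe2 SIMD kernel; names are hypothetical):
```
import numpy as np

lr, epsilon = 0.01, 1e-8
grad = np.array([0.1, -0.2, 0.3])
moment = np.array([0.5, 0.4, 0.9])                # accumulated squared gradients

effective_lr = lr / (np.sqrt(moment) + epsilon)   # computed first
update = effective_lr * grad                      # then multiplied by the gradient
```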
Test Plan: CI
Reviewed By: protonu
Differential Revision: D18703416
fbshipit-source-id: 2a8b2a3f5401466549561412bd22f07abac3c598
Summary:
${CMAKE_HOST_SYSTEM_PROCESSOR} gets the processor name from `uname -p` on Linux and `%PROCESSOR_ARCHITECTURE%` on Windows.
1. %PROCESSOR_ARCHITECTURE% has a value in (AMD64|IA64|ARM64) for 64-bit processors, and (x86) for 32-bit processors
2. `uname -p` has a value like "(x86_64|i[3-6]+86)"
We cannot tell an Intel CPU from other CPUs by ${CMAKE_HOST_SYSTEM_PROCESSOR}; it reports the architecture, not the vendor.
E.g. an Intel i7-9700K CPU on Windows reports "AMD64".
reference:
[MSDN](https://docs.microsoft.com/zh-cn/windows/win32/winprog64/wow64-implementation-details?redirectedfrom=MSDN)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30564
Differential Revision: D18763031
Pulled By: ezyang
fbshipit-source-id: 11ae20e66b4b89bde1dcf4df6177606a3374c671
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30594
This test case started breaking; disabling it to clean up the build.
ghstack-source-id: 94736837
Test Plan: Unittest disabling change
Differential Revision: D18758635
fbshipit-source-id: 05df1158ff0ccd75e401f352da529fb663b1cae0
Summary:
On the latest master, I get link errors when building one of the tests:
```sh
/home/pbell/git/pytorch/build/../test/cpp/rpc/test_wire_serialization.cpp:23:
undefined reference to `torch::distributed::rpc::wireDeserialize(void const*, unsigned long)'
```
This seems to be caused by PR https://github.com/pytorch/pytorch/issues/29785 not working with `USE_DISTRIBUTED=0`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30587
Differential Revision: D18758625
Pulled By: jjlilley
fbshipit-source-id: 0ad0703acdbbac22bb4b8317370fbe2606fcb67e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30491
Our RPC API docs present the APIs well but miss a general
introduction to them. Readers might be a little lost the first
time they land on this page. This commit reorganizes the APIs into
four components from the user's perspective: RPC, RRef, dist autograd,
and dist optimizer. It also adds an intro to each and briefly
describes why we provide them.
Test Plan: Imported from OSS
Differential Revision: D18723294
Pulled By: mrshenli
fbshipit-source-id: 4aced4ab537b070aa780aaaf9724659fd47cb3cb
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29785
TLDR: This change improves process_group's serialization speed:
Serialize_Tensor64: 12.38us -> 1.99us (~-84%)
Deserialize_Tensor64: 33.89us -> 5.62us (~-84%)
Serialize_Tensor1M: 525.74us -> 285.43us (~-45%)
Deserialize_Tensor1M: 892.61us -> 273.68us (~-70%)
After speaking with the jit team, we had consensus that torch::save()/load()
are somewhat high-overhead for RPC serialization, mostly intended for
persistent disk data.
(Particularly, for large tensors, 35% of the time is spent in CRC checking, even
with the fb-side changes to substitute 40x faster SSE-accelerated CRC checking;
Also, for small tensors, the zip container overhead is considerable, as is the
overhead of lexing/parsing an embedded text python program for each RPC).
The jit team encouraged us to use jit::pickler, with the WriteableTensorData
way of outputting result tensors (not the default side-tensor table, or
with pickling the actual tensors). This ends up just pickling some tensor
metadata, and giving us some tensor blobs that we can mindlessly
blit over the wire (they copy to cpu memory if needed).
There is yet no standardized container format for the pickled data
(there is jit::pickle_save() checked in, but it's experimental,
no load function is yet provided), but they encouraged us to just use
something sensible for this, and possibly revisit later. For now, I made
the directory headers slightly http-inspired.
Note that serialization is just one component of the pipeline, but that
said, we also see reasonable reductions in end-to-end echo times (noisier):
ProcessGroupAgent_Echo(Tensor_Small) 855.25us -> 492.65us (~-42%)
ProcessGroupAgent_Echo(Tensor_1M) 10.82ms -> 6.94ms (~-35%)
ProcessGroupAgent_Echo(Small_NoTensor) 688.82us -> 301.72us (~-56%)
ProcessGroupAgent_Echo(1MB_NoTensor) 4.65ms -> 3.71ms (~-20%)
I moved the "wire serialization" logic to a separate file to assist with
unittesting.
ghstack-source-id: 94694682
Test Plan:
buck test mode/dev-nosan caffe2/test/cpp/api:serialize
buck test mode/dev-nosan caffe2/test/...
Differential Revision: D18493938
fbshipit-source-id: 07ddfe87dbe56472bc944f7d070627052c94a8f4
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30330
This is now possible due to previous changes made in `gloo` and `ProcessGroupGloo`. We `abort` the listener thread that is waiting for a message, and join all other threads. The API is changed so that the previous `wait_all_workers` does not destroy the agent, and this is now done in a new `shutdown` method. All callsites are updated appropriately.
ghstack-source-id: 94673884
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18661775
fbshipit-source-id: 5aaa7c14603e18253394224994f6cd43234301c2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30522
This is in preparation for moving the docs push CI jobs to depend on
`pytorch-linux-xenial-py3.6-gcc5.4` rather than
`pytorch-linux-xenial-cuda9-cudnn7-py3`.
Test Plan: Imported from OSS
Differential Revision: D18731108
Pulled By: zou3519
fbshipit-source-id: fd753a5ca818fa73a14e4276c33368a247cc40e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30361
### Summary
By default, the compiler will choose `clock_gettime` for the iOS build. However, that API is not available until iOS 10. Since the Facebook app still supports iOS 9.0, we have to use `gettimeofday` instead.
```shell
xplat/caffe2/torch/csrc/autograd/profiler.h:86:3: error: 'clock_gettime' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
xplat/caffe2/torch/csrc/autograd/profiler.h:86:17: error: '_CLOCK_MONOTONIC' is only available on iOS 10.0 or newer [-Werror,-Wunguarded-availability]
```
P.S. the open-sourced version is iOS 12.0 and above, so we don't have this problem.
### Test Plan
- buck build works
- Don't break CIs
Test Plan: Imported from OSS
Differential Revision: D18730262
Pulled By: xta0
fbshipit-source-id: fe6d954b8d3c23cbc9d1e25a2e72e0b0c1d4eaa9
Summary:
PyTorch dim and ONNX axis have different meanings.
ONNX only supports log_softmax with dim = -1, so a Transpose must be added before and after log_softmax to support other cases.
This requires the input rank to be known at export time.
Fixes https://github.com/pytorch/pytorch/issues/17918
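A minimal sketch of the rewrite in plain tensor code (the exporter emits the equivalent ONNX Transpose/LogSoftmax nodes):
```
import torch

x = torch.randn(2, 3, 4)
# log_softmax over dim=1 rewritten with the transposes the exporter inserts,
# so that the actual log_softmax always runs over the last axis
y = x.transpose(1, -1).log_softmax(dim=-1).transpose(1, -1)
assert torch.allclose(y, x.log_softmax(dim=1))
```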
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30433
Reviewed By: hl475
Differential Revision: D18723520
Pulled By: houseroad
fbshipit-source-id: d0ed3b3f051d08d46495a7abfa854edd120dca3a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/25768
The round robin process group can be constructed from multiple other
process groups. Every collective call against this new process group
is delegated to the specified process groups in a round robin fashion.
Doing so may benefit performance when calling into multiple NCCL
process groups. Instead of adding support for round-robin usage of
NCCL communicators, we achieve the same by adding this wrapper class,
without changing the NCCL process group.
The API to create this round robin process group is a bit harsh. If we
find it adds significant benefit we can revisit and make this a first
class citizen in the torch.distributed module.
ghstack-source-id: 94578376
Test Plan: The newly added test passes.
Reviewed By: chenyangyu1988
Differential Revision: D17226323
fbshipit-source-id: ec9f754b66f33b983fee30bfb86a1c4c5d74767d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30415
This enables subclassing of c10d.Store and implementing its interface in Python.
ghstack-source-id: 94586627
Test Plan: New tests passes.
Reviewed By: vladbelous
Differential Revision: D18693018
fbshipit-source-id: fa1eba4bd11cc09a3d6bf3f35369c885033c63c0
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29934
Previously, when doing boxed dispatch (e.g. custom ops), the dispatcher manually removed the VariableTensorId flag before dispatching
because custom ops don't have variable kernels.
This is one of the blockers that prevented us from using the boxed dispatch mechanism for ops from native_functions.yaml because they define variable kernels and need them to be called for autograd.
This PR changes that. The dispatcher doesn't remove the VariableTensorId flag anymore.
Instead, to make custom ops work, we implement a variable fallback kernel that is called whenever no other variable kernel was found.
ghstack-source-id: 94618474
Test Plan: unit tests
Differential Revision: D18542342
fbshipit-source-id: a30ae35d98f89f7ae507151f55c42cfbed54a451
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30451
TORCH_CHECK takes __VA_ARGS__ so there is no need to concatenate strings
before calling it. This way it won't call Formatting::print() on the
tensor when STRIP_ERROR_MESSAGES macro is set. Formatting::print() calls
several specific tensor methods that bring in unnecessary inter-op
dependencies for static code analysis.
Test Plan: - builds
Differential Revision: D18703784
Pulled By: ljk53
fbshipit-source-id: 1c0628e3ddcb2fd42c475cb161edbef09dfe8eb5
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30120
The example given for functional conv2d didn't work. This diff fixes the example in docs so that it works.
Fixes https://github.com/pytorch/pytorch/issues/29649
ghstack-source-id: 94601559
Test Plan: Tried the example locally
Differential Revision: D18604606
fbshipit-source-id: ff1a4f903e2843efe30d962d4ff00e5065cd1d7e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30428
Reported issue https://discuss.pytorch.org/t/incomprehensible-behaviour/61710
Steps to reproduce:
```
class WrapRPN(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, features):
        # type: (Dict[str, Tensor]) -> int
        return 0
```
```
#include <torch/script.h>

int main() {
  torch::jit::script::Module module = torch::jit::load("dict_str_tensor.pt");
  torch::Tensor tensor = torch::rand({2, 3});
  at::IValue ivalue{tensor};
  c10::impl::GenericDict dict{c10::StringType::get(), ivalue.type()};
  dict.insert("key", ivalue);
  module.forward({dict});
}
```
The ValueType of `c10::impl::GenericDict` is taken from the first specified element, here `ivalue.type()`.
It then fails the type check `!value.type()->isSubtypeOf(argument.type())` in `function_schema_inl.h`,
as `DictType::isSubtypeOf` requires equal KeyType and ValueType, while the `TensorType`s are different.
Fix:
Use c10::unshapedType for creating Generic List/Dict
Test Plan: Imported from OSS
Differential Revision: D18717189
Pulled By: IvanKobzarev
fbshipit-source-id: 1e352a9c776a7f7e69fd5b9ece558f1d1849ea57
Summary:
using `buck build mode/opt mode/no-gpu //experimental/ngimel/benchmark_framework_overheads:cpp_benchmark`
```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.90638 us
```
```
devvm497.prn3.facebook.com:/data/users/bwasti/fbsource/fbcode $ ./cpp_benchmark --niter 10000 --disable_variable_dispatch
creating inputs, number of dimensions 1
starting op
benchmarking 10000 iterations
using cpp frontend
elapsed time per iteration 0.775436 us
```
Test Plan: let all tests run
Reviewed By: smessmer
Differential Revision: D18654276
fbshipit-source-id: 362812b2c87ec428448b2ac65baac45f492fdce4
Summary:
This PR adds `gpu_kernel_with_index` as an addition to the element-wise kernel template. It allows a kernel to operate not only on input tensor values, but also on each value's index (viewed as 1-d, so from 0 to numel) within the lambda.
The direct use case here is to replace thrust::tabulate used in range/arange/linspace. The benefits are:
- thrust::tabulate causes additional unnecessary synchronization on the CPU.
- It now works with the tensor iterator, so the output no longer needs to be contiguous and a memcpy is saved.
It can also potentially be reused to add new functions to PyTorch later, if we see a use case where both value and index are needed (for example, unify tril/triu into tensor-iterator element-wise? add other patterns?).
Known issues:
https://github.com/pytorch/pytorch/pull/23586 is needed for the non-contiguous case to work properly, since overlapping needs to be checked. Currently non-contiguous tensors fall into TOO_HARD. I could write a proper check in this file but I figured using the existing method is better. jjsjann123
It does not work beyond 32-bit indexing, but thrust was erroring on those cases too. We could split the tensor in the caller to enable this. The index changes after a split, so it is easier for the caller to pass a different lambda, and harder for the template to handle it in general.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/28175
Differential Revision: D18708649
Pulled By: ngimel
fbshipit-source-id: 382081c96f266ae7b61095fc1f2af41c6b210fa9
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30472
Add DoNotStrip to nativeNewTensor method.
ghstack-source-id: 94596624
Test Plan:
Triggered build on diff for automation_fbandroid_fallback_release.
buck install -r fb4a
Tested BI cloaking using pytext lite interpreter.
Observe that logs are sent to the scuba table:
{F223408345}
Reviewed By: linbinyu
Differential Revision: D18709087
fbshipit-source-id: 74fa7a0665640c294811a50913a60ef8d6b9b672
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29953
The underlying function handles it correctly.
Test Plan: Imported from OSS
Differential Revision: D18548055
Pulled By: gchanan
fbshipit-source-id: cc2d0ae37d9689423363d115c6a653cb64840528
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29952
The underlying op handles the check correctly.
Test Plan: Imported from OSS
Differential Revision: D18548048
Pulled By: gchanan
fbshipit-source-id: 9ac6fde743408e59ccdfc61bd574ebe6e2862238
Summary:
In ONNX opset 11, a series of sequence ops were added. Operators that are related to Tensor[] in PyTorch can be exported using these sequence ops.
In this PR, unbind/split, which produce Tensor[], and __getitem__, which takes Tensor[] as input, are exported correctly to ONNX opset 11.
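As a rough illustration (not taken from this PR's tests), an export that exercises unbind and __getitem__ with opset 11 might look like the sketch below; whether sequence ops end up in the final graph depends on how the model is captured (scripting vs. tracing):
```
import torch

class PickSlice(torch.nn.Module):
    def forward(self, x):
        # unbind produces a Tensor[]; indexing it goes through __getitem__
        return x.unbind(dim=0)[1]

# Sequence ops are only available from ONNX opset 11 onwards.
torch.onnx.export(PickSlice(), torch.randn(3, 4), "pick_slice.onnx", opset_version=11)
```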
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29136
Reviewed By: hl475
Differential Revision: D18309222
Pulled By: houseroad
fbshipit-source-id: be12c96bf8d0a56900683ef579f1c808c0a1af21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30202
PyTorch's Upsample operator has output_size as an argument.
For quantized tensor inputs we cannot get the input_size to calculate the width and height scale factors.
Instead, we pass the output_size directly to Caffe2 to calculate the scale factors.
Test Plan:
python test/onnx/test_pytorch_onnx_caffe2_quantized.py TestQuantizedOps.test_upsample
Imported from OSS
Differential Revision: D18631478
fbshipit-source-id: 38a39129bc863f4ecf2293acc068e40ab7edc825
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30217
Before this commit, RRefContext throws an error if it detects any
RRef leak during shutdown. However, this requires applications to
make sure that they have freed all references to RRefs in application
code, which can be a bad debugging experience for large applications.
Besides, this also relies on Python GC to free things up in time,
which might not always be true. After this commit, RRefContext
ignores leaking RRefs during shutdown, as shutdown is called when the
application has finished training and no longer cares about local
states. Hence, it should be OK to just ignore those leaks and destroy
the OwnerRRefs. If an application would like to enforce no leaks, it
can set torch.distributed.rpc.api._ignore_rref_leak to False.
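A minimal sketch of opting back into strict checking (the flag is the private API named above):
```
import torch.distributed.rpc.api as rpc_api

# Raise on leaked RRefs at shutdown instead of silently ignoring them.
rpc_api._ignore_rref_leak = False
```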
Test Plan: Imported from OSS
Differential Revision: D18632546
Pulled By: mrshenli
fbshipit-source-id: 2744b2401dafdd16de0e0a76cf8e07777bed0f38
Summary:
The PyTorch exporter does not add any name to the ONNX operators in the exported graph. A common request is to add names to op nodes by default. This helps the readability of the graph in visualization tools such as Netron, or when the ONNX graph is printed as a string. It also helps with the debuggability of the ONNX graph.
Therefore this PR adds names to operators in the exporter. The names follow a simple format, <op_type>_<index>. Expect files for tests in `test/onnx/test_operators.py` have been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/27342
Reviewed By: hl475
Differential Revision: D17790979
Pulled By: houseroad
fbshipit-source-id: 1eaae88b5f51f152735a2ff96e22827837e34d9d
Summary:
This should resolve https://github.com/pytorch/pytorch/issues/29008. This flag has two effects on the tracer.
- Removes the trailing underscore from inplace operators, e.g. index_put_ ==> index_put. This is also handled separately in utils.py.
- Adds out as an input for backward computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29466
Reviewed By: hl475
Differential Revision: D18422815
Pulled By: houseroad
fbshipit-source-id: 317b6a3c8a5751fe6fe49d7543e429d281ed0d6d
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30357
Fix issue https://github.com/pytorch/pytorch/issues/29032 in loading from state dict for observers and fake quant.
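For context, a hedged sketch of the save/load round trip this fixes (the default FakeQuantize configuration used here is an assumption, not taken from the PR):
```
import torch
from torch.quantization import FakeQuantize

fq = FakeQuantize()           # default observer
fq(torch.randn(4, 4))         # run once so scale / zero_point get populated
state = fq.state_dict()

fq2 = FakeQuantize()
fq2.load_state_dict(state)    # observer buffers restore without missing-key errors
```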
ghstack-source-id: 94468814
Test Plan: Ensures that load/save of fake quant and observers with missing keys works correctly.
Differential Revision: D18668517
fbshipit-source-id: 0eda6f47c39102e55977fc548b9a03664f123ad7
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30430
When a module isn't a TracedModule, attempt to get name information from the `original_name` property on the module, and default to 'Module' when no such property exists.
Test Plan:
### Change child module to scripted module:
```
model = torchvision.models.alexnet()
model.classifier = torch.jit.script(model.classifier)
```
### Add graph
```
w = SummaryWriter()
w.add_graph(model, torch.rand((2, 3, 224, 224)))
w.close()
```
### No errors
However, the graph is disconnected in parts and hard to understand.
{F223327878}
Reviewed By: sanekmelnikov
Differential Revision: D18690836
fbshipit-source-id: 42295d06b7c1d48d5401776dca1e0d12cd64b49d
Summary:
This adds a listing of the parts of the `typing` module that are unsupported.
It is also a first pass at deciding which features are 'unlikely to be implemented' vs 'not implemented', so those calls are open to discussion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30344
Pulled By: driazati
Differential Revision: D18665628
fbshipit-source-id: 22b8ebbde23df03839306cdb4344ca18a44f2c29
Summary:
There is no `out` argument to `argsort` according to the source code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/24335
Differential Revision: D16829134
Pulled By: vincentqb
fbshipit-source-id: 8f91154984cd4a753ba1d6105fb8a9bfa0da22b3
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30362
Right now the QAT modules (qat.ConvBn2d, qat.ConvBnReLU2d, qat.Conv2d)
are not convenient for supporting other dimensions of Conv; this PR refactors
these modules so that we can support Conv1d/Conv3d better.
Test Plan:
python test/test_quantization.py
Imported from OSS
Differential Revision: D18691152
fbshipit-source-id: 5b561e6b054eadd31b98cabdf1ac67a61ee9b805
Summary:
In this PR, we mainly handle the case where there are multiple uses of a Value when inserting the quant-dequant pair. This change adds one dequant for each use of the Value.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30145
Differential Revision: D18671600
Pulled By: lly-zero-one
fbshipit-source-id: 61324a98861da85b80dcf7e930381311118ae53b
Summary:
Currently, the way the compare kernels handle dtypes is very funny (this behavior was introduced in https://github.com/pytorch/pytorch/pull/28427 and I only realized it today):
Let's say `a, b` are two float tensors on CUDA.
If you do `a < b`, this is what happens inside the loop:
- Step 1: Fetch `a` and `b` and dynamically cast them from `float` to `float` (i.e. check the scalar type to figure out whether a cast is needed; it isn't, so do nothing).
- Step 2: Compute `a < b`, getting a `bool` result.
- Step 3: Statically cast the result to `float`.
- Step 4: Dynamically cast the result from `float` back to `bool` and store the value.
And if you do `a.lt_(b)`, this is what happens:
- Step 1: Fetch `a` and `b`, no casting.
- Step 2: Compute `a < b`, getting a `bool` result.
- Step 3: Statically cast the result to `float`.
- Step 4: Store the result to memory, no casting.
Although the dynamic casting happens in registers, it still hurts performance a bit (~8%).
This PR fixes this issue. Now for compare kernels, if the output is bool and the inputs have the same dtype, there is no dynamic casting. Otherwise, there is dynamic casting for each input and output. That is, the dynamic casting behavior of the two cases described above is swapped.
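For reference, a small CPU-side illustration of the dtype semantics being discussed (the timings below are for CUDA, but the dtype behavior is the same):
```
import torch

a = torch.rand(4)
b = torch.rand(4)

out = a < b        # out-of-place compare: the result tensor is bool
print(out.dtype)   # torch.bool

a.lt_(b)           # in-place compare: the bool result is written back in a's dtype
print(a.dtype)     # torch.float32, now holding 0.0 / 1.0 values
```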
Benchmark on `a < b` for a tensor of 1000000000 fp32 elements:
Before https://github.com/pytorch/pytorch/issues/28427: 6.35 ms
Current master: 6.88 ms
With this PR: 6.36 ms
Benchmark on `a.lt_(b)` does not show any difference across versions.
Besides this, what worries me most is that, with type promotion, the logic for the tensor iterator is becoming super complicated, and it is hard to see whether one change causes a performance regression elsewhere. I suggest we create scripts that benchmark the tensor iterator end to end, review that code, and put it somewhere inside the repository (maybe under `/tools` or `/test/scripts`?), so that whenever we are not certain about performance we can run it to check. (Not on this PR, but on PRs after the script is done: if there are worries about performance, the author of a PR should run the script manually, and the reviewer should remind them to do so if necessary.) If this sounds like a good idea, I will send a PR for the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29743
Differential Revision: D18671269
Pulled By: ngimel
fbshipit-source-id: 89a9c1c8b5fd45d5ae8fe907d65c2fe1a7dfd2dc
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30208
Adds a default value for init_method so users don't have to pass it in,
and moves it to the `RpcBackendOptions` struct. Removes the `init_method` arg from rpc.init_rpc. Also fixes some docs.
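A hedged sketch of the resulting call site, assuming the default rendezvous reads MASTER_ADDR/MASTER_PORT from the environment (the exact default is an assumption, not stated above):
```
import os
import torch.distributed.rpc as rpc

# Assumed env:// style rendezvous; address and port come from the environment
# instead of an explicit init_method argument.
os.environ.setdefault("MASTER_ADDR", "localhost")
os.environ.setdefault("MASTER_PORT", "29500")

rpc.init_rpc("worker0", rank=0, world_size=1)  # no init_method needed
rpc.shutdown()
```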
ghstack-source-id: 94500475
Test Plan: Unit tests pass.
Reviewed By: mrshenli
Differential Revision: D18630074
fbshipit-source-id: 04b7dd7ec96f4c4da311b71d250233f1f262135a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30359
We need this for C++14 support
ghstack-source-id: 94519850
Test Plan: unit tests
Differential Revision: D18668868
fbshipit-source-id: 87e8eadf0e60a1699fba4524aea53b306b9a7f24
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29945
Both functions require at least one 2-dimensional tensor, so they can never return an inferred scalar.
Test Plan: Imported from OSS
Differential Revision: D18548056
Pulled By: gchanan
fbshipit-source-id: f99a41d490b9a5ab5717534c92e4f2e848c743e8
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29923
Note that this changes the behavior of masked_select when both "self" and "mask" are 0-dimensional.
In previous versions of PyTorch, this would return a 0-dimensional tensor. But the documentation reads:
"Returns a new 1-D tensor which indexes the input tensor according to the boolean mask mask which is a BoolTensor."
Test Plan: Imported from OSS
Differential Revision: D18539560
Pulled By: gchanan
fbshipit-source-id: 1637ed2c434fcf8ceead0073aa610581f4a19d21
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30320
Fixes #30296
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Test Plan: Imported from OSS
Differential Revision: D18665704
Pulled By: ezyang
fbshipit-source-id: f09a953137fcc105959382254f9b8886af5aea3b
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30390
Fix the crashes caused by C++ not being able to find the Java class through JNI.
ghstack-source-id: 94499644
Test Plan: buck install -r fb4a
Reviewed By: ljk53
Differential Revision: D18667992
fbshipit-source-id: aa1b19c6dae39d46440f4a3e691054f7f8b1d42e
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30282
The atomic increment/decrements in LeftRight::read() were measurable in perf benchmarks. Let's improve their perf.
ghstack-source-id: 94443230
Test Plan: unit tests, perf benchmarks
Differential Revision: D18650228
fbshipit-source-id: d184ce8288510ab178e7c7da73562609d1ca3c9f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29682
This PR re-introduces backend_fallback_test.cpp, which was previously called boxed_fallback_test.cpp and showed how to use the backend fallback API.
ghstack-source-id: 94481314
Test Plan: unit tests
Differential Revision: D18462654
fbshipit-source-id: 3e9b5c8f35c05f9cd795f44a5fefd1a0aaf03509
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29681
Remove callUnboxedOnly() and instead use metaprogramming to figure out if an operator can use a boxed fallback or not.
This enables boxed fallback for ops in native_functions.yaml even if they don't have `use_c10_dispatcher: full` set, as long as they're in the range of supported types.
ghstack-source-id: 94481320
Test Plan: unit tests
Differential Revision: D18462653
fbshipit-source-id: 2955e3c4949267520a1734a6a2b919ef5e9684a2
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29337
This argument is needed by boxing wrappers so they're able to get a pointer to the corresponding unboxed kernel and call into it.
But if a kernel is registered in a boxed way, we don't need it and should hide this from the API.
This is especially needed for the backend fallback API where users would only be left wondering why this argument is there and what it does.
Also, hiding it allows us to potentially totally remove it in a future refactoring if we find some way to do so.
ghstack-source-id: 94481316
Test Plan: unit tests
Differential Revision: D18361991
fbshipit-source-id: 5cef26c896fe3f2a5db730d3bc79dcd62e7ef492
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29330
This makes for a nicer API, especially in backend fallback kernels, which get an OperatorHandle instance and can directly call these methods on it.
ghstack-source-id: 94481322
Test Plan: unit tests stacked on top
Differential Revision: D18357424
fbshipit-source-id: fa8c638335f246c906c8e16186507b4c486afb3f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29201
This is required for boxed backend fallback kernels (e.g. lazy, AMP) because they need to know which op was actually called.
ghstack-source-id: 94481313
Test Plan: I will add unit tests in a diff stacked on top
Differential Revision: D18282746
fbshipit-source-id: 339a1bbabd6aff31a587b98f095c75104dfc6f99
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30340
We already made OperatorEntry::dispatchTable_ an array to be able to avoid the concurrency primitives there,
but Dispatcher::backendFallbackKernels_ has the same issue. Let's make it a table too.
Since there is some code duplication here, we also factor out the concept of a KernelFunctionTable to be used in both places.
ghstack-source-id: 94481317
Test Plan: unit tests
Differential Revision: D18663426
fbshipit-source-id: ba82ca5c4cae581eea359d5c0c3a5e23b0f8838c
Summary:
In this PR, we enhance graph-mode quantization for aten::_convolution, which can be generated from the tracing path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30245
Differential Revision: D18671597
Pulled By: lly-zero-one
fbshipit-source-id: 78a2470fbb0fe0def55d63c6bda7cbb5c89f7848
Summary:
This PR updates `torch::pickle_save` to use the new zipfile format introduced in #29232 and adds `torch::pickle_load` which can decode the zipfile format. Now that `torch.save/load` use this format as well (if the `_use_new_zipfile_serialization` flag is `True`), raw values saved in Python can be loaded in C++ and vice versa.
Fixes #20356
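A Python-side sketch of saving with the zipfile container (file name illustrative); per the summary, such values can then also be decoded from C++:
```
import torch

# Write the zipfile-based container explicitly (it later became the default).
torch.save({"weights": torch.ones(2, 3)}, "data.pt",
           _use_new_zipfile_serialization=True)

reloaded = torch.load("data.pt")
print(reloaded["weights"].shape)  # torch.Size([2, 3])
```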
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30108
Pulled By: driazati
Differential Revision: D18607087
fbshipit-source-id: 067cdd5b1cf9c30ddc7e2e5021a8cceee62d8a14
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30367
use the SLS emulations that match the hardware
Test Plan: replayer test
Differential Revision: D18667605
fbshipit-source-id: 89aee630184737b86ecfb09717437e5c7473e42c
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30241
We need an API to get all worker infos. This will be used by the backend-agnostic `rpc.wait_all_workers()` API.
ghstack-source-id: 94454935
Test Plan:
# Unit tests
```
buck test mode/dev-nosan //caffe2/test:rpc_fork -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork\#binary.par -r test_get_worker_infos
```
```
buck test mode/dev-nosan //caffe2/test:rpc_fork_thrift -- test_get_worker_infos
buck-out/gen/caffe2/test/rpc_fork_thrift\#binary.par -r test_get_worker_infos
```
Differential Revision: D5693412
fbshipit-source-id: 5123c8248b6d44fd36b8a5f381dbabb2660e6f0f
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29164
- Created GlooDeviceFactory to hide device creation details
- Added a transport option to the Python interface
The reason for making the factory class is to make it easier to extend Gloo transports in the future.
Test Plan: Imported from OSS
Reviewed By: satgera, d4l3k
Differential Revision: D18596527
fbshipit-source-id: e8114162ee8d841c0e0769315b48356b37d6ca0a
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29207
The logic calling c10 ops from JIT did some variable wrapping to make sure all results are always variables.
Thanks to ezyang, this is not needed anymore because everything is a variable now.
ghstack-source-id: 93345590
Test Plan: waitforsandcastle
Differential Revision: D18327507
fbshipit-source-id: 86512c5e19d6972d70f125feae172461c25e3cb6
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/30193
Featuring:
- Added a NoNamesGuard::reset() function that sets NamesMode back to
what it was before the guard. This makes it so that we don't have to
create a new context to run code in an unnamed way.
- Added a diagonal(Tensor, *, Dimname outdim, Dimname dim1, Dimname dim2, int64_t offset=0)
overload. All of the non-tensor arguments are keyword only for
readability purposes; something like `tensor.diagonal("A", "B", "C")`
would be really confusing; a usage sketch follows after this list.
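A hedged usage sketch of the new overload (the Python-side keyword spellings are assumed from the signature above):
```
import torch

x = torch.randn(3, 3, names=('A', 'B'))
d = x.diagonal(outdim='C', dim1='A', dim2='B')
print(d.names)  # ('C',)
```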
Test Plan: - Added new tests
Differential Revision: D18638363
Pulled By: zou3519
fbshipit-source-id: ea37b52a19535f84a69be38e95e569e88f307381
Summary:
This PR looks for a `constants.pkl` file at the top level in a zip file
in `torch.load`. If found, it calls `torch.jit.load` instead and issues
a warning to call `torch.jit.load` directly
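A minimal sketch of the behavior described above (module choice is arbitrary):
```
import torch

torch.jit.script(torch.nn.Linear(2, 2)).save("scripted.pt")

# torch.load detects constants.pkl inside the archive, warns that
# torch.jit.load should be used directly, and loads the module anyway.
m = torch.load("scripted.pt")
```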
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29339
Differential Revision: D18611095
Pulled By: driazati
fbshipit-source-id: f070a02f6b5509054fc3876b3e8356bbbcc183e1
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/29943
This was apparently the same as "pytorch/pytorch-binary-docker-image-ubuntu16.04:latest",
so standardize on that name.
Test Plan:
This PR is stacked on top of a commit that puts one of the jobs
using that container into the set of PR builds.
Imported from OSS
Differential Revision: D18653554
fbshipit-source-id: 40e6c52db02265d61e8166bb1211376faccfc53a