Compare commits

...

1708 Commits

Author SHA1 Message Date
db5d3131d1 add fix for CUDA 10 2018-12-06 15:44:56 -08:00
524574ab73 Define THPStorage struct only once (rather than N times) (#14802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14802

The definition of THPStorage does not depend on any Real; its macro
definition is unnecessary. Refactor the code so that THPStorage is not
macro-defined.

Reviewed By: ezyang

Differential Revision: D13340445

fbshipit-source-id: 343393d0a36c868b9a06eea2ad9b80f5e395e947
2018-12-05 13:19:29 -08:00
ca6311d909 File name change for FbgemmI8Depthwise.h and FbgemmI8Depthwise.cc (#14725)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14725

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/33

Renaming FbgemmI8Depthwise.h to FbgemmI8DepthwiseAvx2.h and FbgemmI8Depthwise.cc to FbgemmI8DepthwiseAvx2.cc since FbgemmI8DepthwiseAvx2.cc will be compiled with avx2 flags

Reviewed By: jianyuh

Differential Revision: D13313898

fbshipit-source-id: a8111eacf3d79a466ce0565bfe5f2f0b200a5c33
2018-12-05 13:14:48 -08:00
e114527d19 Add torch.nn.RReLU support in symbolic (#14781)
Summary:
Now we support exporting torch.nn.RReLU in onnx.
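
A minimal export sketch (hedged; the output file name is illustrative):

```python
import torch

model = torch.nn.RReLU()
x = torch.randn(1, 3)
torch.onnx.export(model, x, "rrelu.onnx")  # now routed through the RReLU symbolic
```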
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14781

Reviewed By: houseroad

Differential Revision: D13343872

Pulled By: zrphercule

fbshipit-source-id: 1e96b957de4fc2f5ba3959d42329807975419ae3
2018-12-05 13:10:07 -08:00
50936cb06e Move avx2 specific code in different source files (#28)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/28

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14516

This is the first diff in a series of diffs that will separate out avx2 specific code in separate files. The goal is to compile as little as possible code with avx2 and avx512 compiler flags.

Reviewed By: jianyuh

Differential Revision: D13248376

fbshipit-source-id: 401c2e9d3cd96c420fd08c3efa011febce96ffbb
2018-12-05 12:19:35 -08:00
55092b1cc6 Validate matching input shapes in Int8Add operator (#14520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14520

Default engine doesn't support broadcast semantics in Int8Add operator. This patch adds a check that shapes are equivalent.

Reviewed By: bertmaher

Differential Revision: D13250922

fbshipit-source-id: 8526d07723bd9a34d54dee04d121c57f8b33c481
2018-12-05 12:00:23 -08:00
1c2273c8e9 fix stft arg types
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14800

Reviewed By: zou3519

Differential Revision: D13340574

Pulled By: SsnL

fbshipit-source-id: 8b0dbbe299d1a362da0ecc0b1c0dadb2543ded5d
2018-12-05 11:45:37 -08:00
999690ff3d Improve HIPify performance (#14803)
Summary:
```
    Improve performance of pyHIPIFY

    Changes:
    - Pre-compile regexes, don't use regexes when it's not necessary
      (this saves us ~15%)
    - Compile all substitutions for mappings into a single, non-backtracking
      regex using a Trie.  This gives big savings.

    Before, running pyHIPIFY on all files took 15.8s.  Now it takes 3.9s.
```
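
A hedged sketch of the Trie idea (illustrative, not the actual pyHIPIFY code): all substitution keys are merged into one prefix-sharing pattern, so a single scan can replace every mapped identifier without backtracking between alternatives.

```python
import re

def trie_regex(words):
    # Build a character trie over all keys, then serialize it into a single
    # pattern whose alternatives share prefixes, so the regex engine never
    # backtracks between candidate words.
    trie = {}
    for w in words:
        node = trie
        for ch in w:
            node = node.setdefault(ch, {})
        node[''] = True  # end-of-word marker

    def emit(node):
        if list(node) == ['']:
            return ''
        alts = [re.escape(ch) + emit(node[ch]) for ch in sorted(k for k in node if k)]
        if '' in node:
            alts.append('')  # a shorter word also ends here
        return alts[0] if len(alts) == 1 else '(?:' + '|'.join(alts) + ')'

    return re.compile(emit(trie))

mapping = {'cudaMalloc': 'hipMalloc', 'cudaMemcpy': 'hipMemcpy', 'cudaFree': 'hipFree'}
pattern = trie_regex(mapping)  # compiles to cuda(?:Free|M(?:alloc|emcpy))
print(pattern.sub(lambda m: mapping[m.group(0)], 'cudaMalloc(&p, n); cudaFree(p);'))
```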

Stacked on #14769
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14803

Differential Revision: D13342620

Pulled By: ezyang

fbshipit-source-id: 1cfa36b3236bbe24d07080a31cc788a52d740f40
2018-12-05 11:00:03 -08:00
be47470c91 Fix cuda multiprocessing cached memory (#14736)
Summary:
This PR fixes #11422

In the old world of CUDA IPC, when we want to share a tensor T from A to B, we have to share the whole CUDA memory allocation that T's storage sits in, and we cast it to the same storage type as T's.

This causes problems when two different types of storage are allocated in the same CUDA memory block: when we try to reconstruct the second tensor, it complains about the wrong storage type.

In this PR we reconstruct the storage only (not the entire memory block). However, since CUDA only allows a memHandle to be opened once per process, we have to save the device pointer in a global cache so that we can reconstruct tensors as they come.

Thanks a ton to ezyang who helped design the solution and debugged the issue!
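
For context, the sharing path being fixed looks roughly like this (hedged repro; requires a CUDA device):

```python
import torch
import torch.multiprocessing as mp

def consumer(q):
    t = q.get()            # reconstructs the shared CUDA tensor in this process
    print(t.sum().item())

if __name__ == "__main__":
    mp.set_start_method("spawn")
    q = mp.Queue()
    p = mp.Process(target=consumer, args=(q,))
    p.start()
    q.put(torch.randn(4, device="cuda"))  # shared via CUDA IPC, not copied
    p.join()
```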
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14736

Differential Revision: D13335899

Pulled By: ailzhang

fbshipit-source-id: cad69db392ed6f8fdc2b93a9dc2899f6d378c371
2018-12-05 10:55:43 -08:00
3ae721d350 Set and get default dtype (#13748)
Summary:
Replaces the `DefaultTensorOptions` with just a global default dtype that you can set and get like in Python.

Also, calls `set_default_dtype` in the implementation of `torch.set_default_dtype`. Right now these two default values are separate but will always be the same. Should we just bind `set_default_dtype`  into Python? I think that might be good to do in a separate PR though.

ezyang gchanan

Also CC colesbury who wanted to do this for ATen for a while? What do you think about it?
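
For reference, the Python-level behavior this mirrors, roughly:

```python
import torch

torch.set_default_dtype(torch.float64)
assert torch.tensor([1.0]).dtype == torch.float64  # new tensors pick up the default
torch.get_default_dtype()                          # torch.float64
torch.set_default_dtype(torch.float32)             # restore the usual default
```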
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13748

Differential Revision: D13340207

Pulled By: goldsborough

fbshipit-source-id: 2689b09eb137fabb3a92d1ad1635782bee9398e8
2018-12-05 10:28:41 -08:00
90b1196ac4 Switch Int8AveragePool operator to QNNPACK (#14783)
Summary:
2.2-2.9X better performance on ARM when compiled with gcc (same bad perf when compiled with Clang)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14783

Differential Revision: D13332680

Pulled By: Maratyszcza

fbshipit-source-id: 4c1138500c6b3026335e9bfe5f6be43b1ae2cefb
2018-12-05 10:18:42 -08:00
e1eb32d9f1 Update magma to 2.4.0 for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14738

Differential Revision: D13341611

Pulled By: soumith

fbshipit-source-id: 39a49fc60e710cc32a463858c9cee57c182330e2
2018-12-05 09:53:39 -08:00
62f4db6d8a Unify build_caffe2_amd.py and build_pytorch_amd.py (#14769)
Summary:
I need to preserve the ability to HIPify out-of-place files
only, so build_amd.py grows a --out-of-place-only flag.

Stacked on #14757
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14769

Differential Revision: D13340154

Pulled By: ezyang

fbshipit-source-id: 1b855bc79e824ea94517a893236fd2c8ba4cb79d
2018-12-05 09:26:12 -08:00
dbf6d12776 Default pool() option (#14636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14636

Add a default CPU option for the pool()

Reviewed By: andrewwdye

Differential Revision: D13281367

fbshipit-source-id: 92dbfce89c900a41731b6d1ff62bb97886c40f77
2018-12-05 08:44:19 -08:00
2d958b7f77 Storage.clone maintains original device (#14751)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14673

As pointed out by vishwakftw, the root cause of the `deepcopy` issue was that `storage.clone()` would create a new storage on the default device.
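
A small repro of the fixed behavior (assumes a machine with at least two GPUs):

```python
import copy

import torch

t = torch.randn(2, device="cuda:1")
u = copy.deepcopy(t)
assert u.device == t.device  # the cloned storage now stays on cuda:1
```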
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14751

Reviewed By: soumith

Differential Revision: D13323061

Pulled By: fmassa

fbshipit-source-id: bfe46ebd78f0b6cd9518c11d09de7849282ed2a2
2018-12-05 08:33:56 -08:00
a80a46a6d0 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 080e0034bd6353420383ac7b476af5a35eaba7c3
2018-12-05 08:33:55 -08:00
0b1b72e975 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e397238c7c477c4268e2dc89e530776fc89f18f8
2018-12-05 02:55:46 -08:00
0573ef664e include avx512vl to avx512 code path (#14733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14733

We often also want to use the AVX512VL instruction set.
We already include AVX512F and AVX512DQ.
Skylake also has AVX512BW and AVX512CD, which we may want to add later.

Reviewed By: duc0

Differential Revision: D13317282

fbshipit-source-id: 82c8e401d82d5c3a5452fb4ccb6e5cb88d242bda
2018-12-05 00:50:51 -08:00
f89de64796 Use AT_WARN for warnings in the JIT (#14770)
Summary:
Previously their implementation dispatched to prim::Print, which kept
printing the warnings.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14770

Differential Revision: D13327629

Pulled By: suo

fbshipit-source-id: b9913f533d4530eb7c29146c39981ba7f72b6b68
2018-12-05 00:16:09 -08:00
ecc17fe3dd Add output info when doing onnxGetBackendCompatibility (#14784)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14784

TSIA. To give more complete info to `onnxGetBackendCompatibility`.

Reviewed By: bertmaher, rdzhabarov

Differential Revision: D13331989

fbshipit-source-id: 1064b93f7f474788f736e6f0c893dae915c6fb99
2018-12-04 21:53:32 -08:00
c79e305add Don't DCE PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14773

Reviewed By: eellison

Differential Revision: D13327673

Pulled By: suo

fbshipit-source-id: 236db3407c7eacac470530836e3d4d0dc323110c
2018-12-04 21:37:36 -08:00
8dfebc16cc Improvements for symbolic AD (#14758)
Summary:
**Review only the last commit.**

This commit adds a few optimizations to AD, that let us dramatically
reduce the number of sizes we capture from forward.

We now:
- collapse chains of SumToSize
- avoid capturing sizes of tensors that are captured anyway
- more aggressively DCE the reverse code
- run CSE on the primal code to deduplicate `aten::size` calls

cc zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14758

Differential Revision: D13324440

Pulled By: zou3519

fbshipit-source-id: 45ccbc13605adcef2b461840c6089d3200000c72
2018-12-04 20:38:21 -08:00
38eb1beff5 Revert D13289919: [pytorch][PR] [DataLoader] Refactor dataloader.py
Differential Revision:
D13289919

Original commit changeset: d701bc7bb48f

fbshipit-source-id: c350c491fefa98a0a7c0cf22cb832e78aeb15c3d
2018-12-04 20:25:16 -08:00
78a9e7d83f Delete defunct files from torch/csrc/distributed (#14785)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14785

Differential Revision: D13333066

Pulled By: ezyang

fbshipit-source-id: e7937b4e8e12409b0fa964c34f995f7861ca95ff
2018-12-04 20:13:20 -08:00
d76e411d8c support conv transpose in script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14775

Differential Revision: D13330491

Pulled By: eellison

fbshipit-source-id: 432b327d6a33517ff53ea33c9f64700e81432332
2018-12-04 19:54:09 -08:00
2d3cf98b49 Making dist.get_default_group private for PT1 release (#14767)
Summary:
When I wrote the frontend API, it was designed so that users never use the default_group directly in any functions. It should really be private.

All collectives are supposed to use either group.WORLD or anything that comes out of new_group. That was the initial design.

We need to add a TODO on removing group.WORLD one day. It exists for backward-compatibility reasons and adds lots of complexity.
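
A hedged sketch of the intended usage (assumes the usual env:// variables are set by a launcher):

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo", init_method="env://")
t = torch.ones(1)
dist.all_reduce(t, group=dist.group.WORLD)  # the public handle
sub = dist.new_group(ranks=list(range(dist.get_world_size())))
dist.all_reduce(t, group=sub)               # or any group from new_group()
```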
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14767

Reviewed By: pietern

Differential Revision: D13330655

Pulled By: teng-li

fbshipit-source-id: ace107e1c3a9b3910a300b22815a9e8096fafb1c
2018-12-04 19:22:24 -08:00
33ea7eafef Make checkpoint_sequential work with multiple arguments (#14278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14278

In this commit, we make checkpoint_sequential work for models with multiple tensor inputs. Previously, it only processed the first tensor and ignored the rest.

We introduce a new test in test/test_utils.py that replicates the issue referenced in this [GitHub issue](https://github.com/pytorch/pytorch/issues/11093), and we make sure that the test passes by changing the behavior of checkpoint_sequential to process all input tensors.

Reviewed By: ezyang

Differential Revision: D13144672

fbshipit-source-id: 24f58233a65a0f5b80b89c8d8cbced6f814004f7
2018-12-04 18:47:43 -08:00
3237103624 Automatic update of fbcode/onnx to 42804705bdbf179d1a98394008417e1392013547 (#14777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14777

Previous import was 6b34743d2e361bbc0acb29dd73536478cb92562e

Included changes:
- **[4280470](https://github.com/onnx/onnx/commit/4280470)**: Changes done internally at Facebook (#1668) <Lu Fang>
- **[f85221f](https://github.com/onnx/onnx/commit/f85221f)**: Fuse MatMul and Add into Gemm (#1542) <vloncar>
- **[022230e](https://github.com/onnx/onnx/commit/022230e)**: Replace np.long by np.int64 (#1664) <G. Ramalingam>
- **[0ab3c95](https://github.com/onnx/onnx/commit/0ab3c95)**: Infer shape from data in Constant nodes (#1667) <Shinichiro Hamaji>

Reviewed By: bddppq

Differential Revision: D13330082

fbshipit-source-id: 13cf328626cf872d0983bbd2154d95c45da70f1c
2018-12-04 18:37:48 -08:00
a66669a110 Enable testing on Loss modules (#14778)
Summary:
This PR adds `None` buffers as parameters (similarly to #14715). It also cleans up a bunch of the `test_jit.py` tests that should be covered by `common_nn.py` and brings in `criterion_tests` to test loss functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14778

Differential Revision: D13330849

Pulled By: driazati

fbshipit-source-id: 924cc4cf94e0dcd11e811a55222fd2ebc42a9e76
2018-12-04 18:35:10 -08:00
d872af9282 Add tests for dropout/batchnorm train/eval, remove training constants (#14780)
Summary:
This PR:

1. adds tests for batchnorm/dropout train/eval parameter mutation
2. removes training constants from our whole standard library
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14780

Differential Revision: D13331578

Pulled By: wanchaol

fbshipit-source-id: d92ca3ce38cc2888688d50fe015e3e22539a20a5
2018-12-04 18:17:43 -08:00
86b4dd8bb2 Split LegacyDeviceTypeInit from LegacyTypeDispatch. (#14723)
Summary:
The goal here is to have LegacyTHDispatch call into this as well, so LegacyTypeDispatch and LegacyTHDispatch don't have cross dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14723

Reviewed By: ezyang

Differential Revision: D13314017

Pulled By: gchanan

fbshipit-source-id: 8761cb4af2b2269d2e755203e073bfdba535b8c0
2018-12-04 17:51:37 -08:00
f6f24cf0f4 don't allow cse to clean up nondeterministic nodes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14776

Differential Revision: D13330229

Pulled By: suo

fbshipit-source-id: 6bc88811e1889949f0f079cffccd8cd4270584cc
2018-12-04 15:45:37 -08:00
d76fd43294 Reenable all forward-pass fusions that worked before the AD fix (#14558)
Summary:
Dealing with so many `aten::size` calls (in particular calls on elements computed inside fusion groups) requires us to do some extra graph processing in the fuser (to compute the sizes by explicit broadcasts, instead of writing the intermediate tensors only to check their size). This restores the forward expects of LSTM and MiLSTM to a single big kernel. Unfortunately the backward is much harder, because as long as we can't prove that the reductions are unnecessary (or if we can't distribute them over the op), we will not be able to fuse them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14558

Differential Revision: D13321748

Pulled By: zou3519

fbshipit-source-id: c04fc2f70d106d2bfb56206b5aec517a93b79d1f
2018-12-04 15:43:37 -08:00
c3bfa0e52b BatchNorm support not tracking stats
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14764

Differential Revision: D13325800

Pulled By: driazati

fbshipit-source-id: a3e4773dc31b83565e7a4de33614d6efd4a12de9
2018-12-04 15:11:53 -08:00
c21f090ab4 Minor doc change in c10/Device.h (#14762)
Summary:
Make sure it's a valid regex.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14762

Reviewed By: zrphercule

Differential Revision: D13326108

Pulled By: houseroad

fbshipit-source-id: fdcae2d5d42774c4071651b7477f08047d385dfa
2018-12-04 14:52:22 -08:00
9e1f4ba124 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14754)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14754

This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators; they are implementation details of operators, and thus don't need to participate in VariableType or JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher. Once this is done, we can probably collapse Type to be backend-specific, not Backend/ScalarType-specific, because all the ScalarType-specific code will live in the LegacyTHDispatcher.

Reviewed By: ezyang

Differential Revision: D13321605

fbshipit-source-id: 25d1bbc9827a42d6ab5d69aabbad3eac72bf364c
2018-12-04 14:44:06 -08:00
53a9d4f312 disable batch mm if we have mutable ops (#14771)
Summary:
Just to be safe, disable batch mm for mutable ops. We don't lose much by doing this, and we can go back at a calmer time to re-enable it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14771

Reviewed By: eellison

Differential Revision: D13327641

Pulled By: suo

fbshipit-source-id: 96611e21ed3cb8492a2cd040f7d33fb58c52bd5e
2018-12-04 14:34:57 -08:00
5ed9dfad98 Replace at::Half non-vectorized conversions with implementations from FP16 (#14411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14411
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14579

Folded the fp16 codes into c10.

Reviewed By: ezyang

Differential Revision: D13206450

fbshipit-source-id: 472208dd230dc49d33935622ff3286b17eeb0894
2018-12-04 14:32:33 -08:00
2d56df7892 Use .to to convert new tensors in new_tensor (#14097)
Summary:
This would solve the tracing problems of #13969.
Fixes: #14732

I would appreciate if this got good scrutiny before applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14097

Differential Revision: D13323181

Pulled By: ezyang

fbshipit-source-id: dcd104b497c0bfddb751923c6166a3824b7a3702
2018-12-04 14:03:56 -08:00
c7c5eed686 Export generator constructor (#14041)
Summary:
Missed a spot :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14041

Reviewed By: ezyang

Differential Revision: D13283803

Pulled By: ebetica

fbshipit-source-id: 482e245f57b0cea6ca3886355ea3ae487d024d4b
2018-12-04 13:50:06 -08:00
374b797569 c10d doesn't work with torch namespace (#14042)
Summary:
If both `Utils.hpp` and the `torch` namespace are included in the same file, the compiler won't know which fmap to use. I believe this is because of ADL. This change fixes that issue for me.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14042

Reviewed By: ezyang

Differential Revision: D13283810

Pulled By: ebetica

fbshipit-source-id: b68233336518230ba730e83ddac1226a66896533
2018-12-04 13:47:20 -08:00
3aba2d99e1 Add resnet test, convert more modules (#14437)
Summary:
This PR add resnet to test_jit and convert more nn modules, stacked on #14533 and #14715
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14437

Differential Revision: D13325871

Pulled By: wanchaol

fbshipit-source-id: 6c94a988b36794a373af6541c0c262a07291f7b1
2018-12-04 13:42:41 -08:00
25c9a8b1fc Add missing test skip
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14763

Differential Revision: D13325350

Pulled By: driazati

fbshipit-source-id: 4d64a7616b227983c2fc2748c5fbecd1bcbff832
2018-12-04 13:38:53 -08:00
875be849e9 Rename _local_scalar to item() (#13676)
Summary:
Make `at::_local_scalar` more "official" by renaming it to `item()`.

gchanan
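
The public-facing result, for reference:

```python
import torch

x = torch.tensor([[3.5]])
x.item()        # 3.5 -- the public name for what at::_local_scalar did
x.sum().item()  # works on any single-element result
```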
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13676

Differential Revision: D13003020

Pulled By: goldsborough

fbshipit-source-id: 0ac25f5237fb81a1576304a0a02f840ff44168a4
2018-12-04 13:19:26 -08:00
e829a52977 Remove use of hipify_caffe2, in favor of file path test. (#14757)
Summary:
This is towards unifying the build_pytorch_amd.py and build_caffe2_amd.py
scripts. There is only one use of hipify_caffe2 left, which is just
to control which files actually get HIPified.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14757

Differential Revision: D13323486

Pulled By: ezyang

fbshipit-source-id: 958cd91be32dfc3c0a9ba9eda507adb5937aebcd
2018-12-04 12:48:49 -08:00
a597c0ca05 Add inplace FeedTensor for python frontend (#14512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14512

att

Reviewed By: dzhulgakov

Differential Revision: D13243278

fbshipit-source-id: 78af417d0fcd9b9791ee839d62095903e49205cb
2018-12-04 12:45:11 -08:00
ba70cf22fa Loss (#14720)
Summary:
Adding Loss modules to script. Some of the modules have an optional tensor parameter; I will wait until wanchao's diff supporting optional tensors lands before landing this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14720

Differential Revision: D13317990

Pulled By: eellison

fbshipit-source-id: 535925bdf126d28d9e7d64077b83ebd836a5beba
2018-12-04 12:30:05 -08:00
ef91cfd68b Add new reduction mode in kl_div (#14457)
Summary:
Fixes #6622.
We used to average over all elements for KL divergence, which is not aligned with its mathematical definition.
This PR corrects the default reduction behavior of KL divergence so that it now averages over the batch dimension.

- In KL, the default behavior `reduction=mean` averages over the batch dimension, while for most other loss functions `reduction=mean` averages over all elements.
- We used to support scalar tensors as well. For BC purposes we still do; no reduction is performed on a scalar tensor.
- Added a new reduction mode called `batchmean`, which has the correct behavior for KL. Added a warning announcing that `batchmean` will become the default for KL instead of `mean` in the next major release.
- [deprecated] I chose not to add a new reduction option, since "mean over the batch dimension" is kind of special, and it only makes sense in a few cases like KL. We don't want to have to explain why there's an option `batchmean` that isn't applicable to all other functions. I'm open to discussion on this one, as I cannot think of a perfect solution.
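
A short illustration of the two modes (hedged sketch):

```python
import torch
import torch.nn.functional as F

inp = F.log_softmax(torch.randn(4, 10), dim=1)  # log-probabilities
tgt = F.softmax(torch.randn(4, 10), dim=1)      # probabilities

# 'batchmean' divides the summed KL by the batch size (the mathematically
# correct mean); 'mean' divides by the total number of elements (4 * 10 here).
F.kl_div(inp, tgt, reduction='batchmean')
F.kl_div(inp, tgt, reduction='mean')
```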
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14457

Differential Revision: D13236016

Pulled By: ailzhang

fbshipit-source-id: 905cc7b3bfc35a11d7cf098b1ebc382170a087a7
2018-12-04 12:24:28 -08:00
773f4d8081 Implements Gather operator for arbitrary axis, sharing the code with BatchGather. (#13756)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13756

This implements a general Gather operator for arbitrary axis, sharing the code with BatchGather.
 - CPU gather & batch gather logic is now shared through caffe2::gather_helper, for any axis.
 - The shared CUDA kernel moved to gather_op.cuh, for any axis.
 - Gradients for axis > 0 delegate to BatchGatherGradientOp, which now has an axis argument.
 - BatchGatherOp doc strings updated with the correct rank (q + (r - 1)) and output.
 - Added tests for axis == 2.

GatherOp supports index wrapping for axis == 0 by default, which was earlier done for ONNX.
This diff also extends that to work in the CUDA kernel, and adds a "wrap_indices" argument which specifies
whether this wrapping should be done; set it to true if you'd like wrapping for any axis.

TBD: Update gradients to support negative indices (separate diff).
TBD: Once we have operator versioning, we'd like to update GatherOp to NOT support axis 0 wrapping
by default, but rather do it only if wrap_indices is set.
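
A hedged usage sketch from the Caffe2 Python frontend (blob names are illustrative):

```python
import numpy as np
from caffe2.python import core, workspace

workspace.FeedBlob("data", np.random.randn(3, 4, 5).astype(np.float32))
workspace.FeedBlob("indices", np.array([0, 2], dtype=np.int32))
op = core.CreateOperator("Gather", ["data", "indices"], ["out"], axis=1)
workspace.RunOperatorOnce(op)
workspace.FetchBlob("out").shape  # (3, 2, 5): output rank = q + (r - 1) = 1 + 2
```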

Reviewed By: dzhulgakov

Differential Revision: D12983815

fbshipit-source-id: 8add9d67b47fe8c5ba7a335f581ca0530b205cd7
2018-12-04 11:54:28 -08:00
16558a1e9d Refactor dataloader.py (#14668)
Summary:
As I am working on the tasks in https://github.com/pytorch/pytorch/issues/13023, I realized how unreadable the code is, because all functions to be run in multiprocessing must live at the top (module) level. Adding more functionality to `dataloader.py` will only make things worse.

So in this PR, I refactor `dataloader.py` and move much of it into `data._utils`. E.g., the `_worker_loop` and related methods are now in `data._utils.worker`, signal handling code in `data._utils.signal_handling`, collating code in `data._utils.collate`, etc. This split, IMHO, makes code much clearer. I will base my future changes to DataLoader on top of this.

No functionality is changed, except that I added `torch._six.queue`.
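
A hedged sketch of the resulting layout (internal module paths as described above):

```python
from torch.utils.data import _utils

_utils.worker._worker_loop       # previously a top-level function in dataloader.py
_utils.collate.default_collate   # collation helpers now live here
_utils.signal_handling           # SIGCHLD handling code
```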
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14668

Reviewed By: soumith

Differential Revision: D13289919

Pulled By: ailzhang

fbshipit-source-id: d701bc7bb48f5dd7b163b5be941a9d27eb277a4c
2018-12-04 09:53:41 -08:00
7e4a5b89fe Back out "Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10" (#14745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14745

Original commit changeset: c62e7f9b0255

Reviewed By: suo

Differential Revision: D13318594

fbshipit-source-id: 4d7dc35ca01b627accc3ee512bfcd6f2e805a533
2018-12-04 08:59:10 -08:00
ff7deb95d7 Back out "Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard" (#14744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14744

Original commit changeset: d236d5351ecf

Reviewed By: suo

Differential Revision: D13318596

fbshipit-source-id: 55f1e9472d05fb5a9c47dc82c32e9a66b5e4308c
2018-12-04 08:59:07 -08:00
7bc489c827 Disable randn_like fusion in the JIT (#14752)
Summary:
Fixes #14674. We won't have time for a proper fix before the release, so at least disable fusion of nodes that trigger incorrect behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14752

Differential Revision: D13320407

Pulled By: zou3519

fbshipit-source-id: 2400f7c2cd332b957c248e755fdb0dadee68da5d
2018-12-04 08:55:47 -08:00
86ffc2a5f1 fix import failure in hub test (#14742)
Summary:
Fix #14610

I can repro the test failure following the steps provided, and this fixes the issue for me. It seems the insertion has to happen after the download finishes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14742

Differential Revision: D13318533

Pulled By: ailzhang

fbshipit-source-id: b9207b4572d5a9443e516d9a84632e3d7b68e477
2018-12-04 08:37:05 -08:00
9e58c4ef91 Revert D13304654: [pytorch][PR] Introduce LegacyTHDispatcher for dispatching to TH functions.
Differential Revision:
D13304654

Original commit changeset: cfe3e1a28adc

fbshipit-source-id: 06669d3c88f83e1d959e2c266fd608316539d42a
2018-12-04 07:58:34 -08:00
264111bfc1 Introduce LegacyTHDispatcher for dispatching to TH functions. (#14708)
Summary:
This isn't hooked up to anything yet, this is just putting the skeleton in place.
The idea here is that the functions generated via Declarations.cwrap and nn.yaml are not actually operators; they are implementation details of operators, and thus don't need to participate in VariableType or JIT dispatch generation.

So, we will split these functions out from the usual Type/operator hierarchy; for now the dispatch will be done by a Type-like class called LegacyTHDispatcher. Once this is done, we can probably collapse Type to be backend-specific, not Backend/ScalarType-specific, because all the ScalarType-specific code will live in the LegacyTHDispatcher.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14708

Reviewed By: ezyang

Differential Revision: D13304654

Pulled By: gchanan

fbshipit-source-id: cfe3e1a28adcc355f67fe143495ee7e5c5118606
2018-12-04 07:41:04 -08:00
33b1f9f71a add .code property to ScriptModule (#14735)
Summary:
simple change to allow `print(foo.code)` to give a pretty-printed description of all the methods on a module.
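
For example (hedged sketch):

```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x + 1

print(M().code)  # pretty-printed TorchScript for the module's methods
```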
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14735

Differential Revision: D13317619

Pulled By: zdevito

fbshipit-source-id: dc7f7ba12ba070f2dfccf362995c2a9e0e573cb7
2018-12-04 07:32:18 -08:00
1921816f85 Fix clamp when min/max are both None (#14716)
Summary:
Before this PR, tensor.clamp() would return an empty tensor if min and
max were not specified. This is a regression from 0.4.1, which would
throw an error. This PR restores that error message.

Fixes #14470
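
The restored behavior, roughly:

```python
import torch

t = torch.randn(3)
t.clamp(min=0.0)  # fine: one bound given
# t.clamp()       # errors again (as in 0.4.1) instead of returning an empty tensor
```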
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14716

Differential Revision: D13311031

Pulled By: zou3519

fbshipit-source-id: 87894db582d5749eaccfc22ba06aac4e10983880
2018-12-04 07:07:09 -08:00
6e0c5a8a4e Restore device in cpp API (#14711)
Summary:
This is a stack PR based on https://github.com/pytorch/pytorch/pull/14454.

It enables restoring the storage to the appropriate device.

~~[TODO]: add/modify appropriate tests~~ Done
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14711

Reviewed By: dzhulgakov

Differential Revision: D13315746

Pulled By: houseroad

fbshipit-source-id: fe6f24a45c35e88fd1a2eebc09950d4430fac185
2018-12-04 00:46:41 -08:00
cbd805169f move structs to header file (#14728)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14728

Move IndexBlob and Index to a header file so they can be reused.

Differential Revision: D13315898

fbshipit-source-id: 34432c9b8fa08af3d3387f32a940d35b02a59760
2018-12-04 00:42:41 -08:00
c7f93668dc improve the restore device test, and relax the assertion (#14734)
Summary:
Only compare the device index if the device has one.

Test the tensor restore with some computation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14734

Reviewed By: dzhulgakov

Differential Revision: D13317949

Pulled By: houseroad

fbshipit-source-id: 26b2f2912a9bbc3b660a62283fb403ddab437e49
2018-12-04 00:33:09 -08:00
8812a5d42e Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Reviewed By: eellison

Differential Revision: D13312888

Pulled By: suo

fbshipit-source-id: ad46bfb4d0a306ad9451002f8270f7a790f72d58
2018-12-04 00:16:21 -08:00
862b8cae51 interpolate (#14123)
Summary:
Add support for interpolate and upsampling in weak_script mode.

Because the function parameters are overloaded, I had to add it as a builtin op. For interpolate:
size can be `int? | int[]?`, and scale_factor can be `float? | float[]?`. Every combination of the two parameters needs to be supported.

The same logic applies for upsample_nearest, upsample_bilinear, and upsample.

There are a few fixes that I came to along the way.
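
The overload combinations in question, for reference:

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
F.interpolate(x, size=(16, 16))     # int[] size
F.interpolate(x, size=16)           # int size
F.interpolate(x, scale_factor=2.0)  # float scale_factor
```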
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14123

Differential Revision: D13278923

Pulled By: eellison

fbshipit-source-id: e59729034369be4ce4b747291a3d1c74e135b869
2018-12-04 00:01:43 -08:00
a23863fd6f Add Pooling modules to Script (#14527)
Summary:
Depends on #14584
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14527

Differential Revision: D13270773

Pulled By: driazati

fbshipit-source-id: e4acd43ccbce0f4b62d41c30ce8d5c721171e19a
2018-12-03 23:55:04 -08:00
d429e78a9a Add fractional_max_pool2d to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14591

Differential Revision: D13270755

Pulled By: driazati

fbshipit-source-id: 138a60256795f5ef8d236c75be2cfd929059b98f
2018-12-03 23:49:38 -08:00
e8e494caf8 Add GroupNorm to standard library (#14722)
Summary:
Depends on #14715 for the excluded tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14722

Differential Revision: D13317714

Pulled By: driazati

fbshipit-source-id: bf1cdbc0a3803f82befed41925e91ab60e20ec82
2018-12-03 23:46:19 -08:00
95e5a5ae0c basic testing of builtin alias annotations (#14588)
Summary:
Check whether the codegen'd alias annotations actually track alias creation and writes correctly. This could be made more exhaustive, but it's good enough for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14588

Differential Revision: D13312653

Pulled By: suo

fbshipit-source-id: 98de1610ea86deada71957c75c222fff331a0888
2018-12-03 22:31:02 -08:00
9fbc2d3153 Remove TensorImpl -> LegacyTypeDispatch dependency
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14651

Reviewed By: ezyang

Differential Revision: D13285370

fbshipit-source-id: cc93c3ca95e7260762c1cabca17b8973d52c4e22
2018-12-03 21:53:28 -08:00
d063c9c330 Fix include paths for TensorOptions, DefaultTensorOptions, OptionsGuard
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14647

Reviewed By: ezyang

Differential Revision: D13283497

fbshipit-source-id: d236d5351ecf7ab9712a55e9ef12d8bba48eb53f
2018-12-03 21:53:26 -08:00
46772dba0c Move TensorOptions, DefaultTensorOptions and OptionsGuard to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14646

Reviewed By: ezyang

Differential Revision: D13283494

fbshipit-source-id: c62e7f9b02551926bf8f1e3ddf6ede4ec925d28d
2018-12-03 21:53:24 -08:00
1098500e9b Fix include paths for Layout.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14645

Reviewed By: ezyang

Differential Revision: D13283496

fbshipit-source-id: d70881e957c886a6c2befe3ef1d2c5a3fac18e7f
2018-12-03 21:53:22 -08:00
771eebad7b Move Layout to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14644

Reviewed By: ezyang

Differential Revision: D13283493

fbshipit-source-id: bb02f156d6a5b5129db5743c756acc84c38eca83
2018-12-03 21:53:20 -08:00
5a4082612f Fix include paths for Backend.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14643

Reviewed By: ezyang

Differential Revision: D13283492

fbshipit-source-id: 9919af9707d094118efc963543320e01b07d7bc5
2018-12-03 21:53:19 -08:00
c303fcb9cb Moved Backend to c10 (#14642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14642

Unfortunately, TensorOptions depends on this, so we need it in c10.

Reviewed By: ezyang

Differential Revision: D13283495

fbshipit-source-id: 433cd47eb18aac1131be9c5cd650efc583870a20
2018-12-03 21:53:17 -08:00
119f9ec291 enable NoneValue parameter assignment for WeakScriptModule (#14715)
Summary:
This PR:

1. Handle None-valued attributes in the WeakScriptModuleProxy
2. Add back module tests that are now passing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14715

Differential Revision: D13313573

Pulled By: wanchaol

fbshipit-source-id: a6b7892707350290a6d69b6f6270ad089bfc954b
2018-12-03 20:40:55 -08:00
bb546b2e5b WAR for self.training (#14719)
Summary:
To enable self.training in script modules, this PR automatically adds a buffer called 'training' if a script method requests self.training. Assignment to self.training is overloaded to assign both to the boolean property and the tensor value.
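
A hedged sketch of what this enables:

```python
import torch

class M(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        if self.training:  # reads the auto-added 'training' buffer
            return x * 0.5
        return x

m = M()
m.eval()  # the overloaded assignment keeps the bool and the tensor in sync
```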
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14719

Differential Revision: D13310569

Pulled By: zdevito

fbshipit-source-id: 406387bb602f8ce5794eeff37642863c75928be5
2018-12-03 20:32:16 -08:00
9a932b8b90 fix expect
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14730

Differential Revision: D13316463

Pulled By: zdevito

fbshipit-source-id: 8b11bdb22d354c17bf2de4bded352bb6eb086ec7
2018-12-03 20:15:27 -08:00
44894915d6 Automatic update of fbcode/onnx to 6b34743d2e361bbc0acb29dd73536478cb92562e (#14637)
Summary:
Previous import was f461f7aad9987635b4aff108620ed7918f002d19

Included changes:
- **[6b34743](https://github.com/onnx/onnx/commit/6b34743)**: fix the const map initializatoin (#1662) <Lu Fang>
- **[ae80999](https://github.com/onnx/onnx/commit/ae80999)**: Fuse Pad into Conv optimizer (#1580) <vloncar>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14637

Differential Revision: D13281338

Pulled By: houseroad

fbshipit-source-id: c31429914bf5954fdc85e0c02168836ef47d635c
2018-12-03 20:11:17 -08:00
7b6c6f76f7 Skip CUDA tests when built with CUDA but no GPUs available; rename cuda tests so they're obvious.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14706

Reviewed By: soumith

Differential Revision: D13304398

fbshipit-source-id: d5e2cda965ce8bc1721489b282336ea3ca7f0471
2018-12-03 18:49:59 -08:00
22ab6183c5 Move manual_seed into ATen/Context.h; delete reimplementation in test_seed.h (#14625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14625

I want to reorg the test files, but I am too lazy to make the include
paths for test_seed.h work out.  So just delete it.

Reviewed By: gchanan

Differential Revision: D13277567

fbshipit-source-id: a3e8e46e4816b6fc0fe926b20779839f9e0a1a06
2018-12-03 18:49:58 -08:00
78d594f46c Implement Device as a type in the script (#14666)
Summary:
[ note:  stacked on expect files changes, will unstack once they land ]
This adds DeviceObjType (we cannot use DeviceType; it is already an enum)
to the type hierarchy and an isDevice/toDevice pair to IValue.
Previous hacks which used an int[] to represent Device are removed
and at::Device is used instead.

Note: the behavior of .to is only a subset of Python; we need to
fix the aten op so that it accepts Optional[Device] and Optional[ScalarType].
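
A minimal sketch of Device as a first-class script type (hedged):

```python
import torch

@torch.jit.script
def to_cpu(x):
    d = torch.device('cpu')  # a real Device value now, not an int[] hack
    return x.to(d)
```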
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14666

Reviewed By: suo

Differential Revision: D13290405

Pulled By: zdevito

fbshipit-source-id: 68b4381b292f5418a6a46aaa077f1c902750b134
2018-12-03 16:54:40 -08:00
4b31572375 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically we want to enable the ability to metaprogram the If stmt to dynamically emit different branches based on `cond`. This is primarily used to disable compilation of certain branches of an If, like the below:

```python
import torch

class Test(torch.jit.ScriptModule):
  def __init__(self, b=None):
    super(Test, self).__init__()
    self.b = b

  @torch.jit.script_method
  def forward(self, input):
    x = input
    # when self.b is None, the branch below is skipped at compile time
    if self.b is not None:
      x = self.b(input)
    return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between the None simple value and arbitrary sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13310526

Pulled By: wanchaol

fbshipit-source-id: 78d1a8127acda5e44d2a8a88f7627c43d29ff244
2018-12-03 15:47:15 -08:00
298b775577 Delete temporary ATenCoreTest. (#14622)
Summary:
It was previously used to make sure that ATen/core was working,
but now we have plenty of headers and C++ files in ATen/core,
so this is no longer necessary.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14622

Differential Revision: D13276899

Pulled By: ezyang

fbshipit-source-id: 9bef7eb1882ccdfa3ee7681a3d5b048ea94b59d3
2018-12-03 15:07:40 -08:00
9ac845f734 Revert D13280899: [pytorch][PR] Reduce broadcasted inputs in derivative code
Differential Revision:
D13280899

Original commit changeset: 80cc5ec9331b

fbshipit-source-id: 2335093cca8fd7db95470fd83b9299adfa17aa8e
2018-12-03 14:55:02 -08:00
e0f68671bd Restore device when import jit script module (#14454)
Summary:
We align the restore logic with `torch.load`: we try to restore to the right device, and if the device is not available, an exception is raised. We allow the user to remap the device through a parameter `map_location`, which can be 1) a string like `'cuda:0'` or `'cpu'`, 2) a device, e.g. `torch.device('cpu')`, 3) a dict, e.g. `{'cuda:1': 'cuda:0'}`, or 4) a function whose signature looks like `string map_location(tensor, saved_device_string)`.
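
The accepted forms, roughly (the file name is illustrative):

```python
import torch

m = torch.jit.load("model.pt", map_location="cpu")
m = torch.jit.load("model.pt", map_location=torch.device("cpu"))
# per the summary, a dict ({'cuda:1': 'cuda:0'}) or a function is also accepted
```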
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14454

Reviewed By: zrphercule

Differential Revision: D13271956

Pulled By: houseroad

fbshipit-source-id: dfd6b6049b0dc07549ddeddf2dea03ac53ba6d49
2018-12-03 14:10:30 -08:00
b8da44dc13 Add linear + pixelshuffle modules to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14654

Differential Revision: D13300968

Pulled By: driazati

fbshipit-source-id: 2c36aab91ea99681687f8da6d318981fee49785b
2018-12-03 14:01:16 -08:00
68ffe46991 Reduce broadcasted inputs in derivative code (#14485)
Summary:
Previously symbolic AD formulas assumed that no broadcasting happened,
and would return gradients of incorrect shapes (possibly leading to
silent errors later).

Fixes a few bugs (known and unknown):
- #11736
- ArgumentSpec didn't compute the input types correctly [(it didn't advance the offset for non-tensor args)](https://github.com/pytorch/pytorch/pull/14485/files#diff-4fd3157a056596aefb8cdf41022a208bR153)
- Symbolic AD could suffer from use after free (dangling pointers in grad map), because [`EliminateDeadCode` could have removed nodes](https://github.com/pytorch/pytorch/pull/14485/files#diff-25d33ad1ed6855684dec79d927ca6142L781) that referenced gradients of certain values.
- Undefined behavior in `aten::size`

During my tests I've also found a few new problems, and I have opened issues for them:
- FusionGroup seems to think that cat nodes broadcast their inputs (#14483)
- `prim::ConstantChunk` derivative formula doesn't handle undefined inputs (#14484)

This patch unfortunately deoptimizes some of our code (Fusion doesn't happen past chunk nodes, and outputs more tensors only because we have to get their size). I know how to fix those issues, but wanted to fix this terrible bug quickly.

cc zou3519 zdevito ngimel
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14485

Differential Revision: D13280899

Pulled By: soumith

fbshipit-source-id: 80cc5ec9331be80e1bb9ddfe85b81c2b997e0b0c
2018-12-03 13:44:18 -08:00
b768db0810 Allow DCE to clean up some mutable ops (#14601)
Summary:
This PR makes DCE a little smarter in the presence of mutable ops. Previously mutable ops could never be cleaned up; now they can be, if we can prove there are no live uses of any alias sets that the op writes to.

This behavior is optional; if you pass DCE a block instead of a graph, it will do the same thing as before. Also changed `InlineAutographSubgraph` to use the common subgraph utils.

Tested on traced ResNet, and it gets rid of the dead code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14601

Differential Revision: D13309118

Pulled By: suo

fbshipit-source-id: dac2791e7d2ecf219ae717a2759b83c1e927f254
2018-12-03 13:31:08 -08:00
9783ce3825 Revert D13272203: [pytorch][PR] [jit] Meta programming on If Stmt cond to enable conditional emit blocks
Differential Revision:
D13272203

Original commit changeset: 44a545abb766

fbshipit-source-id: 8861eb4810a6c9ea4aba8427b3a07d2fa0d69a15
2018-12-03 13:28:52 -08:00
6385d00185 Move global-constructor to lazily initialized (mobile restriction) (#14650)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14650

this fixes the build for mobile

Reviewed By: dzhulgakov

Differential Revision: D13267458

fbshipit-source-id: 83e7e76e3c875134395b6c43ea791c5b56871642
2018-12-03 13:24:56 -08:00
5a2f5a216f Make convertable to list also accepts optional
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14598

Differential Revision: D13308254

Pulled By: wanchaol

fbshipit-source-id: bd0b6f9f20294d3d589cf68732dbd8c57b67e0e9
2018-12-03 13:09:11 -08:00
b5181ba1df add avx512 option (but no avx512 kernel yet) (#14664)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14664

This diff just adds a framework to add avx512 kernels.
Please be really, really careful about using avx512 kernels unless you're convinced avx512 will bring good enough *overall* speedups, because it can backfire due to the CPU frequency going down.

Reviewed By: duc0

Differential Revision: D13281944

fbshipit-source-id: 04fce8619c63f814944b727a99fbd7d35538eac6
2018-12-03 12:18:19 -08:00
4b90702037 Meta programming on If Stmt cond to enable conditional emit blocks (#14533)
Summary:
This PR is part of the task to unblock standard library export. Basically we want to enable the ability to metaprogram the If stmt to dynamically emit different branches based on `cond`. This is primarily used to disable compilation of certain branches of an If, like the below:

```python
import torch

class Test(torch.jit.ScriptModule):
  def __init__(self, b=None):
    super(Test, self).__init__()
    self.b = b

  @torch.jit.script_method
  def forward(self, input):
    x = input
    # when self.b is None, the branch below is skipped at compile time
    if self.b is not None:
      x = self.b(input)
    return x

Test()(torch.randn(2, 3))
```
This is also the first step for us to bridge the gap between the None simple value and arbitrary sugared values in the JIT.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14533

Differential Revision: D13272203

Pulled By: wanchaol

fbshipit-source-id: 44a545abb766bbd39b762a6e19f9ebaa295e324b
2018-12-03 12:14:52 -08:00
cac03280f9 Fixed DistributedDataParallel state pickling for multi-gpus (#14690)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14678

This PR fixes DDP not working after save() and load() with multiple GPUs, because all the replication logic and bucketing need to happen in the constructor.

So I refactored some of that logic in the constructor into a helper function, which is also used by load().

Added a test too. Tested on 8-GPU machines.

```
tengli@learnfair062:~/pytorch/test$ python run_test.py -i distributed --verbose
Test executor: ['/private/home/tengli/miniconda3/bin/python']
Selected tests: distributed
Running test_distributed ... [2018-12-02 18:33:55.833580]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the mpi backend
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.315s

OK (skipped=15)
Running distributed tests for the mpi backend with file init_method
test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... test_Backend_enum_class (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_DistributedDataParallel (__main__.TestMPI) ... skipped 'Only Nccl & Gloo backend support DistributedDataParallel'
test_DistributedDataParallelCPU (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA all gather'
test_all_gather_full_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_group (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_gather_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports allgather multigpu'
test_all_reduce_full_group_max (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_min (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_product (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_full_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_max (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_min (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_product (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_group_sum (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_max (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_min (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_all_reduce_product (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_all_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_full_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_full_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_barrier_group_cuda (__main__.TestMPI) ... skipped "MPI doesn't supports GPU barrier"
test_barrier_timeout_full_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestMPI) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_cuda (__main__.TestMPI) ... skipped 'Only Gloo and Nccl backend supports CUDA allReduce'
test_broadcast_full_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_broadcast_multigpu (__main__.TestMPI) ... skipped "MPI doesn't support broadcast multigpu"
test_destroy_full_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_destroy_group (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_full_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_gather_group (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_backend (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_default_group (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_full_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_get_rank_size_group (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_irecv (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_isend (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_max (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_min (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_product (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_full_group_sum (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_max (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_min (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_product (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_group_sum (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_max (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_min (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_multigpu (__main__.TestMPI) ... skipped 'Only Nccl backend supports reduce multigpu'
test_reduce_product (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_reduce_sum_cuda (__main__.TestMPI) ... skipped 'Only Nccl supports CUDA reduce'
test_scatter (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_full_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_scatter_group (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_any_source (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok
test_send_recv_with_tag (__main__.TestMPI) ... ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
ok

----------------------------------------------------------------------
Ran 68 tests in 6.415s

OK (skipped=15)
Running distributed tests for the nccl backend
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 69.549s

OK (skipped=52)
Running distributed tests for the nccl backend with file init_method
test_Backend_enum_class (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... skipped 'nccl does not support DistributedDataParallelCPU'
test_all_gather (__main__.TestDistBackend) ... skipped 'Only MPI supports CPU all gather'
test_all_gather_cuda (__main__.TestDistBackend) ... skipped 'CUDA all gather skipped for NCCL'
test_all_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_gather_multigpu (__main__.TestDistBackend) ... ok
test_all_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_multigpu (__main__.TestDistBackend) ... skipped 'CUDA all_reduce multigpu skipped for NCCL'
test_all_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_all_reduce_sum_cuda (__main__.TestDistBackend) ... skipped 'Only Gloo backend will have CUDA allReduce tested'
test_barrier (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_cuda (__main__.TestDistBackend) ... ok
test_barrier_full_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_full_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_group (__main__.TestDistBackend) ... skipped 'NCCL does not support CPU barrier'
test_barrier_group_cuda (__main__.TestDistBackend) ... ok
test_barrier_timeout_full_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_global (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_barrier_timeout_group (__main__.TestDistBackend) ... skipped 'Only gloo backend supports timeouts'
test_broadcast (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_cuda (__main__.TestDistBackend) ... ok
test_broadcast_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_broadcast_multigpu (__main__.TestDistBackend) ... skipped 'NCCL broadcast multigpu skipped'
test_destroy_full_group (__main__.TestDistBackend) ... ok
test_destroy_group (__main__.TestDistBackend) ... ok
test_gather (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_gather_group (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_get_backend (__main__.TestDistBackend) ... ok
test_get_default_group (__main__.TestDistBackend) ... ok
test_get_rank (__main__.TestDistBackend) ... ok
test_get_rank_size_full_group (__main__.TestDistBackend) ... ok
test_get_rank_size_group (__main__.TestDistBackend) ... ok
test_irecv (__main__.TestDistBackend) ... skipped 'Nccl does not support irecv'
test_isend (__main__.TestDistBackend) ... skipped 'Nccl does not support isend'
test_reduce_full_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_full_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_group_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_max (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_min (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_multigpu (__main__.TestDistBackend) ... ok
test_reduce_product (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum (__main__.TestDistBackend) ... skipped 'Nccl does not support CPU tensors'
test_reduce_sum_cuda (__main__.TestDistBackend) ... ok
test_scatter (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_full_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_scatter_group (__main__.TestDistBackend) ... skipped 'Nccl does not support scatter'
test_send_recv (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'
test_send_recv_any_source (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv from any source'
test_send_recv_with_tag (__main__.TestDistBackend) ... skipped 'Nccl does not support send/recv'

----------------------------------------------------------------------
Ran 68 tests in 70.381s

OK (skipped=52)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14690

Differential Revision: D13294169

Pulled By: teng-li

fbshipit-source-id: 69ccac34c6c016899bfe8fbc50b48d4bfd1d3876
2018-12-03 12:04:26 -08:00
18eaec7121 Add (unused) HIP API to the Context object. (#14623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14623

This is the last piece we need before we can start doing out-of-place
HIPify on ATen.  These APIs are not actually used at the moment,
as we still do in-place HIPify, which uses CUDA.

Reviewed By: gchanan

Differential Revision: D13277246

fbshipit-source-id: 771efa81c2d2022e29350f25a5b4bb8f49ac6df0
2018-12-03 10:54:57 -08:00
b1faab3d8f Replace THCState_getCurrentStream with direct at::cuda::getCurrentCUDAStream()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14500

Reviewed By: gchanan

Differential Revision: D13241401

fbshipit-source-id: d78cf8ddce96876bedc1d14507b0646bcfd41aed
2018-12-03 10:54:55 -08:00
a49bf21d50 Delete hasCuDNN from Context. (#14499)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14499

It still needs to stay in hooks, since it's part of the public
C++ API, but I want library code to try to arrange for CuDNN checks
to occur inside CUDA code, where it's statically obvious whether
CuDNN is available (and you don't need dynamic dispatch).

Reviewed By: gchanan

Differential Revision: D13241355

fbshipit-source-id: 4e668a5914ab890463a12d9e528ba4ecbb7dd7c2
2018-12-03 10:54:54 -08:00
eb71df3e63 Delete at::current_device(), Context::current_device() and Context::getNumGPUs() (#14414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14414

The previous functions were CUDA-centric, and led to lots of places
where we improperly assumed that CUDA is the only game in town (it's not).
Best to delete them.

What are your alternatives?  This diff fixes some use sites, which may give
you some ideas.  In particular, the "given a device type, give me the
current device for that device type" might be a good function to enshrine
for real.

Reviewed By: gchanan

Differential Revision: D13218540

fbshipit-source-id: 2f42cd6b9bdab4930d25166b8041c9466a1c6e0a
2018-12-03 10:54:52 -08:00
5ee8312b63 sparse.mm(), reland #14526 (#14661)
Summary:
- reland reverted PR #14526 with doc fixes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14661

Differential Revision: D13289047

Pulled By: weiyangfb

fbshipit-source-id: 5b843a11a58b56aeada3af2680a27cf89ecef4d8
2018-12-03 10:39:27 -08:00
7da2448d62 Fix multi-argument allreduce in ProcessGroupGloo (#14688)
Summary:
If multiple arguments are specified to c10d allreduce, they are
interpreted as if they are expanding the ranks in the process group.
Therefore, not only is every argument to allreduce an input that must
be considered, it is also an output. The problem that this commit
fixes is that they were not correctly considered as outputs.

The upstream problem is tracked in facebookincubator/gloo#152. Once
this is fixed there we can remove the copies that this commit adds.

This fixes #14676.
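
For concreteness, here is a minimal sketch of the multi-tensor call path this fixes, assuming the gloo backend and the usual env:// variables; `_get_default_group` is a private helper whose name is an assumption here, not a documented API:

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo", init_method="env://")
pg = dist.distributed_c10d._get_default_group()   # private helper, name assumed
tensors = [torch.ones(4), torch.ones(4) * 2]      # each entry acts as an extra "rank"
pg.allreduce(tensors).wait()
# After the fix, every tensor in the list holds the combined result,
# i.e. each argument is treated as an output as well as an input.
```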
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14688

Differential Revision: D13294405

Pulled By: pietern

fbshipit-source-id: 078a2a0a0ff12d051392461438f1496201ec3cb9
2018-12-03 09:41:17 -08:00
b15242f70c Assert all legacy operators are 'extended_method', remove codegen for… (#14649)
Summary:
… other paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14649

Differential Revision: D13285183

Pulled By: gchanan

fbshipit-source-id: 91a58a22cba7e00eb0931bc277b0cb9d6f05cfdc
2018-12-03 07:41:50 -08:00
737efa78ba Remove 'type_method_inline_definitions' which isn't used.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14648

Differential Revision: D13284176

Pulled By: gchanan

fbshipit-source-id: e6b8f9410fab57164259f97de2fd46f6bdf88d5a
2018-12-03 07:38:21 -08:00
b96e6ee98d Delete defunct DynamicCUDAInterface (#14621)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14621

Differential Revision: D13276723

Pulled By: ezyang

fbshipit-source-id: b666b2cdf4c45ccec7c802e268878eb2f3e028aa
2018-12-03 07:33:05 -08:00
af95f712b0 Get rid of deprecated_factory_method in codegen, which is no longer u… (#14641)
Summary:
…sed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14641

Differential Revision: D13283449

Pulled By: gchanan

fbshipit-source-id: 35cedc48940fa6144b4eab6402d9e1dc74a67b65
2018-12-03 07:28:42 -08:00
5c89190340 inline adagrad functions (#14194)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14194

Inline some of perfkernels/adagrad.h functions for better performance

Reviewed By: hyuen

Differential Revision: D13096351

fbshipit-source-id: b4da8053278d585eabc5389b8a8dcae0f253b413
2018-12-02 20:23:02 -08:00
74c3cbc013 Increase test barrier timeout for barrier test (#14689)
Summary:
The CUDA initialization for the participating processes can
take long enough for the barrier timeout to trigger on the
process that doesn't participate in the group.

See #14676.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14689

Reviewed By: teng-li

Differential Revision: D13293695

Pulled By: pietern

fbshipit-source-id: 6268dc9acfdb22f70c027e5e4be082f7127c0db4
2018-12-02 17:46:17 -08:00
5268dd468c Fixed DistributedDataParallel cannot kick off all-reduce in a corner case (#14675)
Summary:
OK, this corner case comes up for translation teams, and it only happens under the following conditions:

(1) when the module is registered a parameter that does not requires grad

and

(2) this registered parameter has a unique type (say, double or half), and it is the only parameter of that type, so it alone will be put into a separate bucket.

and

(3) it is the last parameter that got registered in the module, such that its bucket reduction is the first to be kicked off.

Once this corner case happens, the backward hook won't be kicked off, since the parameter does not require grad. All the other buckets then wait for this bucket to be kicked off first, so no bucket is ever reduced: everything is blocked by the first bucket (the unique-type parameter).

This PR fixes three things:
(1) Make sure that we only bucket parameters that require grad.
(2) Check all-reductions in the next iteration: if we detect that the previous iteration's all-reduction was not fully kicked off, we issue an error.
(3) Remove some unused variables.

With this bug fixed, the only case where this error can happen is when the user changes parameters after wrapping the module with DistributedDataParallel, like the case in:
https://github.com/pytorch/pytorch/issues/12603

A test covers this as well.
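
For concreteness, a minimal sketch of a module that hits this corner case (names hypothetical):

```python
import torch
import torch.nn as nn

class Corner(nn.Module):
    def __init__(self):
        super(Corner, self).__init__()
        self.fc = nn.Linear(4, 4)   # float parameters, require grad
        # Unique dtype, frozen, and registered last: it used to land alone in
        # the first bucket to be reduced, and its backward hook never fired.
        self.offset = nn.Parameter(torch.zeros(4, dtype=torch.double),
                                   requires_grad=False)

    def forward(self, x):
        return self.fc(x) + self.offset.float()

# With the fix, DistributedDataParallel skips `offset` when bucketing,
# because only parameters with requires_grad=True are bucketed.
```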

Without the first fix, I verified that the repro in fbcode hit this error message:

```
result = self.forward(*input, **kwargs)
  File "/data/users/tengli/fbsource/fbcode/buck-out/dev/gen/language_technology/neural_mt/os/pytorch_translate/train#link-tree/torch/nn/parallel/distributed.py", line 312, in forward
    raise RuntimeError("Not all gradients are all-reduced from "
RuntimeError: Not all gradients are all-reduced from the backward of the previous iteration. This is unexpected and fatal error. Please check and ensure that the model's parameters are not changed after you wrap up the model with DistributedDataParallel.

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14675

Differential Revision: D13291083

Pulled By: teng-li

fbshipit-source-id: 2539b699fae843f104b4b8d22721ae82502ba684
2018-12-02 17:13:07 -08:00
35c8f93fd2 Fix CUDA 8 build on Windows (#14665)
Summary:
Fixes #14663.
Test for CUDA 8 is running here: https://dev.azure.com/pytorch/PyTorch/_build/results?buildId=54
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14665

Differential Revision: D13290392

Pulled By: soumith

fbshipit-source-id: 57f0d5b704e5d1fcb4927cbc007327b4ed74f443
2018-12-01 16:50:38 -08:00
da2c3afa47 Fixed typo in README.md (#14346)
Summary:
Fixed the typo in the Docker image section of README.md file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14346

Differential Revision: D13290403

Pulled By: soumith

fbshipit-source-id: 1d848027a773f0cfc875c33d69a66e96abc7ac8b
2018-12-01 16:39:33 -08:00
4c11dee0e8 Use Type::str() in Type::operator<< (#14657)
Summary:
Stacked on zip commit because it also changes expect files, read only the last commit.

This reduces the number of ways we can print a Type from 3 (python_str, str, operator<<) to 2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14657

Differential Revision: D13288912

Pulled By: zdevito

fbshipit-source-id: f8dd610cea798c511c1d4327395bba54b1aa1697
2018-12-01 00:53:27 -08:00
143e171cb9 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 6b3905b999b1211196c9138d7236700a1b308491
2018-11-30 19:47:44 -08:00
170ff7764f Use a zip archive as our container format (#14521)
Summary:
After consulting with Owen, who pointed out the existence of the miniz library, I decided to take one last shot at using zip as our container format.
miniz makes this surprisingly feasible and I think the benefits of using zip are large enough that we should do it.

This replaces our custom container format with a zip archive, preserving all of the
desirable features of our custom format, such as append-oriented writing, and
mmap'able tensor data while adding a bunch of debugging advantages:

1. You can unzip and explore the container to debug what is going on with a model.
2. You can edit the model using a text editor (e.g. change the definition of a method,
   or editing the json-serialized meta-data), re-zip the file using OSX's native 'Compress'
   option, and re-load the result into pytorch. Note: this enables you to, e.g., print-debug
   serialized models.
3. We can easily enable features like compression in the future.
4. Stock Python, without PyTorch installed, and other programming languages
   can reasonably consume this format using the json and zipfile packages, which enables
   people to build tools like visualizers without those visualizers depending on PyTorch
   (see the sketch after this list). This will be especially useful if you want to, for
   instance, write a visualizer in JavaScript.
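
For example, a minimal sketch of consuming such an archive with stock Python (assuming a model saved to "model.pt"; member names vary by model, so inspect the listing rather than hard-coding them):

```python
import zipfile

with zipfile.ZipFile("model.pt") as archive:
    for info in archive.infolist():
        print(info.filename, info.file_size)
    # Any json-serialized metadata member can be decoded with the stock json
    # module, e.g. json.loads(archive.read(member_name)) -- member name assumed.
```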

Notes:

*  This add miniz (https://github.com/richgel999/miniz) as a dependency. miniz is a self-contained
   library for reading/writing zipfiles that unlike other zip libraries also includes libz
   compatible compress/decompress support. It is a single header and a single C file without
   any other dependencies. Note that the instructions for miniz explicitly state:

   > Please use the files from the releases page in your projects. Do not use the git checkout directly!

   So we have checked in the 'release' source. Miniz supports zip64, and its API is amenable
   to doing zip-align style things to align data.

*  Removes 'size' from RecordRef. This allows you to edit files in the zip archive without
   editing the meta-data file. Very important if you want to print-debug serialized models.

*  PyTorchStreamReader/PyTorchStreamWriter keep mostly the same API (though keys become strings)
   However, their implementation is completely swapped out to use miniz.

*  Code exists to check for the old magic number to give a decent warning to our preview users
   after we change the format.

*  Container version information is now put in a stand-alone 'version' file in the archive
   and serves a similar purpose to the other container version info.

*  All files in the zip archive start at 64-byte boundaries, using an approach similar to
   zip-align. Tests check that this property remains true. While the writer does this,
   the reader doesn't depend on it, allowing user-created archives that can use compression,
   and do not have to align data.

*  Added test to check for > 4GB files and archives. Disabled by default because it takes
   almost 2 minutes to run.

*  torchscript files are now optional: if a submodule does not have methods, it will
   not be written.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14521

Reviewed By: jamesr66a

Differential Revision: D13252945

Pulled By: zdevito

fbshipit-source-id: 01209294c0f6543d0fd716f85a38532249c52f8c
2018-11-30 19:19:29 -08:00
1c21dc6e16 Revert D13252990: [pytorch][PR] [sparse] sparse.mm(S, D)
Differential Revision:
D13252990

Original commit changeset: 8fdb14144405

fbshipit-source-id: 49b8b0759a6e647854689962ffa72a205b4a2088
2018-11-30 18:53:47 -08:00
c71edcc747 Tensor construction codemod - caffe2/caffe2/fb/operators - 2/3
Summary:
Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13229251

fbshipit-source-id: 88b3984ea8ca82b9489c0ee9a338fd3f41dee615
2018-11-30 18:38:17 -08:00
fd17fd4aa9 Fix 'unknown type name 'optional'' (#14383)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14383

D11669870 seems to have missed a spot that wasn't triggered before the stacked code above

Reviewed By: smessmer

Differential Revision: D13198269

fbshipit-source-id: 74592bedae0721acee744e31ca95253ea6efdedb
2018-11-30 17:29:50 -08:00
7f42d1c98a fix double precision cast from pybind (#14417)
Summary:
The JIT world only has double, not float, so in insertConstant we need to cast the Python `float_` to double instead of float. This fixes the incorrect values of `math.pi` and other high-precision constants.
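
To see why the cast matters, a small self-contained demonstration of the precision lost by a float32 round-trip (independent of the JIT):

```python
import math
import struct

# Round-trip pi through a 32-bit float, as the old cast effectively did.
single = struct.unpack('f', struct.pack('f', math.pi))[0]
print(math.pi)  # 3.141592653589793
print(single)   # 3.1415927410125732
```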
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14417

Differential Revision: D13282975

Pulled By: wanchaol

fbshipit-source-id: 26a4c89ffc044d28598af673aebfec95153a869e
2018-11-30 17:25:32 -08:00
404ad939e5 Revert existing no_grad_embedding_renorm_ from aten (#14639)
Summary:
Remove no_grad_embedding_renorm_ from ATen. Setting the derivatives of the inputs to false has different semantics from calling with no_grad(), because it will not error if an input is modified and then has its grad accessed.

Instead, make a custom op, and use NoGradGuard.
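
From the Python side, the intended semantics look roughly like this (a minimal sketch of the no-grad behavior, not the new custom op itself):

```python
import torch

weight = torch.randn(5, 3, requires_grad=True)
with torch.no_grad():
    # The in-place renorm is simply not recorded by autograd, so it neither
    # errors nor participates in any later backward pass.
    weight.renorm_(p=2, dim=0, maxnorm=1.0)
weight.sum().backward()   # gradients flow as if the renorm were plain data
```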
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14639

Differential Revision: D13285604

Pulled By: eellison

fbshipit-source-id: c7d343fe8f22e369669e92799f167674f124ffe7
2018-11-30 16:57:51 -08:00
aeb38cfcea cuda implementation for PackSegment to support presence mask (#14635)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14635

as title

Reviewed By: enosair

Differential Revision: D13254097

fbshipit-source-id: b9f40109e2889907c925f9a4df9da14f67f45f38
2018-11-30 16:54:10 -08:00
1d464d7f3e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 17487c327cbe48969dff397656fe90efcf23b699
2018-11-30 16:23:00 -08:00
26f3fb34a1 Build distributed libs in build_libtorch.py (#14037)
Summary:
This patch detects and builds c10d and gloo for the C++ API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14037

Reviewed By: ezyang

Differential Revision: D13283801

Pulled By: ebetica

fbshipit-source-id: 006dbb691344819833da6b4b844c1f0572942135
2018-11-30 14:46:36 -08:00
36c5f40ec0 Remove methods from _th_triu_ and _th_addcmul_. (#14624)
Summary:
These somehow slipped through when we moved all of Declarations.cwrap to functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14624

Reviewed By: ezyang

Differential Revision: D13277434

Pulled By: gchanan

fbshipit-source-id: e83451e2d0fdafb55635d4b757688a501454bf8c
2018-11-30 14:19:29 -08:00
c3a2b1e155 sparse.mm(S, D) (#14526)
Summary:
- add `sparse.mm(S, D)` with backward
- for `sparse.addmm()`, relax the input constraint so that the sparse matrix input doesn't have to be coalesced
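
A minimal usage sketch based on the PR description (shapes illustrative):

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3.0, 4.0, 5.0])
S = torch.sparse_coo_tensor(i, v, (2, 3), requires_grad=True)
D = torch.randn(3, 4, requires_grad=True)
out = torch.sparse.mm(S, D)   # dense (2, 4) result
out.sum().backward()          # gradients reach both S and D
```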
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14526

Reviewed By: ezyang

Differential Revision: D13252990

Pulled By: weiyangfb

fbshipit-source-id: 8fdb14144405a2122d4b8447ad4055cd0330e6e8
2018-11-30 14:15:34 -08:00
a84e873bb1 Put back linker flag for OpenMP to prevent build break on ppc64le (#14569)
Summary:
See #14539
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14569

Differential Revision: D13282161

Pulled By: ezyang

fbshipit-source-id: 13a1131b26fa300b037f66d1919b97d14033f9e5
2018-11-30 14:13:04 -08:00
5c1692840e Remove OptionsGuard from ATen (#14524)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/13738
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14524

Differential Revision: D13268031

Pulled By: goldsborough

fbshipit-source-id: fb306464b673c05ebd26d0f44d688ccd92d1d8c5
2018-11-30 13:30:35 -08:00
4b915260c7 Explicitly ban uninitialized tensors when invoking Predictor classes (#14377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14377

att

Reviewed By: dzhulgakov

Differential Revision: D13197348

fbshipit-source-id: 85a451bde3a57a8acdd3af548606c05e223896a6
2018-11-30 13:26:00 -08:00
738fc7054b Report timer in benchmarking when requested
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14570

Reviewed By: llyfacebook

Differential Revision: D13264904

Pulled By: sf-wind

fbshipit-source-id: fd05bc32202b7734dc911e3c792357ddf9ecedee
2018-11-30 13:17:29 -08:00
f45405bf5b Fix inheritance for SharedDataset (#14629)
Summary:
ezyang ebetica

CC jaliyae
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14629

Differential Revision: D13278988

Pulled By: goldsborough

fbshipit-source-id: 53afbcd1f3fc5cb23046ff92c4345cd90abd4584
2018-11-30 12:29:45 -08:00
814b5715ba Move module tests to common_nn (#14578)
Summary:
This moves `new_module_tests` from `test_nn.py` to `common_nn.py` so
that they can be used in `test_jit.py` without running any of
`test_nn.py`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14578

Differential Revision: D13268286

Pulled By: driazati

fbshipit-source-id: 6e8654a4c29ab754d656ac83820c14d1c1843e03
2018-11-30 12:14:59 -08:00
c042f69dbb Updating submodules
Reviewed By: yns88

fbshipit-source-id: 863e9e2a1f0810f96494cabae1724622b9eb91ff
2018-11-30 11:47:16 -08:00
5ae0ed8552 Remove default constructor lines that do nothing, and fix warnings with clang trunk (#14300)
Summary:
The lines removed in this diff were no-op, but confusing: the default constructors in `store_handler.h` are implicitly deleted, since `std::runtime_error` has no default constructor.

Clang added a warning for this behavior [in September 2018](https://reviews.llvm.org/rL343285) (note that the warning is not just for cxx2a, despite the slightly confusing commit message), so building pytorch with a recent build of clang trunk causes a spew of this warning, which is fixed by the present PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14300

Differential Revision: D13260039

Pulled By: umanwizard

fbshipit-source-id: 92788dbd6794253e788ef26bde250a66d8fb917e
2018-11-30 11:16:35 -08:00
c03851e93a remove copy_wrapper (#13937)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13937

We can now replace s_copy_ with our new _copy_ function. Experimented with moving s_copy_ out of VariableManualType.cpp, but it seemed like there was enough special casing to warrant it staying.

Reviewed By: ezyang

Differential Revision: D13053648

fbshipit-source-id: e9e04d460baf4ee49b500212cf91b95221acd769
2018-11-30 11:12:59 -08:00
5c65a7812e Move non_blocking copies to aten (#13866)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13866

just a straightforward port

Reviewed By: ezyang

Differential Revision: D13011878

fbshipit-source-id: f288efebf78fa634abfb681b938b44277064d5b6
2018-11-30 11:12:57 -08:00
e3840419ec Move cuda copy to aten (#13348)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13348

Move cross device, cpu to device, device to cpu copies to aten. Most of it is a direct port, main difference is that we dispatch from a single _copy_ function for copies.

Reviewed By: ezyang

Differential Revision: D12850690

fbshipit-source-id: c2e3f336796b4ae38be6027d2ec131a274a6aa8c
2018-11-30 11:12:55 -08:00
0786dfee7c Move THTensor_(copy) to aten (#13603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13603
Moved vectorized CPU copy to aten. Notable changes mainly in _copy_same_type_.

Reviewed By: ezyang

Differential Revision: D12936031

fbshipit-source-id: 00d28813e3160595e73d104f76685e13154971c1
2018-11-30 11:12:54 -08:00
c1c841a4e7 Changes based on @gchanan's review of #13420 (#14441)
Summary:
```
The most significant change is that this fixes the error message when
indexing an empty tensor with an out-of-bounds index. For example:

  x = torch.ones(10, 0)
  x[:, [3, 4]]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14441

Differential Revision: D13226737

Pulled By: colesbury

fbshipit-source-id: d1c4a35a30e3217e3d1727d13f6b354a4a3b2a24
2018-11-30 11:03:20 -08:00
edb3ddf1a5 Accumulate grad fix (#14587)
Summary:
Rebased version of https://github.com/pytorch/pytorch/pull/13337.

I don't think the lint errors in the original PR had to do with files I touched, so hopefully the rebase fixes them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14587

Differential Revision: D13277428

Pulled By: soumith

fbshipit-source-id: f04c186b1dd4889b4250597eef87f9e9bf7b2426
2018-11-30 10:49:15 -08:00
67308a9323 Fix expanded mvn and lowrankmvn (#14557)
Summary:
This PR fixes a slowness issue with expanded MVN.

A notebook showing the problem is [here](https://gist.github.com/fehiepsi/b15ac2978f1045d6d96b1d35b640d742). Basically, mvn's sample and log_prob involve expensive computations based on `cholesky` and `trtrs`. We can save a lot of computation by caching the unbroadcasted version of `scale_tril` (or `cov_diag`, `cov_factor` in lowrank mvn).
When expanding, this cached tensor should not be expanded together with other arguments.
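
A minimal sketch of the expand pattern whose cost the caching avoids (the speedup itself is the PR's claim; the snippet only shows the API shape):

```python
import torch
from torch.distributions import MultivariateNormal

base = MultivariateNormal(torch.zeros(2), torch.eye(2))
batched = base.expand(torch.Size([1000]))  # reuses, rather than expands, the cached factor
x = batched.sample()
print(batched.log_prob(x).shape)           # torch.Size([1000])
```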

Ref: https://github.com/uber/pyro/issues/1586

cc neerajprad fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14557

Differential Revision: D13277408

Pulled By: soumith

fbshipit-source-id: a6b16f999b008d5da148ccf519b7f32d9c6a5351
2018-11-30 10:49:13 -08:00
2e0f3b038c Tensor construction: combine Resize+mutable_data - 2/4 (#14205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14205

Original commit changeset: 8f9fb55842ae

Reviewed By: dzhulgakov

Differential Revision: D13126263

fbshipit-source-id: 12ba89e31b7738a81ec5c660ea7b79e8576c35dc
2018-11-30 10:46:58 -08:00
f6354d903a Unit tests need better compilation flow (#14547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14547

Unit tests used in dnnlowp need a better compilation flow as some of them need avx. Disabling for now so that pytorch builds with fbgemm.

Reviewed By: jianyuh

Differential Revision: D13240933

fbshipit-source-id: e2e187b758c5d89e524470cd261ce35493f427a2
2018-11-30 09:40:29 -08:00
aa842fe101 clean up linkage options (#14609)
Summary: minor code cleanup

Differential Revision: D13277803

Pulled By: soumith

fbshipit-source-id: 5ef925fe95037cab540b329054d7070c1ea7031e
2018-11-30 09:36:59 -08:00
ad1b874a36 set mkl_set_dynamic to false (#13868)
Differential Revision: D13277331

Pulled By: soumith

fbshipit-source-id: 692bb7d5157235e00dea4776d1991bb07e16ff85
2018-11-30 09:29:43 -08:00
37627a182b fix USE_SYSTEM_NCCL build (#14606)
Summary:
fixes https://github.com/pytorch/pytorch/issues/14537
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14606

Differential Revision: D13274156

Pulled By: soumith

fbshipit-source-id: f834715e8e17dacf60be459b0efffba1d4df40ae
2018-11-29 23:36:17 -08:00
ff91de43de Set output of aten::mm to have the same output type as the original node after op canonicalization. (#14602)
Summary:
In CanonalizeOp, addmm is separated into mm and add, but the output dimension and type were not preserved for the aten::mm node. Fix this so that the dumped graph after this pass contains accurate information.
sample output:
before:
%6 : Dynamic = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
after:
%6 : Float(32, 200) = aten::mm(%input.2, %5), scope: LinearModel/Sequential[model]/Linear[full0]
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14602

Differential Revision: D13273754

Pulled By: soumith

fbshipit-source-id: 82e22b5f30e9eb6ba9249c5a2216955421f39cc7
2018-11-29 23:24:27 -08:00
89c3dbcad8 Add binary cross entropy to standard lib
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14583

Differential Revision: D13269423

Pulled By: driazati

fbshipit-source-id: 7cc1594d8189c3e8f2d4ce0462fdc0a03683006e
2018-11-29 22:23:13 -08:00
1f6d9f44fc Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13272741

Pulled By: driazati

fbshipit-source-id: 3e4fe870d0e268903757f3ae8a56100606906bce
2018-11-29 22:18:55 -08:00
3648c269e9 Misc distributed documentation updates (#14605)
Summary:
* s/environmental/environment/g
* Casing (CUDA, InfiniBand, Ethernet)
* Don't embed torch.multiprocessing.spawn but link to it (not part of the package)
* spawn _function_ instead of _utility_ (it's mentioned after the launch utility which is a proper utility)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14605

Differential Revision: D13273480

Pulled By: pietern

fbshipit-source-id: da6b4b788134645f2dcfdd666d1bbfc9aabd97b1
2018-11-29 21:51:43 -08:00
11ef5191ff Enable tests for CPU tensors in test_distributed.py (#14572)
Summary:
These were not enabled after adding support in the Gloo backend. The
argument checks in ProcessGroupGloo raised an error in two cases:

* If the input tensor list to scatter was ``[None]`` on processes other
  than the source process.
* If the output tensor list to gather was ``[None]`` on processes other
  than the destination process.

This commit prepares these arguments explicitly instead of boxing them
at the process group call site.

This fixes #14536.
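
For concreteness, a minimal sketch of the calling convention these tests exercise (gloo backend and env:// variables assumed):

```python
import torch
import torch.distributed as dist

dist.init_process_group("gloo", init_method="env://")
out = torch.zeros(4)
if dist.get_rank() == 0:
    inputs = [torch.full((4,), float(r)) for r in range(dist.get_world_size())]
    dist.scatter(out, scatter_list=inputs, src=0)
else:
    dist.scatter(out, scatter_list=None, src=0)   # non-source ranks pass no inputs
```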
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14572

Differential Revision: D13272812

Pulled By: pietern

fbshipit-source-id: 12cb0d85ec92f175365cbada585260f89330aad8
2018-11-29 21:39:02 -08:00
1975917d0e fix copy_ (#14593)
Summary:
Closes https://github.com/pytorch/pytorch/issues/14590
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14593

Differential Revision: D13272510

Pulled By: jamesr66a

fbshipit-source-id: b6921a98460c371d435277c416dad0b5ab0fec8c
2018-11-29 20:31:53 -08:00
220ce8046e Binding for prctl(PR_SET_PDEATHSIG) (#14491)
Summary:
If torch.multiprocessing.spawn is used to launch non-daemonic
processes (the default since #14391), the spawned children won't be
automatically terminated when the parent terminates.

On Linux, we can address this by setting PR_SET_PDEATHSIG, which
delivers a configurable signal to child processes when their parent
terminates.

Fixes #14394.
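
The underlying mechanism, as a minimal Linux-only sketch (PR_SET_PDEATHSIG is the constant from <linux/prctl.h>; this is the raw syscall, not the binding this PR adds):

```python
import ctypes
import signal

PR_SET_PDEATHSIG = 1
libc = ctypes.CDLL("libc.so.6", use_errno=True)
if libc.prctl(PR_SET_PDEATHSIG, signal.SIGINT) != 0:
    raise OSError(ctypes.get_errno(), "prctl(PR_SET_PDEATHSIG) failed")
# The kernel now delivers SIGINT to this process when its parent terminates.
```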
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14491

Differential Revision: D13270374

Pulled By: pietern

fbshipit-source-id: 092c9d3c3cea2622c3766b467957bc27a1bd500c
2018-11-29 20:09:19 -08:00
9127ab3866 Fixed new_group won't work for two or more different rank groups (#14529)
Summary:
This fixed two things:

(1) The NCCL backend didn't support two or more groups. This is because we need a group name in the ProcessGroupNCCL class to keep track of the ProcessGroup ID within that group name, and also the NCCL unique ID within that group name and process group ID. Otherwise, different processes will create different NCCL process groups in different orders and can clash on these names. This fixes the NCCL problem.

(2)  When using new_group, each rank should enter this function and update its global group name counter to ensure that every rank always operates on the same group name.

With both fixes: repro code in: https://github.com/pytorch/pytorch/issues/14528 should work with both NCCL and Gloo backends.

```
tengli@learnfair096:~$ python -m torch.distributed.launch --nproc_per_node=8 --nnodes=1 --node_rank=0 --master_addr=127.0.0.1 --master_port=30000 ~/github_issues/nccl_group.py
rank: 0 - val: 6.0
rank: 2 - val: 6.0
rank: 3 - val: 6.0
rank: 1 - val: 6.0
rank: 4 - val: 22.0
rank: 6 - val: 22.0
rank: 5 - val: 22.0
rank: 7 - val: 22.0
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14529

Differential Revision: D13253434

Pulled By: teng-li

fbshipit-source-id: 8eb45882b996b06d951fc9a306d5de86a42e8b84
2018-11-29 19:57:47 -08:00
e227aa9e2e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 44cd40cc9bc25629ec9547327a515bac22e5c905
2018-11-29 19:46:35 -08:00
67e3905bc6 Revert D13268293: [pytorch][PR] [jit] Add InstanceNorm, Distance modules to Script
Differential Revision:
D13268293

Original commit changeset: cb33c6dcdadd

fbshipit-source-id: 214a29b74c85b7b25df0eb48e3fdb81539049130
2018-11-29 19:19:35 -08:00
0d3cb91d8c Make env init_method support both env and args for rank and size (#14494)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/14446

This was a supported behavior in old torch.distributed. We want to support it in the new release.

The test covers all combinations of scenarios where rank and world size come from the environment, from the arguments, or both (see the sketch below).
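
A minimal sketch of the two spellings this enables (MASTER_ADDR/MASTER_PORT assumed set in the environment):

```python
import os
import torch.distributed as dist

# Rank and world size may come from the launcher's env vars, be passed
# explicitly, or both -- after this change every combination is accepted.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))
dist.init_process_group("gloo", init_method="env://",
                        rank=rank, world_size=world_size)
```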
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14494

Differential Revision: D13253433

Pulled By: teng-li

fbshipit-source-id: c05974d84f1bdf969f74ec45763e11a841fe4848
2018-11-29 18:48:20 -08:00
1a9602d5db Delete caffe2_cuda_full_device_control (#14283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14283

According to Yangqing, this code was only used by us to do some end-to-end
performance experiments on the impact of cudaSetDevice and cudaGetDevice.  Now
that the frameworks are merged, there are a lot of bare calls to those functions
which are not covered by this flag.  It doesn't seem like a priority to restore
this functionality, so I am going to delete it for now.  If you want to bring
it back, you'll have to make all get/set calls go through this particular
interface.

Reviewed By: dzhulgakov

Differential Revision: D13156472

fbshipit-source-id: 4c6d2cc89ab5ae13f7c816f43729b577e1bd985c
2018-11-29 18:33:22 -08:00
8617b780cf Replace use of 'int' with more descriptive 'DeviceIndex' or 'StreamId'. (#14282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14282

This is also a substantive change, as 'DeviceIndex' and 'StreamId' are
narrower types than 'int'.

Reviewed By: Yangqing, smessmer

Differential Revision: D13156471

fbshipit-source-id: 08aa0f70c4142415b6bd4d17c57da0641c1d0e9a
2018-11-29 18:33:21 -08:00
fd31eae9ad Switch import/export to python printing (#14400)
Summary:
Stacked on https://github.com/pytorch/pytorch/pull/14378, only look at the last commit.

This changes the way methods are defined in TorchScript archives to use
PythonPrint rather than ONNX protobufs.

It also updates torch.proto to directly document the tensor data
structure actually being serialized.

Notes:
* because PythonPrint prints all the methods at once per module, this
  removes MethodDef in favor of a single torchscript_area and a separate
  caffe2_graphs entry. Note that NetDefs already have method names,
  so there is no need for a separate method name entry.
* This switches cpp/pickle area to RecordRef (references to a file in
  the container format) since it is possible the data in these arenas
  may be large and not suited to JSON output.
* Removes 'annotations' -- annotations should be re-added on the first
  commit that actually has a practical use for them. In the current state
  it is unlikely they are representing the right information.
* Some expect files have changed because PythonPrint is preserving more
  debug name information for parameter names.
* MethodEncoder (the ONNX output format) has been deleted. There is still
  some cleanup possible combining EncoderBase and GraphEncode now that there
  is only a single pathway using EncoderBase.
* This incorporates the changes from #14397
  to define TensorDef
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14400

Reviewed By: suo

Differential Revision: D13231800

Pulled By: zdevito

fbshipit-source-id: af5c1152d0bd6bca8b06c4703f59b161bb19f571
2018-11-29 17:53:49 -08:00
2b7345bcd5 PT1 distributed doc update (#14530)
Summary:
Removed an incorrect section. We don't support this. I wrote this from my memory :(
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14530

Differential Revision: D13253471

Pulled By: teng-li

fbshipit-source-id: c3f1ffc6c98ef8789157e885776e0b775ec47b15
2018-11-29 17:50:47 -08:00
75eccffdfe Add InstanceNorm, Distance modules to Script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14551

Differential Revision: D13268293

Pulled By: driazati

fbshipit-source-id: cb33c6dcdaddf8c7a49b3535894d77bf5d771ddd
2018-11-29 17:26:29 -08:00
15e8bb379e Add List to annotations (#14482)
Summary:
This PR adds a polyfill for `typing.List` for Python versions that don't
support `typing` as a builtin. It also moves the type definitions from
`annotations.py` so that they can be used in `torch.nn`.
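
A minimal sketch of what such a polyfill can look like (not the actual implementation; Python 3 syntax assumed):

```python
try:
    from typing import List
except ImportError:
    # Stand-in so annotations like List[int] still parse and evaluate.
    class _ListMeta(type):
        def __getitem__(cls, item):
            return cls  # element type is ignored; used only for annotations
    class List(metaclass=_ListMeta):
        pass
```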
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14482

Differential Revision: D13237570

Pulled By: driazati

fbshipit-source-id: 6575b7025c2d98198aee3b170f9c4323ad5314bd
2018-11-29 17:23:29 -08:00
2752ad8045 Automatic update of fbcode/onnx to f461f7aad9987635b4aff108620ed7918f002d19 (#14568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14568

Previous import was 882c5283c54345d131e8fe5c859e4844dcf7ca8e

Included changes:
- **[f461f7a](https://github.com/onnx/onnx/commit/f461f7a)**: Show the op's type and name when the shape inference is failed. (#1623) <Jerry>
- **[ab8aaf9](https://github.com/onnx/onnx/commit/ab8aaf9)**: Add scan test case (#1586) <G. Ramalingam>
- **[c95357e](https://github.com/onnx/onnx/commit/c95357e)**: link the tutorial (#1650) <Lu Fang>
- **[d7e2420](https://github.com/onnx/onnx/commit/d7e2420)**: Upgrade label encoder to support more input types (#1596) <Wei-Sheng Chin>
- **[6425108](https://github.com/onnx/onnx/commit/6425108)**: Add Doc about Adding New Operator into ONNX (#1647) <Lu Fang>
- **[295889c](https://github.com/onnx/onnx/commit/295889c)**: use an empty initializer to create map (#1643) <Lu Fang>
- **[e38f3ec](https://github.com/onnx/onnx/commit/e38f3ec)**: Remove redundant const (#1639) <daquexian>
- **[ea694bf](https://github.com/onnx/onnx/commit/ea694bf)**: implement fuse reduce->unsqueeze + fix assumption in nop_dropout pass (#1565) <Armen>
- **[6db386e](https://github.com/onnx/onnx/commit/6db386e)**: make output shape clear enough for Softmax family (#1634) <Lu Fang>
- **[2b67c6e](https://github.com/onnx/onnx/commit/2b67c6e)**: fix batchnorm doc (#1633) <Lu Fang>
- **[c901784](https://github.com/onnx/onnx/commit/c901784)**: remove inappropriate consts (#1632) <Lu Fang>
- **[de82119](https://github.com/onnx/onnx/commit/de82119)**: Shape inference fix for broadcast, concat and scan (#1594) <KeDengMS>
- **[d7ffe3b](https://github.com/onnx/onnx/commit/d7ffe3b)**: Update Optimizer Docs (#1607) <Armen>
- **[d09d139](https://github.com/onnx/onnx/commit/d09d139)**: mark PROTOBUF_INCLUDE_DIRS as BUILD_INTERFACE (#1466) <Yuta Okamoto>
- **[eb4b7c2](https://github.com/onnx/onnx/commit/eb4b7c2)**: allow variadic parameters of different types (#1615) <G. Ramalingam>
- **[4166246](https://github.com/onnx/onnx/commit/4166246)**: Fix onnxifi test (#1617) <Yinghai Lu>
- **[6706a4d](https://github.com/onnx/onnx/commit/6706a4d)**: Fix a bug in vector address access (#1598) <Raymond Yang>
- **[ae39866](https://github.com/onnx/onnx/commit/ae39866)**: Separate types of inputs 1 and 2 in OneHot op. (#1610) <Spandan Tiwari>
- **[45ba661](https://github.com/onnx/onnx/commit/45ba661)**: Handle new types in the switch. (#1608) <Dmitri Smirnov>
- **[14853b6](https://github.com/onnx/onnx/commit/14853b6)**: Bump docker image version to 230 used in CircleCI (#1606) <bddppq>
- **[e0993b8](https://github.com/onnx/onnx/commit/e0993b8)**: [onnxifi] Make sure that backend handles run async. (#1599) <Roman Dzhabarov>
- **[e6965cc](https://github.com/onnx/onnx/commit/e6965cc)**: Introduce SparseTensor ML proto (#1554) <Dmitri Smirnov>
- **[75b782f](https://github.com/onnx/onnx/commit/75b782f)**: In driver test check the return status of onnxGetBackendIDs (#1597) <bddppq>
- **[c05b364](https://github.com/onnx/onnx/commit/c05b364)**: Make CI log less verbose (#1595) <bddppq>
- **[fa568e4](https://github.com/onnx/onnx/commit/fa568e4)**: Loop type shape inferencing (#1591) <Scott McKay>
- **[937e64c](https://github.com/onnx/onnx/commit/937e64c)**: add uint8 (#1590) <Lu Fang>
- **[f86e951](https://github.com/onnx/onnx/commit/f86e951)**: Add domain as an optional parameter for make_node function (#1588) <Young Kim>
- **[ff45588](https://github.com/onnx/onnx/commit/ff45588)**: Remove unreachable code in shape_inference.h (#1585) <Changming Sun>
- **[f7dcad0](https://github.com/onnx/onnx/commit/f7dcad0)**: Add several hyperbolic function ops. (#1499) <Sergii Dymchenko>
- **[a60ac7d](https://github.com/onnx/onnx/commit/a60ac7d)**: Add OneHot op to ONNX. (#1567) <Spandan Tiwari>
- **[f6c3a7e](https://github.com/onnx/onnx/commit/f6c3a7e)**: [compiler flag] Issue a warning if class has virtual method but missing virtual dtor. (#1583) <Roman Dzhabarov>
- **[88d1784](https://github.com/onnx/onnx/commit/88d1784)**: Fix MaxUnpool shape inference when output_shape is provided as input (#1578) <Spandan Tiwari>
- **[20041b7](https://github.com/onnx/onnx/commit/20041b7)**: Add type shape inferencing for the If operator (#1571) <Scott McKay>
- **[d6c4c75](https://github.com/onnx/onnx/commit/d6c4c75)**: Add a virtual destructor to GraphInferencer (#1574) <Changming Sun>
- **[a339598](https://github.com/onnx/onnx/commit/a339598)**: fix ConvTranspose spec (#1566) <Wenhao Hu>

Reviewed By: zrphercule

Differential Revision: D13263831

fbshipit-source-id: a2ff22c6454e2430429e5a7d18d21661a7ffb0cb
2018-11-29 16:31:56 -08:00
dc7498c84d add gloo support for reduce on GPU (#14443)
Summary:
as titled
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14443

Reviewed By: pietern

Differential Revision: D13222907

Pulled By: janewangfb

fbshipit-source-id: f418c5d84880196f97089114d02957cf739243f8
2018-11-29 16:19:39 -08:00
69d3c00ae1 Expunge use of type() from SparseTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14546

Reviewed By: gchanan

Differential Revision: D13258512

fbshipit-source-id: b2d562b6c5228288f60f02beab3c44c50163248f
2018-11-29 16:04:18 -08:00
c7f828809b Expunge occurrences of type() from scalar_test (#14545)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14545

Self-explanatory.

Reviewed By: gchanan

Differential Revision: D13258513

fbshipit-source-id: abce357de57b95cde58b3894c251da519ede6b53
2018-11-29 16:04:16 -08:00
9aea856115 Expunge use of type() in Distributions.cpp (#14544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14544

Modern usage is options().  This doesn't have a functional
difference, because all call sites were CPU only (where
getting the device index right doesn't matter.)

Reviewed By: gchanan

Differential Revision: D13258252

fbshipit-source-id: c70f8d618ee9caf37ff2469cceaa439348b6114c
2018-11-29 16:04:14 -08:00
7879c979b5 Expunge uses of type() from EmbeddingBag. (#14543)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14543

The modern way to do this is to use options().  It doesn't
make a functional difference here because everything is CPU
(so loss of device information is not a big deal), but
it's definitely safer this way.

Reviewed By: gchanan

Differential Revision: D13257847

fbshipit-source-id: afbc9f7f8d4ca5a8b1cf198997c307e27a2c3333
2018-11-29 16:04:12 -08:00
6fe1867c23 Expunge direct device index handling from tensor_conversion_dispatch (#14421)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14421

Last time I looked at this, I bailed because it seemed like there were
a lot of sites to fix.  Well, I need this to work properly for out-of-place
HIPify, so I took another whack at it.  Changes should be pretty self-explanatory.

Reviewed By: gchanan

Differential Revision: D13221302

fbshipit-source-id: ed21e2668a1a629898a47358baf368fe680263a0
2018-11-29 16:04:10 -08:00
5805ef5a83 call raw_mutable_data when data type didn't match in BlobGetMutableTensor (#14513)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14513

att

Reviewed By: dzhulgakov

Differential Revision: D13245875

fbshipit-source-id: 3398a1f41a6195e120ed574dee887070e86dfe1f
2018-11-29 15:18:58 -08:00
666d383a00 Add broadcast list default arg support (#14361)
Summary:
To convert `max_unpool` functions to weak script, this PR adds support
for `T` as default arguments for `BroadcastingListN[T]`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14361

Differential Revision: D13192231

Pulled By: driazati

fbshipit-source-id: a25b75a0e88ba3dfa22d6a83775e9778d735e249
2018-11-29 15:15:47 -08:00
a2d8e84594 Added launch bounds in VolumetricConvolution.cu (#14564)
Summary:
A few months ago we were seeing test failures on certain architectures due to invalid launch configurations of the kernels in aten/src/THCUNN/VolumetricConvolution.cu.

This PR ensures that those kernels are always compiled such that at least one block can be resident on an SM, and such errors will not be encountered at runtime on any architecture after compiling for that architecture.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14564

Differential Revision: D13266136

Pulled By: soumith

fbshipit-source-id: 35464b20848bb0a1168e8f3b233172331c50b35b
2018-11-29 14:49:29 -08:00
0d663cec30 Unify cuda and hip device types in Caffe2 python front end (#14221)
Summary:
The goal of this PR is to unify the CUDA and HIP device types in the Caffe2 Python front end.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14221

Differential Revision: D13148564

Pulled By: bddppq

fbshipit-source-id: ef9bd2c7d238200165f217097ac5727e686d887b
2018-11-29 14:00:16 -08:00
bdaa0e38b8 Fix tautological-compare in aten/src/ATen/native/cuda/SummaryOps.cu (#14540)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14540

refactor the HANDLE_SWITCH_CASE to avoid tautological-compare in macro

Reviewed By: ezyang

Differential Revision: D13255725

fbshipit-source-id: cfa64bb7bc53d19c93a693015202f207567690b4
2018-11-29 13:57:27 -08:00
eeb0d67b92 Update to export in onnx_aten_fallback option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14492

Differential Revision: D13265701

Pulled By: zrphercule

fbshipit-source-id: b339c92078f73d152a14db7d5d2b3f5edda9dda6
2018-11-29 13:49:50 -08:00
2901777a0e Add back the MAX_JOBS=4 restriction to make rocm CI more stable (#14566)
Summary:
As a workaround until hcc fixes its high memory usage
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14566

Differential Revision: D13263555

Pulled By: bddppq

fbshipit-source-id: 479c7a76aff3919f028e03ef345795537480f0fa
2018-11-29 13:24:56 -08:00
1b0b2e69f8 assorted alias analysis fixes (#14556)
Summary:
- Correctly report whether nodes write to an alias set.
- Fix loop convergence.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14556

Differential Revision: D13261376

Pulled By: suo

fbshipit-source-id: 8123c0fb1f8f137a15bd82719be2d99e502bccc2
2018-11-29 13:09:26 -08:00
31b3d81714 Broadcast prim::FusedConcat inputs independently when checking kernels (#14503)
Summary:
Fixes #14483.

cc zou3519 mruberry
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14503

Differential Revision: D13256343

Pulled By: zou3519

fbshipit-source-id: 1c68a23f425be067a742bada7ee8cdfab7fc3fa2
2018-11-29 13:05:00 -08:00
cf059028f0 Do not load ROCm cmake files if USE_ROCM is off (#14261)
Summary:
Previously the build unconditionally tried to load the ROCm cmake files, so there was no way to disable the ROCm build. After this change, USE_ROCM=0 will disable the ROCm build.
Should fix #14025

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14261

Differential Revision: D13242090

Pulled By: bddppq

fbshipit-source-id: 652ec7d49dce9b357778bfa53a8e04b7079787ab
2018-11-29 11:17:19 -08:00
fb6806f6e9 Remove at references in c10 Allocator.h (#14434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14434

The referenced classes live now in c10, so we don't need to specify their namespace.

Reviewed By: ezyang

Differential Revision: D13224015

fbshipit-source-id: 6d154b8e3f9a1e38ff0407dbb1151f5c1d5df260
2018-11-29 11:07:22 -08:00
4ec6bd7356 Add sourceRank() to ProcessGroup::Work (#14453)
Summary:
This function is only implemented for the subclasses where it makes
sense. If it's not overridden it will throw an error. Having this
function removes the need for a pointer passing hack to pass the
source rank of a recv operation back to the caller. Instead, the
caller can now call `source_rank` on the work object and achieve
the same result.
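
At the Python level this surfaces as the return value of `recv`; a minimal sketch (assuming a process group is already initialized):

```python
import torch
import torch.distributed as dist

# on a receiving rank; src=None means "receive from any rank"
buf = torch.zeros(4)
sender = dist.recv(buf, src=None)  # returns the rank that sent the message
print("received tensor from rank", sender)
```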

Closes #11804.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14453

Differential Revision: D13230898

Pulled By: pietern

fbshipit-source-id: ef38f48bfaca8ef9a364e5be122951bafc9f8e49
2018-11-29 09:16:53 -08:00
7c24a16f82 Fixed typo for BCEWithLogitLoss doc comments (#14532)
Summary:
The math symbol was missing a prefix `:`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14532

Differential Revision: D13256077

Pulled By: soumith

fbshipit-source-id: 2359819d8aa664f915be1c436cbb0c0756504028
2018-11-29 08:22:19 -08:00
29d697aec4 typo in Module docstring
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14511

Differential Revision: D13246061

Pulled By: soumith

fbshipit-source-id: 6c13a2957c4c4324ab5d839d634689c61e25b0fe
2018-11-29 07:17:29 -08:00
44cb43bcc1 Jaliyae/samplers (#13870)
Summary:
Make samplers optionally accept a new size in their reset() method. This lets a dataloader or dataset reset the sampler for an epoch or for a chunk of data with a different size.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13870

Differential Revision: D13240120

Pulled By: soumith

fbshipit-source-id: 19c53f8be13c0fdcf504f0637b0d3e6009a8e599
2018-11-29 07:07:19 -08:00
9e93a02624 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13252887

Pulled By: driazati

fbshipit-source-id: e9638cf74089884a32b8f0f38396cf432c02c988
2018-11-28 23:31:25 -08:00
ba25b37e9b Updating submodules
Reviewed By: yns88

fbshipit-source-id: f957056bb48c583738c5defaf3d1f01cd7df3915
2018-11-28 23:31:23 -08:00
70e3736e20 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 9800251baaa09d9f7988eff340ef36e0ab11f579
2018-11-28 21:09:08 -08:00
db15f2e13f Fix version.groups() (#14505)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14502

fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14505

Differential Revision: D13242386

Pulled By: goldsborough

fbshipit-source-id: faebae8795e1efd9c0ebc2294fe9648193d16624
2018-11-28 20:27:33 -08:00
6d63e9dbff Support Embedding + EmbeddingBag in Script + (Ignore flakey test) (#14509)
Summary:
Resubmitting PR #14415

The tests added for Embedding + EmbeddingBag used random numbers as input, which perturbed the random number generator state and caused the flaky test to break.

Everything but the last two commits have already been accepted
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14509

Differential Revision: D13247917

Pulled By: eellison

fbshipit-source-id: ea6963c47f666c07687787e2fa82020cddc6aa15
2018-11-28 19:16:38 -08:00
105fa58748 pointwise_loss (#14134)
Summary:
Adding pointwise loss ops to weak_script
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14134

Differential Revision: D13209455

Pulled By: eellison

fbshipit-source-id: 87fc0222121f34a2f4edb24c2da2a11124b097d8
2018-11-28 18:14:38 -08:00
186341c5dc Merge Caffe2 and PyTorch thread pool definitions (#14114)
Summary:
(1) Move Caffe2 thread pool to aten
(2) Use the same thread pool definition for PyTorch interpreter
(3) Make ivalue::Future thread-safe
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14114

Reviewed By: ilia-cher

Differential Revision: D13110451

Pulled By: highker

fbshipit-source-id: a83acb6a4bafb7f674e3fe3d58f7a74c68064fac
2018-11-28 18:10:20 -08:00
533668d7e4 Ensure that indices are on the same device as self
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14504

Reviewed By: wat3rBro

Differential Revision: D13242200

Pulled By: colesbury

fbshipit-source-id: 82731cee808681ec612d406342070640eb26e519
2018-11-28 17:54:32 -08:00
da9e49e586 Remove Context dependency from Tensor class (#14269)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14269

Removes reference to Context proper and instead adds a bool argument for async copy (the same as `copy_`)

For CopyFrom, I haven't tweaked all call sites yet. Instead I rely on a terrible hack: a pointer to the context is implicitly converted to bool when passed, haha :) It's not good code, and I propose to fix it in a follow-up diff (maybe using clangr tooling).

Reviewed By: ezyang

Differential Revision: D13117981

fbshipit-source-id: 7cb1dc2ba6a4c50ac26614f45ab8318ea96e3138
2018-11-28 15:45:38 -08:00
0cfbbceac3 Change Tensor::CopyFrom to a simple double dispatch (#14268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14268

Removes the need for Context in Tensor by doing simple dispatch for CopyBytes. It'd eventually be subsumed by Roy Li's changes of a proper copy_ op, but before that is done, let's get a clear logic of how copies are implemented and clean up some cruft in the CopyFrom implementation.

Note that with these changes, one can probably get rid of Context::CopyFromCPU/CopyToCPU, but that's a matter for follow-up diffs.

This diff doesn't change the API of Tensor yet, but relies on the fact that passing `Context` to CopyFrom makes copy async if the device is CUDA and doesn't have any effect otherwise (that's how Context methods are implemented).

This doesn't change semantics of copy async implementation - as before it blindly calls cudaMemcpyAsync which probably means that it can be misused if invoked separately outside of operator body. I'll leave it for the follow up copy_ unification.

For Extend() we always do an async copy - it makes sense as it's an in-place device-to-device operation, and the result is only observed by a subsequent op.

Note: there are now three ways of invoking copy in C2 code - templated CopyBytes, virtual CopyFromCPU/etc, and double-dispatch free method here. Hopefully we can get rid of the second one.

Also, please advise whether it's c10-worthy :)

Reviewed By: ezyang

Differential Revision: D13117987

fbshipit-source-id: a6772d6dcf3effaf06717da3a656fc9873b310b5
2018-11-28 15:45:37 -08:00
f80d34a1c8 Update Tensor doc (#14339)
Summary:
Add to the Tensor doc info about `.device`, `.is_cuda`, `.requires_grad`, `.is_leaf` and `.grad`.
Update the `register_backward_hook` doc with a warning stating that it does not work in all cases.
Add support in the `_add_docstr` function to add docstring to attributes.

There is an explicit cast here, but I am not sure how to handle it properly. The issue is that the doc field for getsetdescr is documented as a const char * (like all other doc fields in descriptor objects) in the CPython online documentation, but in the code it is the only one that is not const.
I assumed this is a bug in the code, because it does not follow the doc or the convention of the other descriptors, so I cast away the const.
EDIT: the online doc I was looking at is for 3.7, and in that version both the code and the doc are const. For older versions, both are non-const.
Please let me know if this should not be done, and if it should be done, whether there is a cleaner way to do it!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14339

Differential Revision: D13243266

Pulled By: ezyang

fbshipit-source-id: 75b7838f7cd6c8dc72b0c61950e7a971baefaeeb
2018-11-28 15:28:17 -08:00
fb7e40b7eb nccl fixes (#14195)
Summary:
This has 4 changes

1) propagate USE_SYSTEM_NCCL. Previously it was ignored and cmake always did a FindPackage
2) respect SCCACHE_DISABLE in our caffe2 sccache wrapper for circleci
3) use SCCACHE_DISABLE when building nccl, because it triggers the same bug as when using CCACHE (already tracked in https://github.com/pytorch/pytorch/issues/13362). This was hidden because we weren't respecting USE_SYSTEM_NCCL, and were never building nccl ourselves in CI
4) In one particular CI configuration (caffe2, cuda 8, cudnn 7), force USE_SYSTEM_NCCL=1. Building the bundled nccl triggers a bug in nvlink. I've done some investigation, but this looks like a tricky, preexisting bug, so rather than hold up this diff I'm tracking it separately in https://github.com/pytorch/pytorch/issues/14486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14195

Differential Revision: D13237502

Pulled By: anderspapitto

fbshipit-source-id: 1100ac1269c7cd39e2e0b3ba12a56a3ce8977c55
2018-11-28 14:43:06 -08:00
ca55c5411f Clean up house on CUDAStream (#14247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14247

Just a bunch of clean up to get the code in a good state before we
enshrine it in c10.

Billing of changes:
- Inline all "pointer" API functions into their real implementations,
  so we don't have a bunch of dead pointer functions hanging around.
- Replace all occurrences of int64_t with DeviceIndex, as appropriate
- Rename device field to device_index
- Add documentation for everything in CUDAStream.h
- Bring CUDAStream to API parity with Stream (e.g., support equality)
- Delete uncheckedSetCurrentCUDAStream, it didn't work anyway because
  StreamId to internal pointer conversion has a bunch of ways it can
  fail.  Just hope for the best!

Reviewed By: dzhulgakov

Differential Revision: D13141949

fbshipit-source-id: a02f34921e3d8294bd77c262bd05da07d1740a71
2018-11-28 14:01:59 -08:00
3aeb288e40 Make clang-tidy shut up about Python C API macros.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14480

Reviewed By: goldsborough

Differential Revision: D13235001

fbshipit-source-id: cd7f00b12ed3d9ef0fb0d7bd6c428e21561ec1b6
2018-11-28 13:54:42 -08:00
e3711aa93f Make TensorImpl/StorageImpl safer (#14429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14429

- forbid copying
- make final what ought to be

Reviewed By: dzhulgakov

Differential Revision: D13223125

fbshipit-source-id: e6176cc916d4cd8370c835f243ca90d5c3124c4a
2018-11-28 13:41:49 -08:00
f6dfd9d545 Handle copying intrusive_ptr_target correctly (#14428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14428

See in-code comment

Reviewed By: ezyang

Differential Revision: D13223126

fbshipit-source-id: 1e87e6112bbcca6377ca04ef2ba25ef937931061
2018-11-28 13:41:48 -08:00
5f07b33857 Revert D13219647: [pytorch][PR] Support Embedding + EmbeddingBag in Script
Differential Revision:
D13219647

Original commit changeset: c90706aa6fbd

fbshipit-source-id: d189e717ba0773de43d633876bc3a688830a9303
2018-11-28 13:38:58 -08:00
aec4c19460 Remove StorageImpl::type() (#14139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14139

This seems to be neither used nor implemented. Also, it is a c10->aten dependency, which we don't want.

Reviewed By: ezyang

Differential Revision: D13112298

fbshipit-source-id: 0407c4c3ac9b02bbd6fca478336cb6a6ae334930
2018-11-28 13:32:38 -08:00
bcd7b03c2a Add XBlobGetMutableTensor that returns Tensor (#14424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14424

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14136

Since now Tensor is a shared_ptr, it doesn't make sense to have Tensor* around anymore,
so we want to change Tensor* to Tensor in the interface.
We added functions that work with `Tensor` instead of `Tensor*` in this diff.

To remove Tensor*, we'll do the following:
```
auto* Y = Ouptut(0);
Y->mutable_data...
```
-->
```
auto Y = Output(0);
Y.mutable_data...
```

But to run clangr codemod, we'll keep both APIs in different names, e.g. `Output` and `XOutput`, and do the refactor and then delete the old method and rename the new method into the old one.
For example for `Output`, we'll first codemod the callsites from `Output` to `XOutput`, then delete the old `Output` and rename `XOutput` to `Output` in the end.

Reviewed By: smessmer

Differential Revision: D12934074

fbshipit-source-id: d0e85f6ef8d13ed4e7a7505faa5db292a507d54c
2018-11-28 13:29:48 -08:00
0f62af4ab1 Add timeout kwarg to init_process_group (#14435)
Summary:
This applies to the gloo backend only. Timeout support for the NCCL and
MPI backends is tracked in issues #14371 and #14372 respectively.

When creating a new process group (either the global one or any subgroup
created through `new_group`) you can specify a timeout keyword
argument (of type datetime.timedelta). This timeout applies to all
collective operations executed against that process group, such that any
operation taking longer than the timeout will throw a runtime error.
Using a different, better catchable error type is tracked in #14433.

This fixes #14376.
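
A minimal sketch of the new keyword argument (gloo backend only at this point):

```python
from datetime import timedelta

import torch.distributed as dist

dist.init_process_group(backend="gloo", init_method="env://",
                        timeout=timedelta(seconds=60))
# collectives on a subgroup can get their own, tighter timeout
group = dist.new_group(ranks=[0, 1], timeout=timedelta(seconds=10))
```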
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14435

Differential Revision: D13234317

Pulled By: pietern

fbshipit-source-id: 973993b67994dc64861c0977cbb6f051ec9d87f6
2018-11-28 11:35:01 -08:00
7c4aef9dfc Add support for HIP to DispatchStub. (#14413)
Summary:
I feel a bit bad writing this patch, because there isn't really
any reason not to use the normal dispatch mechanism for CUDA
and HIP here (so we have *yet another dispatcher*), but I don't
really want to sign up to rewrite DispatchStub to deduplicate the
dispatcher right now.

Need to natively add support for HIP here, as I don't want to
have to HIPify files which are not in a CUDA directory.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14413

Differential Revision: D13220358

Pulled By: ezyang

fbshipit-source-id: cc61218322589a1dc2ab8eb9d5ddd3c616f6b712
2018-11-28 11:07:45 -08:00
7749804099 Support Embedding + EmbeddingBag in Script (#14415)
Summary:
Add support for Embedding and EmbeddingBag in script. Both functions require `with torch.no_grad()`, which we don't have any plans to support in the near future. To work around this, I added an embedding_renorm function without derivatives.
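
A minimal sketch of what this enables (type-comment annotation style, as script functions were written at the time):

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def embed(indices, weight):
    # type: (Tensor, Tensor) -> Tensor
    return F.embedding(indices, weight)
```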
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14415

Reviewed By: wanchaol

Differential Revision: D13219647

Pulled By: eellison

fbshipit-source-id: c90706aa6fbd48686eb10f3efdb65844be7b8717
2018-11-28 10:52:30 -08:00
c32debb916 fix build error from D13188595 (#14481)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14481

Fix build error in mode/opt

Reviewed By: dskhudia

Differential Revision: D13234688

fbshipit-source-id: 6c8515c45f75e7b88713a303f22990ad85d68beb
2018-11-28 10:46:33 -08:00
a02b3374d4 Revert D13144472: [fix] condition blob in while_op test changes data type
Differential Revision:
D13144472

Original commit changeset: af4d920a3148

fbshipit-source-id: 74d9f69fc66964b5e68b4b2cd2fd2be1f63e9d69
2018-11-28 10:43:22 -08:00
6039e25e8d Fix the build issue in setup.py due to cmake version type x.x.x.x vio… (#14331)
Summary:
See https://github.com/pytorch/pytorch/issues/13226
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14331

Differential Revision: D13234639

Pulled By: orionr

fbshipit-source-id: 87880057e84242e4af5ad6bf87e08831aa2c5459
2018-11-28 10:38:27 -08:00
8901935ad4 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#14473)
Summary:
Original PR: https://github.com/pytorch/pytorch/pull/11563
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14473

Differential Revision: D13234208

Pulled By: ezyang

fbshipit-source-id: 7d874c63659e93728af239ecdfb85547613e52ad
2018-11-28 09:28:26 -08:00
302caef154 Revert D13166626: [pytorch][PR] ignore generated caffe2 docs and virtualenvs
Differential Revision:
D13166626

Original commit changeset: 4f11228d8b5d

fbshipit-source-id: ff301f1791ca8a390767ae43cde8637dcd044d0c
2018-11-28 07:40:04 -08:00
c638f379b3 Make mean function work across multiple dimensions. (#14252)
Summary:
Multi-dimensional `sum` is already implemented, and it's trivial to implement `mean` in terms of `sum`, so just do it.

Bonus: Fix incomplete language in the `torch.sum` documentation which doesn't take into account multiple dimensions when describing `unsqueeze` (at the same time as introducing similar language in `torch.mean`).
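
A quick sketch of the resulting behavior:

```python
import torch

x = torch.randn(2, 3, 4)
m = x.mean(dim=(0, 2))  # reduce over two dimensions at once
s = x.sum(dim=(0, 2)) / (x.size(0) * x.size(2))
assert torch.allclose(m, s)
```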
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14252

Differential Revision: D13161157

Pulled By: umanwizard

fbshipit-source-id: c45da692ba83c0ec80815200c5543302128da75c
2018-11-28 06:53:09 -08:00
68251fb931 Fix half tensor printing plus speedup large tensor printing (#14418)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/14344 and https://github.com/pytorch/pytorch/issues/6863

The slowdown was due to the fact that we were only summarizing the tensor (for computing the number of digits to print) if its first dimension was larger than the threshold. It now goes over all the dimensions.

Some quick runtime analysis:

Before this PR:
```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
13.6 s ± 84.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
```

After this PR

```python
In [1]: import torch; a = torch.rand(1, 1700, 34, 50)

In [2]: %timeit str(a)
2.08 ms ± 395 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

In [3]: b = a.cuda()

In [4]: %timeit str(b)
8.39 ms ± 45.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14418

Reviewed By: weiyangfb

Differential Revision: D13226950

Pulled By: soumith

fbshipit-source-id: 19eb4b855db4c8f891d0925a9c56ae8a2824bb23
2018-11-28 06:13:06 -08:00
be7c618fd7 torch.sparse.sum() (#12430)
Summary:
- to fix #12241
- add `_sparse_sum()` to ATen, and expose it as `torch.sparse.sum()`; `SparseTensor.sum()` is not currently supported (see the usage sketch at the end of this summary)
- this PR depends on #11253, and will need to be updated upon it lands
- [x] implement forward
- [x] implement backward
- performance [benchmark script](https://gist.github.com/weiyangfb/f4c55c88b6092ef8f7e348f6b9ad8946#file-sparse_sum_benchmark-py):
  - sum all dims is fastest for sparse tensor
  - when the input is sparse enough (nnz = 0.1%), sum of a sparse tensor is faster than dense on CPU, but not necessarily on CUDA
  - CUDA backward is comparable (<2x) between `sum several dims` vs `sum all dims` in sparse
  - CPU backward, which uses binary search, is still slow in sparse; it takes `5x` the time in `sum [0, 2, 3] dims` vs `sum all dims`
    - optimize CUDA backward for now
      - using thrust for sort and binary search, but runtime not improved
  - both CPU and CUDA forward are slow in sparse (`sum several dims` vs `sum all dims`), at most `20x` slower on CPU and `10x` on CUDA
    - improve CPU and CUDA forward kernels

(nnz, sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 8.77 µs vs 72.9 µs | 42.5 µs vs 108 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 112 µs vs 4.47 ms | 484 µs vs 407 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 141 µs vs 148 µs | 647 µs vs 231 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 235 µs vs 1.23 ms | 781 µs vs 213 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 48.5 µs vs 360 µs | 160 µs vs 2.03 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 258 µs vs 1.22 ms | 798 µs vs 224 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 204 µs vs 882 µs | 443 µs vs 133 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 709 µs vs 1.15 ms | 893 µs vs 202 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 39.8 µs vs 81 µs | 42.4 µs vs 113 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 747 µs vs 4.7 ms | 2.4 ms vs 414 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 1.04 ms vs 126 µs | 5.03 ms vs 231 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.12 ms vs 1.24 ms | 5.99 ms vs 213 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 133 µs vs 366 µs | 463 µs vs 2.03 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.56 ms vs 1.22 ms | 6.11 ms vs 229 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.53 ms vs 799 µs | 824 µs vs 134 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 5.15 ms vs 1.09 ms | 7.02 ms vs 205 µs

- after improving CPU and CUDA forward kernels
  - in `(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD)` forward, CPU takes ~~`171 µs`~~, of which `130 µs` is spent on `coalesce()`; for CUDA, total time is ~~`331 µs`~~, of which `141 µs` is spent on `coalesce()`. We need to reduce time spent at places outside `coalesce()`.
  - after a few simple tweaks, the forward is now at most `10x` slower on CPU and `7x` on CUDA, and the time taken by `sum dense dims only [2, 3]` is `~2x` that of `sum all dims`. The speed of `sum all sparse dims [0, 1]` is on par with `sum all dims`

(nnz,   sizes, sum_dims, keepdim, sum all or dims, bk=backward) | CPU (sparse vs dense) | CUDA(sparse vs dense)
-- | -- | --
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 7 µs vs 69.5 µs | 31.5 µs vs 61.6 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 11.3 µs vs 4.72 ms | 35.2 µs vs 285 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 197 µs vs 124 µs | 857 µs vs 134 µs
(1000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 124 µs vs 833 µs | 796 µs vs 106 µs
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 20.5 µs vs 213 µs | 39.4 µs vs 1.24 ms
(1000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 131 µs vs 830 µs | 881 µs vs 132 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 95.8 µs vs 409 µs | 246 µs vs 87.2 µs
(1000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 624 µs vs 820 µs | 953 µs vs 124 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll) | 45.3 µs vs 72.9 µs | 33.9 µs vs 57.2 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD) | 81.4 µs vs 4.49 ms | 39.7 µs vs 280 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumAll, bk) | 984 µs vs 111 µs | 6.41 ms vs 121 µs
(10000,   [1000, 1000, 2, 2], [0, 1], False, sumD, bk) | 1.45 ms vs 828 µs | 6.77 ms vs 113 µs
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD) | 74.9 µs vs 209 µs | 37.7 µs vs 1.23 ms
(10000,   [1000, 1000, 2, 2], [2, 3], False, sumD, bk) | 1.48 ms vs 845 µs | 6.96 ms vs 132 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD) | 1.14 ms vs 411 µs | 252 µs vs 87.8 µs
(10000,   [1000, 1000, 2, 2], [0, 2, 3], False, sumD, bk) | 4.53 ms vs 851 µs | 7.12 ms vs 128 µs

- the time taken in CUDA backward of sparse is super long, with large variance (in the case of nnz=10000, it normally takes 6-7ms). To improve the backward of sparse ops, we will need to debug at places other than the CUDA kernels. Here is a benchmark of `torch.copy_()`:
```
>>> d = [1000, 1000, 2, 2]
>>> nnz = 10000
>>> I = torch.cat([torch.randint(0, d[0], size=(nnz,)),
               torch.randint(0, d[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, d[2], d[3])
>>> size = torch.Size(d)
>>> S = torch.sparse_coo_tensor(I, V, size).coalesce().cuda()
>>> S2 = torch.sparse_coo_tensor(I, V, size).coalesce().cuda().requires_grad_()
>>> data = S2.clone()
>>> S.copy_(S2)
>>> y = S * 2
>>> torch.cuda.synchronize()
>>> %timeit y.backward(data, retain_graph=True); torch.cuda.synchronize()
7.07 ms ± 3.06 ms per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
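A minimal usage sketch of the new API (the one referenced from the checklist above):

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])
v = torch.tensor([3., 4., 5.])
s = torch.sparse_coo_tensor(i, v, (2, 3))

torch.sparse.sum(s)          # sum over all dims -> 0-dim dense tensor
torch.sparse.sum(s, dim=1)   # sum over one sparse dim -> sparse tensor
```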
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12430

Differential Revision: D12878313

Pulled By: weiyangfb

fbshipit-source-id: e16dc7681ba41fdabf4838cf05e491ca9108c6fe
2018-11-28 02:19:12 -08:00
a2fcd4dee5 Ensure FP16 rowwise Adagrad can be run
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12317

Reviewed By: hyuen

Differential Revision: D10190778

fbshipit-source-id: 720a9aaa4e6b1736023d8c6326a613e4ea592b31
2018-11-28 02:15:36 -08:00
e8754ee017 use fbgemm's im2col fusion and thread partitioning (#14350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14350

acc32 for now. A separate diff will follow for acc16, but that will need another output processing step that does sparse convolution without im2col.

Reviewed By: dskhudia

Differential Revision: D13188595

fbshipit-source-id: e8faee46c7ea43e4a600aecb8b8e93e6c860a8c8
2018-11-28 01:13:11 -08:00
a38ed0268e PT1 Stable Release Distributed Documentation (#14444)
Summary:
The doc covers pretty much all we have had on distributed for PT1 stable release, tracked in https://github.com/pytorch/pytorch/issues/14080

Tested by previewing the sphinx generated webpages. All look good.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14444

Differential Revision: D13227675

Pulled By: teng-li

fbshipit-source-id: 752f00df096af38dd36e4a337ea2120ffea79f86
2018-11-28 00:34:11 -08:00
3d98810fbd Revert D13192230: [pytorch][PR] [jit] Use nn module tests in test_jit
Differential Revision:
D13192230

Original commit changeset: 36488960b6c9

fbshipit-source-id: 63b68bd909b9ef0548f52c986c84f549aecb8909
2018-11-28 00:23:09 -08:00
7d07fcd215 Fixed SyncParam/QueueReduction/SyncReduction test for 2+ GPUs (#14452)
Summary:
Fixed: https://github.com/pytorch/pytorch/issues/14445

Also bumped up timeout to 30 seconds, since on 8-GPU machines, DDP test will take more than 15 seconds sometimes.

Tested on 8 GPU machines:
```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py --verbose
test_dist_broadcast_coalesced_gloo (__main__.DistributedDataParallelTest) ... ok
test_dist_broadcast_coalesced_nccl (__main__.DistributedDataParallelTest) ... skipped 'Test skipped due to known issues'
test_fp16 (__main__.DistributedDataParallelTest) ... ok
test_gloo_backend (__main__.DistributedDataParallelTest) ... ok
test_nccl_backend (__main__.DistributedDataParallelTest) ... ok
test_queue_reduction (__main__.DistributedDataParallelTest) ... ok
test_sync_params_no_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_params_with_buffers (__main__.DistributedDataParallelTest) ... ok
test_sync_reduction (__main__.DistributedDataParallelTest) ... ok
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allgather_basics (__main__.ProcessGroupGlooTest) ... ok
test_allgather_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress (__main__.ProcessGroupGlooTest) ... ok
test_allreduce_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_basics_cuda (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_checks (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_stress_cuda (__main__.ProcessGroupGlooTest) ... ok
test_gather_basics (__main__.ProcessGroupGlooTest) ... ok
test_gather_checks (__main__.ProcessGroupGlooTest) ... ok
test_reduce_basics (__main__.ProcessGroupGlooTest) ... ok
test_reduce_checks (__main__.ProcessGroupGlooTest) ... ok
test_scatter_basics (__main__.ProcessGroupGlooTest) ... ok
test_scatter_checks (__main__.ProcessGroupGlooTest) ... ok
test_send_recv_all_to_all (__main__.ProcessGroupGlooTest) ... ok
test_timeout_kwarg (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_barrier (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousEnvTest) ... ok
test_nominal (__main__.RendezvousEnvTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_address_already_in_use (__main__.TCPStoreTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok

----------------------------------------------------------------------
Ran 46 tests in 162.980s

OK (skipped=1)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14452

Differential Revision: D13230652

Pulled By: teng-li

fbshipit-source-id: 88580fe55b3a4fbc7a499ca3b591958f11623bf8
2018-11-27 21:58:34 -08:00
4cdcbbf410 Use nn module tests in test_jit (#14238)
Summary:
This PR adds weak modules for all activation modules and uses `test_nn` module tests to test weak modules that have been annotated with `weak_module` and therefore are in `torch._jit_internal._weak_types`

Also depends on #14379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14238

Differential Revision: D13192230

Pulled By: driazati

fbshipit-source-id: 36488960b6c91448b38c0fa65422539a93af8c5e
2018-11-27 21:19:51 -08:00
a0def0b57e check for invalid ranges in torch.arange
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13915

Differential Revision: D13222110

Pulled By: nairbv

fbshipit-source-id: fcff1ad058fbf792d0fdf4aa75d77f22e3b7483b
2018-11-27 20:38:56 -08:00
b08a186153 roll along multiple dimensions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13874

Differential Revision: D13223669

Pulled By: nairbv

fbshipit-source-id: 1678d52529c326fa4a0614d0994b1820ad12bc04
2018-11-27 20:32:30 -08:00
662f66ebb9 Add poisson_nll_loss to script
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14420

Differential Revision: D13220726

Pulled By: driazati

fbshipit-source-id: 6c08a0050075beafcc8ba413c9603b273870c70c
2018-11-27 19:39:16 -08:00
d75f751bec Add boolean dispatch for function overloading (#14425)
Summary:
This PR allows overloading functions based on the value of a parameter (so long as it is a constant). See max_pool1d for example usage.

This is the first step in enabling the use of max_pool functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.
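
A rough sketch of the dispatch pattern (helper and parameter names here are illustrative assumptions, not the exact internal API):

```python
def boolean_dispatch(arg_name, arg_index, default, if_true, if_false):
    # pick an implementation at call time based on a constant bool argument
    def dispatch(*args, **kwargs):
        if arg_index < len(args):
            flag = args[arg_index]
        else:
            flag = kwargs.get(arg_name, default)
        fn = if_true if flag else if_false
        return fn(*args, **kwargs)
    return dispatch

def _max_pool1d_with_indices(x, return_indices=True):
    return x, x  # placeholder body

def _max_pool1d(x, return_indices=False):
    return x  # placeholder body

max_pool1d = boolean_dispatch("return_indices", 1, False,
                              _max_pool1d_with_indices, _max_pool1d)
```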

Fixes #14081
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14425

Differential Revision: D13222104

Pulled By: driazati

fbshipit-source-id: 8cb676b8b13ebcec3262234698edf4a7d7dcbbe1
2018-11-27 19:36:47 -08:00
23f901a737 fix enable_cpu_fuser
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14440

Differential Revision: D13226354

Pulled By: zdevito

fbshipit-source-id: e4ed023eece8b5b670a4a27d24a8688907b36b90
2018-11-27 19:14:10 -08:00
82175f31b4 Move Affine grid to C++ (#14392)
Summary:
Port AffineGrid to C++, because script does not support compiling Function classes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14392

Differential Revision: D13219698

Pulled By: eellison

fbshipit-source-id: 3ddad8a84c72010b5a6c6f7f9712be614202faa6
2018-11-27 18:38:11 -08:00
6f2307ba6a Allow building libraries with setuptools that dont have abi suffix (#14130)
Summary:
When using `setuptools` to build a Python extension, setuptools will automatically add an ABI suffix like `cpython-37m-x86_64-linux-gnu` to the shared library name when using Python 3. This is required for extensions meant to be imported as Python modules. When we use setuptools to build shared libraries not meant as Python modules, for example libraries that define and register TorchScript custom ops, having your library called `my_ops.cpython-37m-x86_64-linux-gnu.so` is a bit annoying compared to just `my_ops.so`, especially since you have to reference the library name when loading it with `torch.ops.load_library` in Python.

This PR fixes this by adding a `with_options` class method to the `torch.utils.cpp_extension.BuildExtension` which allows configuring the `BuildExtension`. In this case, the first option we add is `no_python_abi_suffix`, which we then use in `get_ext_filename` (override from `setuptools.build_ext`) to throw away the ABI suffix.
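
A minimal sketch of the resulting usage (module and file names are placeholders):

```python
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="my_ops",
    ext_modules=[CppExtension("my_ops", ["my_ops.cpp"])],
    cmdclass={"build_ext": BuildExtension.with_options(no_python_abi_suffix=True)},
)
# builds my_ops.so instead of my_ops.cpython-37m-x86_64-linux-gnu.so
```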

I've added a test `setup.py` in a `no_python_abi_suffix_test` folder.

Fixes https://github.com/pytorch/pytorch/issues/14188

t-vi fmassa soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14130

Differential Revision: D13216575

Pulled By: goldsborough

fbshipit-source-id: 67dc345c1278a1a4ee4ca907d848bc1fb4956cfa
2018-11-27 17:35:53 -08:00
23d111c87f Fix clang tidy errors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14427

Differential Revision: D13222381

Pulled By: wanchaol

fbshipit-source-id: d90d210a810e95bf0eb404f9c1c304f4e6a3f61e
2018-11-27 17:30:50 -08:00
226a01e5a1 Handling of pretty-printing methods (#14378)
Summary:
Stacked on #14176, review only the last commit.
* Print parameters to methods as self.weight rather than as extra inputs.
* Print entire set of methods out as a single string
* Update test code to test the module-at-a-time export/import
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14378

Differential Revision: D13198463

Pulled By: zdevito

fbshipit-source-id: 3fab02e8239cfd6f40d6ab6399047bd02cf0a8c8
2018-11-27 17:10:23 -08:00
75bac5ab32 Eliminate necessity of HIPify on AccumulateType.h (#14412)
Summary:
I'd like to NOT HIPify files that are not in a cuda/
directory, so hand-HIPify AccumulateType.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14412

Differential Revision: D13221801

Pulled By: ezyang

fbshipit-source-id: d1927cfc956e50a6a5e67168ac0e1ce56ecd1e0b
2018-11-27 16:39:55 -08:00
1620161d6b when BUILD_CAFFE2_OPS is OFF, torch-python needs a direct dep on nccl (#14430)
Summary:
https://github.com/pytorch/pytorch/issues/14431 tracks supporting this with CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14430

Differential Revision: D13224079

Pulled By: anderspapitto

fbshipit-source-id: 47d7900d25910ed61585b93f9003acd1b2630a9f
2018-11-27 15:53:31 -08:00
006505bb8f Speed-up "advanced" indexing operations (#13420)
Summary:
This speeds-up "advanced" indexing (indexing a tensor by a tensor)
on CPU and GPU. There's still a bunch of work to do, including
speeding up indexing by a byte (boolean) mask and speeding up the derivative
calculation for advanced indexing.

Here's some speed comparisons to indexing on master using a little [benchmark script](https://gist.github.com/colesbury/c369db72aad594e5e032c8fda557d909) with 16 OpenMP threads and on a P100. The test cases are listed as (input shape -> output shape).

| Test case             | CPU (old vs. new)   | CUDA (old vs. new)     |
|-----------------------|---------------------|------------------------|
| 1024x1024 -> 512x1024 | 225 us vs. **57 us**  | 297 us vs. **47 us** |
| 1024x1024 -> 1024x512 | 208 us vs. **153 us** | 335 us vs. **54 us** |
| 50x50 -> 20000x50     | 617 us vs. **77 us**  | 239 us vs. **54 us** |
| 50x50 -> 50x20000     | 575 us vs. **236 us** | 262 us vs. **58 us** |
| 2x5x10 -> 10          | 65 us  vs. **18 us**  | 612 us vs. **93 us** |
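
For reference, the first row of the table corresponds to an access pattern like this sketch:

```python
import torch

x = torch.randn(1024, 1024)
idx = torch.randint(0, 1024, (512,))
y = x[idx]  # index a tensor by a tensor: 1024x1024 -> 512x1024
```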

See #11647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13420

Reviewed By: soumith

Differential Revision: D13088936

Pulled By: colesbury

fbshipit-source-id: 0a5c2ee9aa54e15f96d06692d1694c3b24b924e2
2018-11-27 15:23:59 -08:00
0199d59d3a Resubmit: Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13768

Reviewed By: xianjiec

Differential Revision: D12996103

fbshipit-source-id: 5ca4cda4210f68ece2b5d6eced8cf52ee91fb36f
2018-11-27 14:51:56 -08:00
ae1b37650c Windows local build: restore original working dir after activating VC environment (#14416)
Summary:
`call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64` seems to change the working dir to `C:\Users\Administrator\source`, and we need to cd back to the PyTorch directory before running `git submodule update --init --recursive`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14416

Differential Revision: D13222269

Pulled By: yf225

fbshipit-source-id: a0eb3311fb11713b1bb8f52cd13e2c21d5ca9c7b
2018-11-27 14:18:45 -08:00
5c84145354 condition blob in while_op test changes data type (#14279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14279

att

Reviewed By: smessmer

Differential Revision: D13144472

fbshipit-source-id: af4d920a3148c648d1a428a5bcd56da19ea8c38c
2018-11-27 14:16:39 -08:00
ba6c49cb9c Add test of ONNX_ATEN (#14259)
Summary:
In #14239 we fixed ONNX_ATEN.
To make sure it stays correct in the future, we should add a related test case.
We use torch.fmod() to test ONNX_ATEN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14259

Differential Revision: D13204610

Pulled By: zrphercule

fbshipit-source-id: e4660c346e5edd201f1458b7d74d7dfac49b94c7
2018-11-27 13:51:51 -08:00
e392d428b1 Allowing TaskGroups to carry remote nets (#14342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14342

Sometimes, when we are creating a TaskGroup, we are in fact creating a TaskGroup for a distributed job. In some cases, we may want to register a few nets as "remote" to a TaskGroup. A remote net should carry sufficient attributes about where it should be executed later on.

This diff adds the remote net attribute to the TaskGroup class. It exposes two minimal functionalities: adding a remote net, and getting all remote nets added to a TaskGroup.
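
A rough sketch of the intended usage (the method names here are assumptions based on the description above, not verified against the diff):

```python
from caffe2.python import core
from caffe2.python.task import TaskGroup

tg = TaskGroup()
remote = core.Net("remote_net")
tg.add_remote_net(remote)  # assumed name: register a net as remote
print(tg.remote_nets())    # assumed name: list registered remote nets
```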

Reviewed By: d4l3k

Differential Revision: D13188320

fbshipit-source-id: efe947aec30817e9512a5e18be985713b9356bdc
2018-11-27 13:34:11 -08:00
b7856a32f6 Add scaffolding for HIP backend in ATen/core. (#14285)
Summary:
This code doesn't actually do anything, but it will be the
groundwork necessary to change PyTorch's HIPIFY pass from reusing
CUDA identifiers directly, to actually switching to using HIP
identifiers (moving us closer to a world where we can compile
both HIP and CUDA PyTorch side-by-side.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14285

Differential Revision: D13158851

Pulled By: ezyang

fbshipit-source-id: df2462daa5d0d4112455b67bd3067d60ba55cda5
2018-11-27 13:21:42 -08:00
1b93cb7631 Document device_guard in native_functions.yaml (#14235)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14235

Differential Revision: D13145780

Pulled By: ezyang

fbshipit-source-id: 0e93bf009ad492551bcdcada0357f2fef529e67d
2018-11-27 13:17:23 -08:00
1b80644b4d Revert D13192228: [pytorch][PR] [jit] Add boolean dispatch for function overloading
Differential Revision:
D13192228

Original commit changeset: fce33c400c1f

fbshipit-source-id: 75c9991dc7097f9513c6c89d16eff2de6e287c3b
2018-11-27 13:14:42 -08:00
f9c27d60c3 Remove fake dependencies from TensorImpl to caffe2 (#14141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14141

These includes weren't actually used, let's remove them.

Reviewed By: ezyang

Differential Revision: D13113129

fbshipit-source-id: 816995e280b81bf99002772ea8aea458bdfcd2c7
2018-11-27 12:59:56 -08:00
3257ac1ff3 Fix include paths for TensorTypeId.h and TensorTypeIdRegistration.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14070

Reviewed By: ezyang

Differential Revision: D13081610

fbshipit-source-id: 685994a15a2cd15e9e5447cf77671343de5dd278
2018-11-27 12:59:54 -08:00
ed10ef97da Move TensorTypeId to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14327

Reviewed By: ezyang

Differential Revision: D13131338

fbshipit-source-id: c4682cb6ed6fe4cd1636e09d918eef6e90c836f1
2018-11-27 12:59:52 -08:00
6c2e816268 Fix include paths for Storage.h and StorageImpl.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14062

Reviewed By: ezyang

Differential Revision: D13081603

fbshipit-source-id: c272b715ef2f513d21d1c3f34fbf79eec6946441
2018-11-27 12:59:50 -08:00
3d4d09fe06 Move Storage and StorageImpl to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14061

Reviewed By: ezyang

Differential Revision: D13081608

fbshipit-source-id: 1ea2d32e9ec9293b6ffa4b9e76c674cca55d5a1c
2018-11-27 12:59:48 -08:00
507ed9032e Fix include paths for Allocator.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14060

Reviewed By: ezyang

Differential Revision: D13081605

fbshipit-source-id: 02f23af174c0f0c38fb0163c2dfef3873ff5635d
2018-11-27 12:59:46 -08:00
3a71d5ee49 Move Allocator.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14059

Reviewed By: ezyang

Differential Revision: D13081606

fbshipit-source-id: d6ad59ad4e3d363268cd4307b6c999a168681246
2018-11-27 12:59:44 -08:00
0b10f147b6 Move UniqueVoidPtr to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14058

Reviewed By: dzhulgakov

Differential Revision: D13081602

fbshipit-source-id: e91ccf9fba9a7a02f99ed90b7a3a0fe7afd56832
2018-11-27 12:59:42 -08:00
8b1ca2810b Move ScalarTypeUtils.h to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14024

Reviewed By: ezyang

Differential Revision: D13081604

fbshipit-source-id: d7a09610f64eb2e9dd831bbb3c85f20691251594
2018-11-27 12:59:40 -08:00
44e21cf5bb Fix include paths for Scalar.h and ScalarType.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14023

Reviewed By: ezyang

Differential Revision: D13081609

fbshipit-source-id: c27eeafa381b39e043f0261ea7f6f634ee8bc238
2018-11-27 12:59:38 -08:00
50e9c56830 Move Scalar and ScalarType to c10/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14022

Reviewed By: ezyang

Differential Revision: D13015236

fbshipit-source-id: 92aac4e342d85f75a31837b2943fa5b80f0c35c9
2018-11-27 12:59:36 -08:00
3fca4bde50 Trace in-place ops (#14254)
Summary:
This PR adds a `try_outplace` option to the tracer. When `try_outplace` is true, the tracer will attempt to emit out-of-place versions of in-place ops (similar to how things are done today). When it's false, the correct in-place op is emitted.

I made `try_outplace` false by default, but flipped it to true for ONNX export utils. zdevito jamesr66a, anywhere else I should preserve the existing behavior?
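
A minimal sketch of the difference (the trace below is illustrative; `try_outplace` itself is internal to the tracer, not a public argument):
```python
import torch

def f(x):
    x.add_(1)  # in-place op
    return x

# With try_outplace=True (the historical behavior, kept for ONNX export),
# the trace records an out-of-place aten::add; with try_outplace=False,
# the emitted node is the original in-place aten::add_.
traced = torch.jit.trace(f, torch.zeros(3))
print(traced.graph)
```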
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14254

Reviewed By: eellison

Differential Revision: D13166691

Pulled By: suo

fbshipit-source-id: ce39fdf73ac39811c55100e567466d53108e856b
2018-11-27 12:40:56 -08:00
ffbc3905a1 Fixed torch.multiprocessing.spawn for not being able to spawn like dataloader workers (#14391)
Summary:
Should fix: https://github.com/pytorch/pytorch/issues/14390

Now the ImageNet example works fine with multiprocessing and more than one dataloader worker
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14391

Reviewed By: calebho

Differential Revision: D13209800

Pulled By: teng-li

fbshipit-source-id: e8abc0fb38d4436cf3474dcbba0e28f4290e4d29
2018-11-27 12:37:41 -08:00
5fefb29a53 Tensor construction: combine Resize+mutable_data - 4/4 (#13856)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13856

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007310

fbshipit-source-id: 941f064ef8934bb17fbfb706e6ed3db173b5d268
2018-11-27 12:34:25 -08:00
e22cc7c072 Print default values and introduce ir view classes (#14176)
Summary:
[Stacked commit, only review the last commit]

This PR adds support for printing default values in python printing as well as the logic
for parsing default values back in using the parser. For simplicity, this PR simply
creates a subgraph of the constant expressions and then runs that graph to generate the defaults.
A more lightweight approach should be possible later, but would require more machinery.

To make reading code in the printer easier, this also adds ir_views.h.
Similar to tree_views.h these classes can provide views of some commonly used IR nodes
that have complicated structure and common operations on that structure.

Currently it has only read-only views for prim::If and prim::Loop,
but we should eventually add helpers to manipulate If/Loop nodes as well.
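
A rough sketch of what this enables, assuming the `.code` accessor for the printed form (the function itself is illustrative):
```python
import torch

@torch.jit.script
def scale(x, alpha=2.0):
    # type: (Tensor, float) -> Tensor
    return x * alpha

# The printed source now carries the `= 2.0` default, reconstructed by
# running a small subgraph of constant expressions; parsing the text back
# yields an equivalent graph.
print(scale.code)
```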
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14176

Differential Revision: D13198455

Pulled By: zdevito

fbshipit-source-id: dc99ab9692804ccaedb60a55040c0b89ac7a6a6d
2018-11-27 11:48:27 -08:00
8408dff55a Add Type support to the fuser, fuse more (#14336)
Summary:
This adds scalar type support to the fuser, both internally (instead of auto / assuming float) and for the inputs/outputs.
We can now fuse things with inputs/outputs of arbitrary scalar type; in particular, comparisons and `where` work well, so this fixes #13384 by returning tensors of the right type (and adds a test where byte and double tensors are returned).
The type inference is done by re-calling PropagateTensorShapeOnNode during compilation; I would venture that it isn't prohibitively expensive compared to the actual compilation. (Propagation was fixed for `where` to return the second argument's type and amended to handle FusedConcat.)
I'm not sure how to add a check for the code generated by the fuser, but I am also not sure we absolutely need one (we'd see if it were invalid or produced wrong results).

Thanks in particular to apaszke, fmassa, mruberry for advice and encouragement! All the errors are my own.

I have discussed order of PRs briefly with mruberry, if this goes in before he submits the PR, he graciously agreed to rebasing his, but I'd happily rebase, too.
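
For illustration, a small fusible function mixing a comparison and `where` (a sketch; actual fusion requires a GPU):
```python
import torch

@torch.jit.script
def cmp_where(x, y):
    mask = x > y                 # comparison output is no longer assumed float
    return torch.where(mask, x, y)

dev = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(1024, device=dev)
y = torch.randn(1024, device=dev)
out = cmp_where(x, y)            # fused on GPU, with the correct output dtype
```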
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14336

Differential Revision: D13202620

Pulled By: soumith

fbshipit-source-id: 855159e261fa15f21aca3053bfc05fb3f720a8ef
2018-11-27 11:33:11 -08:00
bd629481fb Updating submodules
Reviewed By: yns88

fbshipit-source-id: e63160e97550942931bacaa860d91d591d2e1712
2018-11-27 11:23:32 -08:00
66c8bbf021 Add boolean dispatch for function overloading (#14081)
Summary:
This PR makes it possible to overload functions based on the value of a parameter (so long as it is a constant). See `max_pool1d` for an example usage.

This is the first step in enabling the use of `max_pool` functions for the standard library that can return `Tensor` or `Tuple[Tensor, Tensor]` based on the `return_indices` flag. This will give the JIT identical results to the Python versions of the functions.

Depends on #14232 for `Optional[BroadcastingList[T]]`
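
A sketch of the intended end state once the standard library adopts the mechanism (the dispatch happens at compile time on the constant `return_indices`):
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def pool_with_indices(x):
    # the constant return_indices=True statically selects the
    # Tuple[Tensor, Tensor] overload
    return F.max_pool1d(x, kernel_size=2, return_indices=True)

@torch.jit.script
def pool(x):
    # return_indices defaults to False, selecting the Tensor overload
    return F.max_pool1d(x, kernel_size=2)
```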
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14081

Differential Revision: D13192228

Pulled By: driazati

fbshipit-source-id: fce33c400c1fd06e59747d98507c5fdcd8d4c113
2018-11-27 10:51:32 -08:00
2cc35c161a Barrier synchronizes with prior work before completing (#14386)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14386

See #13573, #14142, and #14271 for discussion.

This change updates ProcessGroupGloo to ensure that all prior
operations have completed before executing the barrier.

Reviewed By: manojkris

Differential Revision: D13205022

fbshipit-source-id: 673e7e6ca357dc843874d6dd8da590832e1de7fa
2018-11-27 10:46:42 -08:00
9598d380b0 Make ProcessGroup::Work::wait() throw (#14298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14298

This is a breaking API change for users of the C++ c10d API. The work
object defined wait() to return a boolean. If the work completed
successfully it would return true, if it didn't it would return false.
It was then up to the user to call the exception() function to figure
out what went wrong. This has proven suboptimal as it allows users to
forget about failure handling and errors may be ignored.

The work class is semantically very similar to std::future, where a
call to get() may throw if the underlying std::promise has set an
exception. This commit changes the semantic of the work class to be
similar to this and turns wait() into a void function that throws if
the work completes with an exception.

The exception() function can still be used to retrieve the exception
if isSuccess() returns false, but now returns an std::exception_ptr
instead of a reference to a std::exception.
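
A sketch of the new semantics from the Python side, assuming an initialized process group (the C++ change is analogous):
```python
import torch
import torch.distributed as dist

tensor = torch.ones(1)
work = dist.all_reduce(tensor, async_op=True)
try:
    work.wait()   # now a void call that raises if the work failed
except RuntimeError as exc:
    print('collective failed:', exc)
```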

Reviewed By: manojkris

Differential Revision: D13158475

fbshipit-source-id: 9cd8569b9e7cbddc867a5f34c6fd0b7be85581b8
2018-11-27 10:46:40 -08:00
03864b7b11 Add option structs and timeout field (#14297)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14297

Adds option structs for allgather and barrier so that we have one
for every collective. Adds a timeout member field to every one of these
so that we can support per-operation timeouts.

Use default constructed options struct for every collective process
group function exposed to Python.

Reviewed By: manojkris

Differential Revision: D13158474

fbshipit-source-id: 3d28977de2f2bd6fc2f42ba3108b63a429338906
2018-11-27 10:46:38 -08:00
52f50220d9 Refer to all work with ProcessGroup prefix (#14296)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14296

There was mixed usage of "ProcessGroup::Work" and just "Work".
Adding prefix for readability/consistency.

Reviewed By: manojkris

Differential Revision: D13128977

fbshipit-source-id: a54a8784fa91cd6023c723cb83e9f626fb896a30
2018-11-27 10:46:36 -08:00
5865561a9a Remove algorithm caching in ProcessGroupGloo (#14295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14295

This is no longer used after moving to Gloo new style algorithms.

Closes #11912.

Reviewed By: manojkris

Differential Revision: D13111781

fbshipit-source-id: 53e347080e29d847cd9da36f2d93af047930690c
2018-11-27 10:46:34 -08:00
936c2bba23 Use new style barrier support in c10d/gloo (#14294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14294

This is the final collective to be ported to the new style where there
is no longer a need to keep a cached algorithm instance around. There
is a follow up change incoming to remove the algorithm caching
functionality in ProcessGroupGloo.

Reviewed By: manojkris

Differential Revision: D13111509

fbshipit-source-id: f3ea0d955a62029fc4e7cfc09055e4957e0943ac
2018-11-27 10:46:32 -08:00
50bc9dc9c3 fix doc for sparse.addmm (#14403)
Summary:
- fixing the doc issue in sparse.addmm

================ before change ==================
![image](https://user-images.githubusercontent.com/38509346/49063994-2f10fe80-f1ce-11e8-9ccc-54241bc45f0b.png)
![image](https://user-images.githubusercontent.com/38509346/49064064-641d5100-f1ce-11e8-865a-7227be7156ef.png)

================ post change ==================
![image](https://user-images.githubusercontent.com/38509346/49064078-76978a80-f1ce-11e8-8f38-f1f8ac9ce63b.png)
![image](https://user-images.githubusercontent.com/38509346/49064085-7bf4d500-f1ce-11e8-8a0d-bf9e5460d21f.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14403

Differential Revision: D13216582

Pulled By: weiyangfb

fbshipit-source-id: 52e0a20c6b341c37cfb31f281be3afe2a52ca532
2018-11-27 10:24:18 -08:00
a3cfab2d63 per-group and per-channel quantization (#14340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14340

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/25

Per-group and per-channel quantization in fbgemm
This diff also cleans up explicit template instantiation using macro expansion
This diff also changes the randFill interface, which previously made it easy to mistakenly generate integer random numbers for floating-point vectors.

Using this in DNNLOWP operators will be done in a separate diff.

Reviewed By: dskhudia

Differential Revision: D13176386

fbshipit-source-id: e46c53e31e21520bded71b8ed86e8b19e010e2dd
2018-11-27 10:17:34 -08:00
49fe678fec Add variable_factories.h to cppdocs (#14381)
Summary:
This will document `torch::from_blob` and such.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14381

Differential Revision: D13216560

Pulled By: goldsborough

fbshipit-source-id: 112f60e45e4d38a8a9983fa71e9cc56bc1a73465
2018-11-27 10:13:23 -08:00
c19af59a6e Use integer math to compute output size of pooling operations (#14405)
Summary:
As reported in #13386, the pooling operations can return wrong results for large inputs. The root of the problem is that while the output shape is initially being computed with integer operations, it is converted to float32 for division by the stride and applying either a `ceil` or a `floor` depending on the `ceil_mode`. Since even moderately large integers (the smallest being 16,777,217) cannot be expressed exactly in float32, this leads to wrong result shapes.

This PR relies purely on integer operations to perform the shape computation, including the ceil/floor distinction. Since I could not stand all that duplicated code, I pulled it out into a `pooling_shape.h` header, similar to the existing `linear_upsampling.h` header. I hope this is acceptable, let me know if you'd like to see it solved differently. I've also added tests to `test_nn.py` that fail without my changes and pass with my changes. They cover `{max,avg}_pool{1,2,3}d()` for CPU and GPU.

Fixes #13386.
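
For reference, a pure-integer sketch of the shape formula (simplified: no dilation; names do not match the actual `pooling_shape.h` contents):
```python
def pooled_size(in_size, kernel, stride, pad, ceil_mode):
    # Integer-only arithmetic: exact even when in_size exceeds 2**24,
    # where float32 can no longer represent every integer.
    numerator = in_size + 2 * pad - kernel
    if ceil_mode:
        return (numerator + stride - 1) // stride + 1  # integer ceil division
    return numerator // stride + 1                     # integer floor division

assert pooled_size(16777217, 1, 1, 0, False) == 16777217  # float32 would round this
```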
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14405

Differential Revision: D13215260

Pulled By: soumith

fbshipit-source-id: 802588ce6cba8db6c346448c3b3c0dac14d12b2d
2018-11-27 09:38:06 -08:00
c5cc1e3ab2 Delete legacy THCStream (long live THCStream). (#14246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14246

This commit systematically eliminates THCStream entirely from THC, replacing it
with at::cuda::CUDAStream.  In places where the previous pointer type showed up
in a public API signature, those functions are now only available to C++
clients.  (It would not be too difficult to make a C-compatible version of
CUDAStream, as it's really just a simple struct, but we leave this for
future work.)

All functions in THC that referred to THCStream were expunged in favor of their
modern counterparts.

One annoyance was that I didn't feel like redoing how the torch.cuda.Stream
binding code worked, but I really wanted to get rid of the stored THCStream*
pointer.  So I repurposed the bit-packing code I implemented for Stream hashing,
and used that to (reversibly) store streams in a uint64_t cdata field.  A perhaps
more future proof solution would be to get rid of cdata entirely, and store the
device and stream ID directly.
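
A toy sketch of the reversible packing idea (the bit layout here is purely illustrative; the real scheme lives in the pack/unpack methods):
```python
def pack_stream(device_index, stream_id, bits=48):
    # Toy layout: device index in the high bits, stream id in the low bits.
    return (device_index << bits) | stream_id

def unpack_stream(cdata, bits=48):
    return cdata >> bits, cdata & ((1 << bits) - 1)

assert unpack_stream(pack_stream(3, 42)) == (3, 42)   # round-trips losslessly
```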

Billing of changes:
- All CUDAStream_ pointer API functions are now hidden and anonymously
  namespaced (instead of being in the impl namespace).  All use sites
  rewritten to use the modern C++ API.  Since CUDAStreamInternals is no
  longer part of the public API, the CUDAStreamInternals constructor and
  internals() method have been removed, and replaced with anonymous
  functions in the C++ file.
- device_index() returns DeviceIndex rather than int64_t now
- Stream and CUDAStream now have pack/unpack methods.  (CUDAStream checks
  that the unpacked bit-pattern is for a CUDA device.)
- THCStream.h header is removed entirely
- Most THCStream handling functions in THC API are removed

Reviewed By: gchanan

Differential Revision: D13121531

fbshipit-source-id: 48873262cc0a37c3eec75a7ba1c93c800da40222
2018-11-27 08:32:09 -08:00
388258fb5e Add hash functions for Stream, CUDAStream; fix Device hash function (#14191)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14191

Previously, Device's hash function only worked for CPU and CUDA.  Now
it works for everything.

Implementing the bit concatenation was a bit tricky, and I got it wrong the
first time. See Note [Hazard when concatenating signed integers]

Reviewed By: smessmer

Differential Revision: D13119624

fbshipit-source-id: 36bfa139cfc739bb0624f52aaf466438c2428207
2018-11-27 08:32:08 -08:00
3ff70712c2 Implement NaN-propagating max/min on Vec256.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13399

Differential Revision: D13199957

Pulled By: resistor

fbshipit-source-id: 1565e079b13c5d4f42f2033830a7c997b7d824bc
2018-11-26 22:46:20 -08:00
a0ef8afd7e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 210f7eec65bea5e31817fb56dec27b0ab8af797a
2018-11-26 19:38:00 -08:00
f019a2d9b3 Remove unused executors, part 3 (#14199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14199

Remove legacy code for dag, async_dag

Reviewed By: salexspb

Differential Revision: D13019102

fbshipit-source-id: ff07e45304d9af4be0375215f4b642c4b0edb12d
2018-11-26 19:10:43 -08:00
7953b32dc4 Remove unused executors, part 2 (#14115)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14115

Remove legacy implementation of prof_dag

Reviewed By: salexspb

Differential Revision: D13019096

fbshipit-source-id: 4f2bf676444d84eaa2cc1effcc3ebdc764e0a016
2018-11-26 19:10:42 -08:00
34239006b0 Remove unused executors, part 1 (#14117)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14117

Removing unused legacy executors (htrace)

Reviewed By: salexspb

Differential Revision: D13019078

fbshipit-source-id: 19d0ed1b47a22cc17c27fdd15d748ced54806132
2018-11-26 19:10:40 -08:00
507cb16583 Delete OPENMP_STUB translation. (#14286)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14286

Differential Revision: D13205356

Pulled By: ezyang

fbshipit-source-id: 08e9821e4b32f8d7f3c41906e481f280ee6cf2e3
2018-11-26 19:08:07 -08:00
12558019a8 backward for sparse.addmm(D, S, D, alpha, beta) -> D (#13345)
Summary:
- introduce `sparse.addmm()` with backward for sparse matrix input for https://github.com/pytorch/pytorch/issues/12308
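
A minimal usage sketch:
```python
import torch

D1 = torch.randn(3, 4)
S = torch.randn(3, 2).to_sparse().requires_grad_(True)   # sparse matrix input
D2 = torch.randn(2, 4, requires_grad=True)

out = torch.sparse.addmm(D1, S, D2)   # beta*D1 + alpha*(S @ D2), dense result
out.sum().backward()                  # the new backward reaches the sparse input
```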
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13345

Differential Revision: D13094070

Pulled By: weiyangfb

fbshipit-source-id: 136c08c3ca9bafb20577b60dd43d31c3e5cd5461
2018-11-26 17:47:48 -08:00
9e1805d38e Switch Int8ChannelShuffle operator to QNNPACK (#14362)
Summary:
1.8-2.2X better performance on ARM devices
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14362

Reviewed By: jerryzh168

Differential Revision: D13192312

Pulled By: Maratyszcza

fbshipit-source-id: 0d3dff067e300c7d741c42615b61246cbf09a829
2018-11-26 17:43:32 -08:00
2d6f039766 Fixed file init_method write/read race (#14388)
Summary:
This should fix the race among multiple processes: https://github.com/pytorch/pytorch/issues/13750

Essentially, the reader tries to open the file and errors out if it doesn't exist. Here we factor in the timeout option of FileStore to apply a timeout both to creating the file (it should always be created unless something is wrong) and, more importantly, to waiting for the file to be created.

Tested on both NFS and a local drive; the race disappears when 8 concurrent processes do distributed training.
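
Usage is unchanged; a minimal sketch (path, rank, and world size are illustrative):
```python
import torch.distributed as dist

rank, world_size = 0, 8   # per-process values in a real launch

# Readers now wait (up to the FileStore timeout) for the writer to create
# the file instead of erroring out on a missing file.
dist.init_process_group(
    backend='gloo',
    init_method='file:///mnt/nfs/shared_init_file',   # path is illustrative
    world_size=world_size,
    rank=rank,
)
```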
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14388

Differential Revision: D13207178

Pulled By: teng-li

fbshipit-source-id: d3d5d62c4c8f01c0522bf1653c8986155c54ff80
2018-11-26 17:09:35 -08:00
f639249d51 Fix dataloader iterator test (#14045)
Summary:
I noticed the test `DataLoaderTest.CanDereferenceIteratorMultipleTimes` doesn't test proper progression of the iterator. I also added a test for using `std::copy`.

Fixes https://github.com/pytorch/pytorch/issues/14276

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14045

Differential Revision: D13092187

Pulled By: goldsborough

fbshipit-source-id: 57698ec00fa7b914b159677a4ab38b6b25c2860b
2018-11-26 17:06:41 -08:00
6f3002a50e Fixed c10d test (#14389)
Summary:
Most likely a typo.

Tested on 8-GPU machine

```
tengli@learnfair062:~/pytorch/test$ python test_c10d.py ProcessGroupNCCLTest.test_barrier
.
----------------------------------------------------------------------
Ran 1 test in 29.341s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14389

Differential Revision: D13207207

Pulled By: teng-li

fbshipit-source-id: aaffe14237076fe19d94e2fa4d9c093397f07bb9
2018-11-26 16:46:33 -08:00
1ca0ec7299 fix typo in torch.sum documentation (#14250)
Summary:
Notice that an extra colon was added to `:attr:`, so in https://pytorch.org/docs/stable/torch.html#torch.sum , `dim` shows up as ":attr::_dim_". This patch fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14250

Reviewed By: soumith

Differential Revision: D13146363

Pulled By: umanwizard

fbshipit-source-id: f7d03dcb0973aae248b56ab407ba8489f2b1fe36
2018-11-26 16:36:52 -08:00
cef23a4b1d More JIT type hierarchy refinement (#14127)
Summary:
JIT type system hierarchy refinement and refactors:

1. Make NumberType the base type of IntType and FloatType
2. Make single-type containers like OptionalType and FutureType share a SingleElementType base type
3. Some refactors to make it more robust, e.g. adding python_str() for some types so that we have a proper python_print serialization format
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14127

Differential Revision: D13112657

Pulled By: wanchaol

fbshipit-source-id: 335c5b25977be2e0a462c7e4a6649c1b653ccb4f
2018-11-26 16:25:40 -08:00
afb2c0ce86 changing some rpath stuff (#14304)
Summary:
See if anything breaks
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14304

Differential Revision: D13201418

Pulled By: pjh5

fbshipit-source-id: ac2101b61a23bda37329d4d923c3d9d120e718bf
2018-11-26 15:57:47 -08:00
b18063b39a Fix caffe2 => onnx exporter for ConvTranspose (#14143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14143

ConvTranspose has a per-operator attribute rename, which meant that the
global attribute rename for kernels => kernel_shape was not applied.
Changing the behavior so that the global renames always apply, but per-op
renames can override those for specific attributes.

Note: The python frontend path isn't actually used for ConvTranspose, but I
thought it would be good to make it consistent.

Reviewed By: yinghai

Differential Revision: D13113395

fbshipit-source-id: cd3f124b4b5c753a506d297138b7d002b51bfb38
2018-11-26 15:51:42 -08:00
5918de8e84 Revert D13166669: [pytorch][PR] Allow dataloader to accept a custom memory pinning function
Differential Revision:
D13166669

Original commit changeset: ca965f9841d4

fbshipit-source-id: 0836b4f50f73ba01c97491a719660f02e36f20ad
2018-11-26 14:55:04 -08:00
bb7fb7e45f remove CAFFE2_API from IdWrapper (#14044)
Summary:
It doesn't really make sense on a template class. It also breaks debug
builds on Windows, so this will save someone some frustration in the
future.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14044

Differential Revision: D13202960

Pulled By: anderspapitto

fbshipit-source-id: 617d78366993d5ecc2ba1f23bb90010f10df41f3
2018-11-26 14:08:56 -08:00
735cd06536 FeedTensor returns a Tensor (#14196)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14196

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

The FeedTensor function used to take a pointer to a Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.

Reviewed By: dzhulgakov

Differential Revision: D13091163

fbshipit-source-id: 9abf2fd320baca76e050530c500dd29f8e2d0211
2018-11-26 13:05:44 -08:00
b13f91dbd9 Allow graph fuser to move chunks past multiple nodes. (#14055)
Summary:
Fixes #12290. Also speeds up JIT LSTM forward pass from 8.8ms to 7.8ms; previously, each JIT lstm cell used 2 fused kernels. Now, it only uses one fused kernel (which is how many kernels cudnn uses).

Explanation:

Let f, g, h be fusible ops.
```
x = f(v, w)
z = g(x, y)
a, b = chunk(z)
c = h(a, b)
```
becomes (before this PR):
```
x = f(v, w)
x', y' = broadcast_tensors([x, y])
ax, bx = chunk(x')
ay, by = chunk(y')
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
The graph fuser then puts g, g, and h into one FusionGroup and is unable
to move `x = f(v, w)` into the FusionGroup.

This PR lets the graph fuser move `x = f(v, w)` into the FusionGroup.
It does this by abstracting the broadcast_tensors + multiple chunk nodes
into one intermediate `prim::BroadcastingChunk[chunks, dim]` node.

A `BroadcastingChunk[chunks, dim](*inputs)` node is equivalent to:
- broadcasting all of *inputs
- chunk-ing each broadcasted input into `chunks` chunks along dim `dim`.

Abstracting the broadcasting chunk behavior away, it is now a lot easier
for the graph fuser to move (broadcast + chunk) past an operation. After
this PR, the above graph becomes:
```
x = f(v, w)
ax, bx, ay, by = BroadcastingChunk(x, y)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```
Now, to move `x = f(v, w)` after the BroadcastingChunk, one just needs
to add f's operands to the BroadcastingChunk:
```
ay, by, av, bv, aw, bw = BroadcastingChunk(y, v, w)
ax = f(av, aw)
by = f(bv, bw)
a = g(ax, ay)
b = g(bx, by)
c = h(a, b)
```

cc apaszke mruberry zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14055

Differential Revision: D13159259

Pulled By: zou3519

fbshipit-source-id: 134e9e645c950384d9be6a06a883a10e17a73d7d
2018-11-26 12:31:49 -08:00
8cc5d54b66 Updating submodules
Reviewed By: yns88

fbshipit-source-id: b4d74bf58b5536a0de654dfe73d41b5e1126eec6
2018-11-26 12:21:09 -08:00
0d1f382e39 Removing Caffe2-specific conda infra
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11961

Differential Revision: D10045909

Pulled By: pjh5

fbshipit-source-id: e9c12124897ee586aeb8b6654b31e4b81687199a
2018-11-26 12:18:17 -08:00
2fa3c8327c fix tensor advanced indexing with assignment (#14311)
Summary:
Fix a mishandling of `foo[a] = b` when `a` was a tensor. We were assigning to a copy of `foo`, not a view of it.
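
A minimal TorchScript sketch of the now-correct behavior:
```python
import torch

@torch.jit.script
def assign(foo, a, b):
    # type: (Tensor, Tensor, Tensor) -> Tensor
    foo[a] = b   # writes through a view of foo, not into a copy
    return foo

foo = torch.zeros(4)
assign(foo, torch.tensor([1, 3]), torch.ones(2))
assert foo[1].item() == 1.0 and foo[3].item() == 1.0   # caller's tensor mutated
```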
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14311

Differential Revision: D13196109

Pulled By: suo

fbshipit-source-id: c929401fda7c4a27622d3fe2b11278b08a7f17f1
2018-11-26 12:10:48 -08:00
80ba65e2f5 remove unnecessary zero_point argument from constructors (#14323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14323

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/24

As the title says.

Reviewed By: dskhudia

Differential Revision: D13167073

fbshipit-source-id: 6d6c526fd6e29a14e97f71a0881f28ada8703107
2018-11-26 11:48:17 -08:00
0651b594d8 Updating submodules
Reviewed By: yns88

fbshipit-source-id: 06e234f1a0217a268712832f21cb06b7109538a6
2018-11-26 11:27:01 -08:00
a10a993872 Fix -Wreturn-std-move (#14113)
Summary:
On clang-7 (internal) a warning, `-Wreturn-std-move`, is being emitted and raised to an error via `-Werror` for the code this PR fixes. The reason is that `autograd::make_variable` returns an `autograd::Variable`, so returning it from a function that returns `at::Tensor` disallows the compiler from eliding the return value (RVO). So let's explicitly convert the `autograd::Variable` to an `at::Tensor` before returning it.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14113

Differential Revision: D13105638

Pulled By: goldsborough

fbshipit-source-id: 6e1dc31c6512e105ab2a389d18807422ee29283c
2018-11-26 11:15:59 -08:00
90ed2f5aca minimize code compiled with avx2 and header includes from them (#14313)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14313

Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/22

This diff is an attempt to minimize code compiled with avx2.

Reviewed By: dskhudia

Differential Revision: D13166591

fbshipit-source-id: 2be241141f6d7478b86a422953791e237ff10268
2018-11-26 11:09:21 -08:00
fa73037233 Add proper from_blob overloads (#13982)
Summary:
There was a missing overload for `torch::from_blob` that allows passing strides.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13982

Differential Revision: D13108089

Pulled By: goldsborough

fbshipit-source-id: b87594ec0bf55b35d106b4438bc18b2ce9fc8f71
2018-11-26 10:14:51 -08:00
b30c803662 allow concatenating "hybrid" (sparse/dense) tensors along their dense dimensions (#13761)
Summary:
Follow-up to #13577

The idea is to take each values tensor, concatenate it with zeros before and after itself (along the dimension corresponding to the one we're catting the tensors along), to get a tensor corresponding to the values for that tensor in the result. Then we concatenate all of those together to get the final values tensor. (Hopefully, this will be more clear from the example in the comments).

The indices are more straightforward: since we aren't concatenating along a sparse dimension, they don't change at all, so all we need to do are concatenate the indices from the different tensors together.
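
A minimal sketch with one sparse and one dense dimension:
```python
import torch

idx = torch.tensor([[0, 1]])   # one sparse dimension
a = torch.sparse_coo_tensor(idx, torch.randn(2, 3), (2, 3))
b = torch.sparse_coo_tensor(idx, torch.randn(2, 5), (2, 5))

# dim=1 is a dense dimension here: indices pass through unchanged, while each
# values tensor is zero-padded and stacked as described above.
c = torch.cat([a, b], dim=1)   # sparse result of shape (2, 8)
```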
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13761

Differential Revision: D13160343

Pulled By: umanwizard

fbshipit-source-id: 13d7adecd369e0eebdf5bce3d90a51029b66bd1d
2018-11-26 10:06:49 -08:00
a13fd7ec28 Allow torch.utils.cpp_extension.load to load shared libraries that aren't Python modules (#13941)
Summary:
For custom TorchScript operators, `torch.ops.load_library` must be used and passed the path to the shared library containing the custom ops. Our C++ extension machinery is generally meant to build a Python module and import it. This PR gives `torch.utils.cpp_extension.load` an option to just return the shared library path instead of importing it as a Python module, so you can then pass it to `torch.ops.load_library`. This means folks can re-use `torch.utils.cpp_extension.load` and `torch.utils.cpp_extension.load_inline` to even write their custom ops inline. I think t-vi and fmassa will appreciate this.
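
A sketch of the intended flow as described above; the flag name `is_python_module` and the op/file names are assumptions, not necessarily the exact API:
```python
import torch
import torch.utils.cpp_extension

# Build the extension without importing it as a Python module; per this PR,
# load() can then hand back the shared-library path instead.
lib_path = torch.utils.cpp_extension.load(
    name='my_custom_ops',
    sources=['my_ops.cpp'],      # registers ops via RegisterOperators
    is_python_module=False,      # illustrative flag name
)
torch.ops.load_library(lib_path)
out = torch.ops.my_namespace.my_op(torch.randn(3))
```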

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13941

Differential Revision: D13110592

Pulled By: goldsborough

fbshipit-source-id: 37756307dbf80a81d2ed550e67c8743dca01dc20
2018-11-26 09:39:21 -08:00
a60368982b Batch more matrix multiplies (#13456)
Summary:
This handles the input pre-multiplication in RNNs, yielding pretty significant speedups in backward times. This pass depends on loop unrolling, so we'll batch only as many elements as the unrolling factor allows.

cc mruberry ngimel zou3519 zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13456

Differential Revision: D12920339

Pulled By: zou3519

fbshipit-source-id: 5bcd6d259c054a6dea02ae09a9fdf9f030856443
2018-11-26 09:20:35 -08:00
1ef949036c Enable native wrappers for the remainder of nn functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14290

Differential Revision: D13162562

Pulled By: gchanan

fbshipit-source-id: 615e1727988bfeeade48f9b38162333a2e298f7b
2018-11-26 07:58:59 -08:00
60e7d04961 Add Recency Weighted into SparseLookup (#14291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14291

Add RecencyWeighted into SparseLookup.

Reviewed By: Wakeupbuddy

Differential Revision: D13147738

fbshipit-source-id: de5dc3aaee8ce7d41c6d30d2ff47e9786a7fa4da
2018-11-24 02:43:31 -08:00
6e1e2032d3 quote NUMPY_INCLUDE_DIR (#14341)
Summary:
When NUMPY_INCLUDE_DIR contains a space character (e.g. "C:\Program Files (x86)\Microsoft Visual Studio\..."), CMake cannot receive the correct path name.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14341

Differential Revision: D13188408

Pulled By: soumith

fbshipit-source-id: b62127d90e53da94fe6af5d3bdd2ea4fd6546210
2018-11-23 21:34:01 -08:00
33d091f432 shape analysis fix (#14325)
Summary:
This PR is deceptively large because of an indenting change. The actual change is small; I will highlight it inline.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14325

Differential Revision: D13183296

Pulled By: suo

fbshipit-source-id: fcbf6d5317954694ec83e6b8cc1c989f2d8ac298
2018-11-23 11:24:24 -08:00
8e3240d022 Some minor fixes for Windows build script (#14218)
Summary:
1. Fix execution failure when some of the paths are not defined
2. Users can now optionally override install dir by setting `CMAKE_INSTALL_PREFIX`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14218

Differential Revision: D13180350

Pulled By: soumith

fbshipit-source-id: 8c9680d1285dbf08b49380af1ebfa43ede99babc
2018-11-23 08:17:16 -08:00
7557a993ab Allow dataloader to accept a custom memory pinning function (#14171)
Summary:
Currently, the `pin_memory_batch` function in the dataloader will return a batch of any unrecognized type without pinning the data, because it doesn't know how.

This behavior was preventing us from overlapping data prefetching in Mask-RCNN, whose custom `collate_fn` returns a custom batch type.

The present PR adds the ability for the user to pass a `pin_fn` alongside any custom `collate_fn` to handle such custom types.
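
A sketch of the proposed API (note this PR was later reverted, as seen further up this log; `dataset` and `my_collate` are assumed to exist):
```python
from torch.utils.data import DataLoader

class DetectionBatch:
    def __init__(self, images, boxes):
        self.images, self.boxes = images, boxes

def pin_batch(batch):
    # teach the loader how to pin a custom batch type
    batch.images = batch.images.pin_memory()
    batch.boxes = batch.boxes.pin_memory()
    return batch

loader = DataLoader(dataset, collate_fn=my_collate,
                    pin_memory=True, pin_fn=pin_batch)
```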
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14171

Differential Revision: D13166669

Pulled By: soumith

fbshipit-source-id: ca965f9841d4a259b3ca4413c8bd0d8743d433ab
2018-11-23 08:12:43 -08:00
c36156eded Option to preserve bitwise accuracy of gradient checkpointed vs non-checkpointed dropout (#14253)
Summary:
This issue was noticed, and a fix proposed, by raulpuric.

Checkpointing is implemented by rerunning a forward-pass segment for each checkpointed segment during backward.  This can result in the RNG state advancing more than it would without checkpointing, which can cause checkpoints that include dropout invocations to lose end-to-end bitwise accuracy as compared to non-checkpointed passes.

The present PR contains optional logic to juggle the RNG states such that checkpointed passes containing dropout achieve bitwise accuracy with non-checkpointed equivalents.**  The user requests this behavior by supplying `preserve_rng_state=True` to `torch.utils.checkpoint` or `torch.utils.checkpoint_sequential`.

Currently, `preserve_rng_state=True` may incur a moderate performance hit because restoring MTGP states can be expensive.  However, restoring Philox states is dirt cheap, so syed-ahmed's [RNG refactor](https://github.com/pytorch/pytorch/pull/13070#discussion_r235179882), once merged, will make this option more or less free.

I'm a little wary of the [def checkpoint(function, *args, preserve_rng_state=False):](https://github.com/pytorch/pytorch/pull/14253/files#diff-58da227fc9b1d56752b7dfad90428fe0R75) argument-passing method (specifically, putting a kwarg after a variable argument list).  Python 3 seems happy with it.
Edit:  It appears Python 2.7 is NOT happy with a [kwarg after *args](https://travis-ci.org/pytorch/pytorch/builds/457706518?utm_source=github_status&utm_medium=notification).  `preserve_rng_state` also needs to be communicated in a way that doesn't break any existing usage.  I'm open to suggestions (a global flag perhaps)?

**Batchnorm may still be an issue, but that's a battle for another day.
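
Usage, as a minimal sketch:
```python
import torch
import torch.nn.functional as F
from torch.utils.checkpoint import checkpoint

def block(x):
    return F.dropout(x, p=0.5, training=True)

x = torch.randn(8, 16, requires_grad=True)
# Stash and restore RNG state around the segment so the dropout mask drawn
# during recomputation matches the one from the original forward pass.
y = checkpoint(block, x, preserve_rng_state=True)
y.sum().backward()
```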
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14253

Differential Revision: D13166665

Pulled By: soumith

fbshipit-source-id: 240cddab57ceaccba038b0276151342344eeecd7
2018-11-23 08:09:43 -08:00
1e05f4be73 Updating submodules
Reviewed By: yns88

fbshipit-source-id: e92b0c24a56b588dcf30542692cb4bdc2d474825
2018-11-22 22:04:37 -08:00
d55b25a633 Remove individual "using c10:xxx" statements (#13168)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13168

We now have a "using namespace c10" in the at and caffe2 namespaces, we don't need the individual ones anymore

Reviewed By: ezyang

Differential Revision: D11669870

fbshipit-source-id: fc2bb1008e533906914188da4b6eb30e7db6acc1
2018-11-22 11:57:10 -08:00
f79fb58744 Make sure we bind input/output of Onnxifi op positionally (#14214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14214

This is to pick up the residual task of T36325466 to make sure that input/output binding of c2 Onnxifi op is positional.

Reviewed By: dzhulgakov

Differential Revision: D13134470

fbshipit-source-id: d1b916dade65c79133b86507cd54ea5166fa6810
2018-11-22 00:31:01 -08:00
7fc34a4122 Convert gumbel_softmax, lp pooling weak functions and modules (#14232)
Summary:
1. Support `Optional[BroadcastingList1[int]]`-style type annotations to accept an int or a list[int]
2. Convert gumbel_softmax, lp pooling weak functions and modules
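
For example, a sketch of calling the converted weak-script function from TorchScript:
```python
import torch
import torch.nn.functional as F

@torch.jit.script
def sample(logits):
    # gumbel_softmax is now a weak-script function usable from TorchScript
    return F.gumbel_softmax(logits, tau=1.0, hard=True)

out = sample(torch.randn(4, 10))
```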
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14232

Differential Revision: D13164506

Pulled By: wanchaol

fbshipit-source-id: 6c2a2b9a0613bfe907dbb5934122656ce2b05700
2018-11-21 23:44:24 -08:00
08b77d3844 Use ADL to find toString (#14021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14021

I'm planning to move at::Scalar to c10, and there's an at::toString(Scalar) defined.
Unfortunately, we call it by spelling out at::toString() instead of relying on ADL.
This diff changes that to prepare for the actual move.

Reviewed By: ezyang

Differential Revision: D13015239

fbshipit-source-id: f2a09f43a96bc5ef20ec2c4c88f7790fd5a04870
2018-11-21 23:08:52 -08:00
0e93a03a3a Fix include paths for intrusive_ptr (#13692)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13692

This now lives in c10/util, not ATen/core anymore.

Reviewed By: ezyang

Differential Revision: D12937091

fbshipit-source-id: ea2d420a15e7941a38d0b4c75e20ca18437c73f8
2018-11-21 23:08:50 -08:00
4160c13cd2 Move intrusive_ptr to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13691

Reviewed By: ezyang

Differential Revision: D12937090

fbshipit-source-id: fe9d21d5f7ea4e78e7e38ac60db13814a9971ed9
2018-11-21 23:08:49 -08:00
e91c8e2f2d ignore generated caffe2 docs and virtualenvs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14309

Reviewed By: soumith

Differential Revision: D13166626

Pulled By: JoelMarcey

fbshipit-source-id: 4f11228d8b5da85cec222bf11282722a7319581b
2018-11-21 22:30:34 -08:00
3918e226fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: 20976d595e68a08d746d8806fd0205d810656366
2018-11-21 22:02:07 -08:00
fb8c3d62fe removing quantization utility functions moved to fbgemm (#14301)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14301

This diff removes quantization utility functions copied to fbgemm

Reviewed By: Maratyszcza

Differential Revision: D13159299

fbshipit-source-id: a7f3cd2af0aa241a8578d532a70a157da70d9289
2018-11-21 21:38:23 -08:00
8c4910b095 Cuda version comparison with CUDA_VERSION_STRING (#14302)
Summary:
CUDA headers include the CUDA version in major.minor form, but when we do find_package(CUDA), the CUDA_VERSION variable includes the patch number as well, which fails the following condition.

`
if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**For example:**
I have CUDA 10.0 installed. My nvcc output looks like this:
`Cuda compilation tools, release 10.0, V10.0.130`

If I compile my application with Caffe2, it gives me the following error:

```
CMake Error at /usr/share/cmake/Caffe2/public/cuda.cmake:59 (message):
  FindCUDA says CUDA version is (usually determined by nvcc), but the CUDA
  headers say the version is 10.0.  This often occurs when you set both
  CUDA_HOME and CUDA_NVCC_EXECUTABLE to non-standard locations, without also
  setting PATH to point to the correct nvcc.  Perhaps, try re-running this
  command again with PATH=/usr/local/cuda/bin:$PATH.  See above log messages
  for more diagnostics, and see
  https://github.com/pytorch/pytorch/issues/8092 for more details.
```

**In this case, it failed because:**
cuda_version_from_header = 10.0
CUDA_VERSION = 10.0.130 (came from nvcc)

`if(NOT ${cuda_version_from_header} STREQUAL ${CUDA_VERSION})
`

**Fix:**
We should compare the header version against the **major.minor** format, which is given by CUDA_VERSION_STRING.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14302

Differential Revision: D13166485

Pulled By: soumith

fbshipit-source-id: 1b74e756a76c4cc5aa09978f5850f763ed5469b6
2018-11-21 21:02:28 -08:00
992e2750fd Updating submodules
Reviewed By: yns88

fbshipit-source-id: ee60b4dddf688608ef80043b1dc336d120a045d0
2018-11-21 21:02:26 -08:00
341b48529e Updating submodules
Reviewed By: yns88

fbshipit-source-id: 366c29d09bec53459e2a4890c7fe8d10f45ff5c3
2018-11-21 20:31:53 -08:00
b26f82b0ec Robust NCCL barrier improvement to cover all devices combinations (#14271)
Summary:
This covers the edge case where we run the same NCCL process group with multiple GPU combinations, rather than only the most recent GPU combination. We always keep track of what GPUs have been used previously in the NCCL process group, and barrier() itself will synchronize on each GPU's NCCL stream.

Test covered as well. Tested on 8-GPU machine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14271

Differential Revision: D13164993

Pulled By: teng-li

fbshipit-source-id: 81e04352740ea50b5e943369e74cfcba40bb61c1
2018-11-21 18:23:55 -08:00
b149456645 alias analysis (#14018)
Summary:
First draft of an alias analysis pass. It's a big PR unfortunately; a rough table of contents/suggested order of review:
1. `AliasAnalysis` pass, which traverses the graph and builds an `AliasDb`. The basic strategy is to assign alias information to every value of mutable type (list/tuple/tensor), and use the alias annotations of each node's schema to assign alias info to the outputs based on the alias info of the inputs. Nodes that aren't explicitly schematized have hand-written analysis rules.

2. Integration of aliasing information into `moveBefore/AfterTopologicallyValid()`. Basically, we pass in an alias DB when we ask for moveBefore/After. Similar to how we can boil down dependency analysis to "what nodes use this node", we can boil down mutability analysis to "what nodes write to an alias set input/output'd by this node".

3. Integration of alias analysis to optimization passes that need it. Right now, it is `GraphFuser`, `CreateAutodiffSubgraphs`, constant prop, and CSE. Not sure if any others need it.

- Testing; still figuring out the best way to do this.
- Eventually we want to integrate the alias db into the graph, but we shouldn't do that until we can guarantee that the information can stay up to date with mutations.
- Do the same thing `python_printer` did for operators and force people to register alias analyzers if they can't schematize their op.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14018

Differential Revision: D13144906

Pulled By: suo

fbshipit-source-id: 1bc964f9121a504c237cef6dfeea6b233694de6a
2018-11-21 17:48:46 -08:00
d55ba77a5d Remove extra include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14206

Reviewed By: dzhulgakov

Differential Revision: D13131318

fbshipit-source-id: 559b55b8d98cdf6b7d1d3e31237c5473edc5e462
2018-11-21 17:21:44 -08:00
85d3fccee7 Removed redundant allreduce options in DDP (#14208)
Summary:
This somehow was not cleaned up after the C++ migration. It is unused and can be removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14208

Differential Revision: D13132492

Pulled By: teng-li

fbshipit-source-id: 0f05b6368174664ebb2560c037347c8eb45f7c38
2018-11-21 16:56:46 -08:00
d9cdcc9a3b Add list inequality operator (#14129)
Summary:
This PR adds `aten::neq` for list inequality comparisons and converts
`nll_loss` to weak script
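
A minimal sketch of the new operator in use:
```python
import torch

@torch.jit.script
def shapes_differ(x, y):
    # List[int] inequality now lowers to the new aten::neq
    return x.size() != y.size()

assert shapes_differ(torch.zeros(2, 3), torch.zeros(4))
```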
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14129

Differential Revision: D13123894

Pulled By: driazati

fbshipit-source-id: 8c1edf7c163217ec00eb653f95d196db3998613f
2018-11-21 16:32:58 -08:00
34db39d87a Add onnxifi support to SparseLengthsWeightedSum (#14210)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14210

We had left out `SparseLengthsWeightedSum`, as the benchmark was not testing it due to an fp16 filler issue; it was flushed out by unit tests. Hence we add the support here.

Reviewed By: bddppq

Differential Revision: D13132320

fbshipit-source-id: b21c30c185c9e1fbf3980641bc3cdc39e85af2e1
2018-11-21 15:47:24 -08:00
60963c2ecb Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim. (#12971)
Summary:
Add "axis" and "axis_w" arguments in FC to support customized axix to reduce dim.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12971

Reviewed By: bddppq

Differential Revision: D12850675

Pulled By: yinghai

fbshipit-source-id: f1cde163201bd7add53b8475329db1f038a73019
2018-11-21 15:44:50 -08:00
accbcca338 IDEEP fallback for ResizeNearest op (#14212)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14212

TSIA

Reviewed By: yinghai

Differential Revision: D13134134

fbshipit-source-id: e3c5c9c8756d6e25b213f8dde9d809a44373d7a3
2018-11-21 13:44:07 -08:00
2cacb39a21 Fix ONNX_ATEN mode (#14239)
Summary:
Fix ONNX_ATEN mode by adding it to the validateBlock method.
Before this PR, validateBlock would throw an exception when using this mode.

I will add related test cases for ONNX_ATEN mode in a different PR once this is merged, since we don't have any currently.
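
Usage, as a sketch (the enum spelling follows the current Python binding and may differ for this era):
```python
import torch

model = torch.nn.Linear(4, 2)
torch.onnx.export(
    model, torch.randn(1, 4), 'model.onnx',
    operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN,
)
```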
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14239

Differential Revision: D13145443

Pulled By: zrphercule

fbshipit-source-id: 60e7942aa126acfe67bdb428ef231ac3066234b1
2018-11-21 13:15:23 -08:00
fe068d9032 Bump gloo (#14281)
Summary:
Includes more robust error handling and timeout support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14281

Differential Revision: D13158232

Pulled By: pietern

fbshipit-source-id: e80432799a020576d5abdcd9a21d66b629479caf
2018-11-21 11:27:42 -08:00
31ba34b73c fix comment on dnnlowp op arguments (#14265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14265

Fix comment

Reviewed By: hx89

Differential Revision: D13152106

fbshipit-source-id: fbe98906963cbd5cb20a583a737a792fbc38292e
2018-11-21 09:39:57 -08:00
6ce9907d51 native NN wrappers, including with buffers.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14256

Differential Revision: D13148783

Pulled By: gchanan

fbshipit-source-id: 4b6179033cf1df26061b6731eaaa4e008692e592
2018-11-21 09:08:00 -08:00
91c0b7159a Remove header generated at configuration time (#14244)
Summary:
The build was picking up the empty stub header instead of the generated
one. Because of the large number of include paths we end up passing to
the compiler it is brittle to have both an empty stub file and a
generated file and expect the compiler to pick up the right one.

With the recent change to compile everything from a single CMake run we
can now use native CMake facilities to propagate macros that indicate
backend support. The stanzas target_compile_definitions with the
INTERFACE flag ensure that these macros are set only for downstream
consumers of the c10d target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14244

Reviewed By: teng-li

Differential Revision: D13144293

Pulled By: pietern

fbshipit-source-id: f49324220db689c68c126b159f4f00a8b9bc1252
2018-11-21 08:45:08 -08:00
788d2e87bd Address jittering issues in python_print (#14064)
Summary:
export - print a method with python_print
import - import a method with import_method

We want to ensure:

    export(g) == export(import(export(g)))

That is, after exporting/importing once, the graph will stay exactly
the same. This is less strict than g == import(export(g)), which would
require us to maintain a lot more information about the structure of the
IR and about the names of debug symbols.

This PR addresses this with the following fixes:
* print out double-precision numbers with high enough precision such
  that they always parse in the same way
* when creating loop-carried dependencies, sort them
  by variable name, ensuring a consistent order
* parse nan correctly
* DCE: remove unused outputs of if statements, and loop-carried dependencies
  in loops that are dead both after the loop and inside the body of the
  loop.
* Do not set uniqueName for variables whose names are _[0-9]+; these
  are probably rare in user code, and we need a way to communicate
  that we do not care about a variable name when re-parsing the graph.
  Otherwise temporary variable names will jitter around.
* Expand the definition of a constant in printing code to None,
  and family.
* Allow re-treeing to work as long as the only thing in its way is a
  constant node. These do not have side effects but are sometimes
  inserted in a different order when tracing compared to how we print them.
* Print all constant nodes out first in the order in which they are used
 (or, if they are inlined, ensure they get assigned a CONSTANT.cX number
  in a consistent order). Clean up tuples (this is done in the compiler,
  but not in the tracer, leading to some tuple indexing jitter if not
  done).
* use strtod_l, not std::stod which can throw exceptions

Other:
* Add REL_WITH_DEB_INFO to setup.py. It already existed for the
  cmake files. Threading it into setup.py allows us to turn on
  debug symbols with optimization everywhere.
* enable round trip testing for all generated graphs. This only adds
  ~6 seconds to total build time but tests printing for every graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14064

Differential Revision: D13094637

Pulled By: zdevito

fbshipit-source-id: 0a1c6912194d965f15d6b0c6cf838ccc551f161d
2018-11-21 06:38:29 -08:00
af82396f7f Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 27838fb2dad82c78906faf3cc2d124557c30e88f
2018-11-21 06:38:28 -08:00
166ee86b46 Updating submodules
Reviewed By: cdelahousse

fbshipit-source-id: 3c17e12a579245a84e9a56b1d8a1641232150675
2018-11-21 00:27:50 -08:00
7a654617eb Add tensor table in ModelDef and use it for jit script serialization and deserialization (#13861)
Summary:
As we discussed, the tensors in the torch script will be associated with the tensor data in the serialized file. So let's add a table of tensors (actually a repeated TensorProto field) in the ModelDef. TensorProto.name will be the id.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13861

Reviewed By: dzhulgakov

Differential Revision: D13036940

Pulled By: zrphercule

fbshipit-source-id: ecb91b062ac4bc26af2a8d6d12c91d5614efd559
2018-11-20 23:37:50 -08:00
17432a1051 c10d Automatically retry on EINTR (#14180)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/14170

Actually, I probably shouldn't retry all `SYSCHECK` calls. I'll leave it to the reviewers to decide.
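
The retry pattern, sketched in Python rather than in the C++ SYSCHECK macro:
```python
import errno

def retry_on_eintr(syscall, *args):
    # Re-issue a system call that was interrupted by a signal (EINTR)
    # instead of surfacing the interruption as a hard failure.
    while True:
        try:
            return syscall(*args)
        except OSError as exc:
            if exc.errno != errno.EINTR:
                raise
```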
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14180

Reviewed By: pietern

Differential Revision: D13144741

Pulled By: SsnL

fbshipit-source-id: d73288f76b18cae14b1b43dad4e5e8d010a96d95
2018-11-20 23:31:26 -08:00
bb301a431d Make NCCL backend support barrier op (#14142)
Summary:
This is a feature request from: https://github.com/pytorch/pytorch/issues/13573

As the title says, this PR makes NCCL backend support barrier op.

There are a couple scenarios that need to be addressed:
(1) When an NCCL op has already happened, we need to record what GPU device(s) the previous op ran on, and queue the allreduce barrier op on the same GPU device(s).
(2) When there is no NCCL op yet, we will try to use a single GPU, assigning each process its own GPU as a best effort.

As for the async work, during wait, we would like not just to wait for the NCCL kernel to complete, but also to block the thread until both the current stream and the NCCL stream return.

`test_distributed` should cover the test. I also manually tested both scenarios.
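
Usage from Python, as a minimal sketch (assumes an initialized NCCL process group and CUDA tensors):
```python
import torch
import torch.distributed as dist

t = torch.ones(1).cuda()
dist.all_reduce(t)   # scenario (1): the group records which device ran this op
dist.barrier()       # queues the barrier allreduce on the same device(s) and
                     # blocks until both the current and the NCCL stream finish
```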
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14142

Differential Revision: D13113391

Pulled By: teng-li

fbshipit-source-id: 96c33d4d129e2977e6892d85d0fc449424c35499
2018-11-20 21:12:22 -08:00
1acaafbe70 Fix memory leakage in onnxifi transformer (#14245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14245

TSIA

Reviewed By: bddppq, rdzhabarov

Differential Revision: D13144783

fbshipit-source-id: 5e07bb7ab883ba1af68547a26272cd320967b9e3
2018-11-20 18:03:05 -08:00
8f20d40bb7 Allow undefined tensors as constants (#14120)
Summary:
This PR inserts `prim::None` constants for undefined tensors. This comes up in the standard library when an `Optional[Tensor]` is statically determined to be `None`:

```python
torch.jit.script
def fn(x=None):
    # type: (Optional[Tensor]) -> Tensor
    return torch.jit._unwrap_optional(x)

torch.jit.script
def fn2():
    # type: () -> Tensor
    return fn()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14120

Differential Revision: D13124625

Pulled By: driazati

fbshipit-source-id: 9eaa82e478c49c503f68ed89d8c770e8273ea569
2018-11-20 16:54:27 -08:00
d6bfc53b9e Export BatchNorm functional and module, add necessary JIT support (#14016)
Summary:
This PR does three things:

1. It exports the BatchNorm functional and module, rewriting some of the components to stay aligned with the currently supported JIT features.
2. In the process of exporting, it adds the necessary compiler support for in-place augmented-assignment ops.
3. It changes the test_jit behavior in add_module_test to use a single RNG state during module initialization.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14016

Differential Revision: D13112064

Pulled By: wanchaol

fbshipit-source-id: 31e3aee5fbb509673c781e7dbb6d8884cfa55d91
2018-11-20 14:15:06 -08:00
1f871f126f Have PYTORCH_FUSION_DEBUG print C kernel source (#14213)
Summary:
- Move handling of the environment variable up from CPU-only to all backends
- Introduce two levels to be enabled with PYTORCH_FUSION_DEBUG=n:
  1: print C source
  2: print CPU assembly, too (previous effect of PYTORCH_FUSION_DEBUG)

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14213

Differential Revision: D13135393

Pulled By: soumith

fbshipit-source-id: befa4ebea3b3c97e471393a9f6402b93a6b24031
2018-11-20 12:45:07 -08:00
1224ef9ea1 Delete backwards compatibility StorageImpl.h and TensorImpl.h (#14230)
Summary:
Since they directly include the real ones in core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14230

Differential Revision: D13140323

Pulled By: tugrulates

fbshipit-source-id: d7e3b94e891b2d7fa273d01c0b7edfebdbd7e368
2018-11-20 12:29:24 -08:00
9a281451ed remove unused parameters from caffe2_dnnlowp_utils.cc (#14164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14164

See title

Reviewed By: csummersea

Differential Revision: D13115470

fbshipit-source-id: d754f558cd06e5f4c1cd00315e912cdb7b50731a
2018-11-20 00:56:06 -08:00
3c2462cf24 use pragma once (#14163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14163

Some of the names we were using to guard the header files were too short (e.g. DYNAMIC_HISTOGRAM_H).

Reviewed By: csummersea

Differential Revision: D13115451

fbshipit-source-id: cef8c84c62922616ceea17effff7bdf8d67302a2
2018-11-20 00:56:04 -08:00
4224ce10a8 format python files (#14161)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14161

Formatting using Nuclide

Reviewed By: hx89

Differential Revision: D13115348

fbshipit-source-id: 7432ce6072a1822d7287b4ebcfcb6309282e15ac
2018-11-20 00:56:02 -08:00
3c0ce51484 clang-format (#14160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14160

clang-format of C++ files

Reviewed By: hx89

Differential Revision: D13115201

fbshipit-source-id: d2ad65f66209e00578ef90f87f41272de2d24aa9
2018-11-20 00:56:00 -08:00
acd7811e33 Add sigmoid op based on MKL-DNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13097

Differential Revision: D13105366

Pulled By: yinghai

fbshipit-source-id: d156e8fd519baeecf61c25dcd8fa2c2fa7351ef4
2018-11-19 22:56:35 -08:00
c96b72d61f OSS build fix (#14192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14192

We can only use C10_* macros in OSS. The build is only broken when built with USE_FBGEMM=ON.

Reviewed By: jianyuh

Differential Revision: D13121781

fbshipit-source-id: f0ee9a75997766e63e1da8a53de7ddb98296a171
2018-11-19 22:47:17 -08:00
6dacc20073 Make EncodeMethod in jit script serialization return a string (#14167)
Summary:
Nit

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14167

Reviewed By: ezyang

Differential Revision: D13116584

Pulled By: dzhulgakov

fbshipit-source-id: c0e7e71a81004031564bd2fc59f393041e1283d5
2018-11-19 22:15:19 -08:00
a036f9a65f Create README.md of caffe2/quantization/server
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14217

Reviewed By: csummersea

Differential Revision: D13135086

Pulled By: jspark1105

fbshipit-source-id: bddf4f1c2dc5ec8ea6ebe9e265956f367e082d52
2018-11-19 21:59:34 -08:00
6dc28e666c CircleCI: fix NCCL install (#14172)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not in `build.sh`; this PR fixes the issue.

This replaces https://github.com/pytorch/pytorch/pull/14124.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14172

Differential Revision: D13135087

Pulled By: yf225

fbshipit-source-id: 42fff3926734778713d483d74ba0a89e5502dd9e
2018-11-19 21:30:32 -08:00
03a02b6fd5 Fix a bug in test case of onnx::If
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14209

Differential Revision: D13132607

Pulled By: zrphercule

fbshipit-source-id: b7f7ccc6a6cbdeb57a7f88a1971d15dd81e6fc81
2018-11-19 18:46:21 -08:00
b807970aea Tensor type checking and informative error messages for torch.distributed (#14204)
Summary:
This will address https://github.com/pytorch/pytorch/issues/13574

This error message should be more informative to the user for all the non-multi-GPU ops, since the Python bindings always go through the multi-GPU ops.

test_distributed should cover everything. Also tested both runtime errors:

```
>>> a = torch.ByteTensor([])
>>> b = [a, a]
>>> dist.all_reduce(b)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 809, in all_reduce
    _check_single_tensor(tensor, "tensor")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 207, in _check_single_tensor
    "to be a torch.Tensor type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor to be a torch.Tensor type

>>> b = ["b"]
>>> dist.all_gather(b, a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 1006, in all_gather
    _check_tensor_list(tensor_list, "tensor_list")
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 225, in _check_tensor_list
    "to be a List[torch.Tensor] type".format(param_name))
RuntimeError: Invalid function argument. Expecting parameter: tensor_list to be a List[torch.Tensor] type
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14204

Differential Revision: D13131526

Pulled By: teng-li

fbshipit-source-id: bca3d881e41044a013a6b90fa187e722b9dd45f2
2018-11-19 18:30:54 -08:00
7d1db89ef9 Move stream functions from CUDAContext to CUDAStream (#14110)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14110

I'm planning to move CUDAStream to c10/cuda, without also moving
CUDAContext, and so it's most convenient if these definitions
are in the actual header file in question.

Reviewed By: smessmer

Differential Revision: D13104693

fbshipit-source-id: 23ce492003091adadaa5ca6a17124213005046c2
2018-11-19 17:05:48 -08:00
50b914aeeb Move CUDAStreamInternals inside detail namespace. (#14109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14109

Previously it was at the top level, because the author was under
the impression that you could only refer to top-level C++ names
from C, but this is not true; you just need to make a stub struct
conditioned on __cplusplus.

Reviewed By: smessmer

Differential Revision: D13104694

fbshipit-source-id: ecb7ae6dcfa4ab4e062aad7a886937dca15fd1b2
2018-11-19 17:05:46 -08:00
e58bbbac18 Delete dependencies from CUDAStream; remove synchronize_with (#13920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13920

I want to move CUDAStream and CUDAGuard to c10_cuda without also
bringing along CUDAContext or CUDAEvent for the ride (at least for
now).  To do this, I need to eliminate those dependencies.

There's a few functions in CUDAContext.h which don't really need
THCState, so they're separated out and put in general
purpose c10/cuda/CUDAFunctions.h

Reviewed By: smessmer

Differential Revision: D13047468

fbshipit-source-id: 7ed9d5e660f95805ab39d7af25892327edae050e
2018-11-19 17:05:41 -08:00
a20c7ce848 Fix race in AtomicFetchAdd. (#13479)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13479

Increases the lock scope to above Output() calls.

These calls potentially allocate the underlying blob/tensor
objects and multiple invocations race each other over the
same output blobs/tensors.

Reviewed By: bwasti

Differential Revision: D12891629

fbshipit-source-id: a6015cfdb08e352521a1f062eb9d94a971cfbdb0
2018-11-19 16:11:58 -08:00
1a29950478 Remove API macros from intrusive_ptr (#14137)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14137

This is a templated header-only class and shouldn't need export/import macros.

Reviewed By: ezyang

Differential Revision: D13111712

fbshipit-source-id: c8c958e75b090d011d25156af22f37f9ca605196
2018-11-19 15:39:20 -08:00
1c2ed4eb23 Tensor construction: combine Resize+mutable_data - 1/4 (#13942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13942

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13054770

fbshipit-source-id: a9e86e5dfcb4f7cebf5243e1d359fad064561bed
2018-11-19 15:33:50 -08:00
8aa5174106 Tensor construction: combine Resize+mutable_data - 3/4 (#13944)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13944

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054836

fbshipit-source-id: 5de07a156687f1ee607d0450410881d9176a87a7
2018-11-19 15:28:13 -08:00
f34c848f52 Store the optimize flag in module (#14166)
Summary:
When saving/loading a script module, we store the optimize flag in the module instead of encoding it in each method.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/14166

Reviewed By: ezyang

Differential Revision: D13117577

Pulled By: dzhulgakov

fbshipit-source-id: dc322948bda0ac5809d8ef9a345497ebb8f33a61
2018-11-19 14:34:05 -08:00
7fd1ea6ab7 Cleanup caffe2 hipify exclude patterns (#14198)
Summary:
depthwise_3x3_conv_op.cu does not exist
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14198

Differential Revision: D13127479

Pulled By: bddppq

fbshipit-source-id: ec6bd434055a49ea405c4b399bde8c074114f955
2018-11-19 14:27:56 -08:00
b6edd7bbb4 Support 'python_module' of 'nn' in native functions. (#14126)
Summary:
Also move mse_loss, binary_cross_entropy, l1_loss to use this functionality.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14126

Reviewed By: ezyang

Differential Revision: D13109975

Pulled By: gchanan

fbshipit-source-id: 0b29dc8cf222d25db14da7532d8dc096a988a0ec
2018-11-19 14:13:25 -08:00
1e73ab25f5 Use onnx proto_utils to support using protobuf-lite
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14150

Differential Revision: D13115586

Pulled By: bddppq

fbshipit-source-id: d6b6935a8deac60f6f58d62a71f6840182a72a51
2018-11-19 13:32:46 -08:00
6b4852213d Use fbgemm revision file added by shipit (#14105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14105

Pull Request resolved: https://github.com/facebook/fbshipit/pull/62

Use the fbgemm revision file created by ShipIt to update the fbgemm revision for pytorch. We no longer have to update the submodule manually.

Reviewed By: yns88

Differential Revision: D13072074

fbshipit-source-id: bef9eabad50f7140179c370a60bd9ca73067b9b5
2018-11-19 12:12:21 -08:00
b6290531aa Setup sccache for PyTorch ROCm CI (#14153)
Summary:
Discovered a huge build-time difference between the caffe2 rocm build and the pytorch rocm build (6 min vs. 30 min); it turns out the sccache setup present in the caffe2 docker images is not in the pytorch build script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14153

Differential Revision: D13115097

Pulled By: bddppq

fbshipit-source-id: 88414f164b980f0e667c8e138479b4a75ab7692e
2018-11-19 11:31:55 -08:00
e387d945c2 allow empty index for scatter_* methods (#14077)
Summary:
Fixes #2027
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14077

Differential Revision: D13095788

Pulled By: ailzhang

fbshipit-source-id: ad2c8bbf83d36e07940782b9206fbdcde8905fd3
2018-11-19 09:50:21 -08:00
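A minimal Python sketch of the behavior the commit above enables (the exact semantics are assumed from the title: an empty index makes `scatter_` a no-op rather than an error):

```python
import torch

dst = torch.zeros(2, 4)
index = torch.empty(0, 4, dtype=torch.long)  # empty along dim 0
src = torch.empty(0, 4)
dst.scatter_(0, index, src)  # previously errored; now leaves dst unchanged
```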
751b5ea941 use at::Device throughout JIT (#14181)
Summary:
zdevito soumith

Sorry about the previous PR, had some git issues. This is the same exact code as the previous PR but updated w.r.t pytorch/master.

fixes #13254
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14181

Differential Revision: D13117688

Pulled By: soumith

fbshipit-source-id: 044840b2c7a0101ef43dd16655fd9a0f9981f53f
2018-11-19 09:21:57 -08:00
fc61f1a1d1 Support named return arguments in native_functions. (#14100)
Summary:
Note there was a hacky way of doing this before by specifying "return:" lists manually; this makes the
return names part of the function declaration itself.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14100

Differential Revision: D13101810

Pulled By: gchanan

fbshipit-source-id: 1c80574cd4e8263764fc65126427b122fe36df35
2018-11-19 08:27:20 -08:00
ce85150cb4 Split out CUDAMultiStreamGuard from CUDAGuard (#13912)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13912

The implementation and API of CUDAMultiStreamGuard is less mature,
and it cannot be implemented generically (yet) in c10_cuda.  This
might be a reasonable thing to do eventually, but not for now.

Reviewed By: smessmer

Differential Revision: D13046500

fbshipit-source-id: 4ea39ca1344f1ad5ae7c82c98617aa348c327848
2018-11-19 08:20:11 -08:00
48099c23b4 Move AT_CUDA_CHECK to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13910

Reviewed By: smessmer

Differential Revision: D13046201

fbshipit-source-id: 8d360a0e4d6c2edf070d130e600c6b04f0ee0058
2018-11-19 08:20:10 -08:00
928687bb24 Add c10 cuda library. (#13900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13900

Add c10 cuda library.

Right now, this is not used by anything, and only tests if the CUDA
headers are available (and not, e.g., that linking works.)

Extra changes:
- cmake/public/cuda.cmake now is correctly include guarded, so you
  can include it multiple times without trouble.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: smessmer

Differential Revision: D13025313

fbshipit-source-id: fda85b4c35783ffb48ddd6bbb98dbd9154119d86
2018-11-19 08:20:07 -08:00
2681852438 Switch Int8Add operator to QNNPACK (#14089)
Summary:
- Improved single-threaded performance due to optimized low-level micro-kernels
- Improved parallelization (previously was parallelized across images in a batch and pixels only, now within channels as well)
- Slightly different result due to a different implementation of fixed-point arithmetic (no accuracy loss expected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14089

Differential Revision: D13110135

Pulled By: Maratyszcza

fbshipit-source-id: 1f149394af5c16940f79a3fd36e183bba1be2497
2018-11-18 23:57:57 -08:00
92dbd0219f No more -werror for c10d (#14155)
Summary:
As the title says
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14155

Differential Revision: D13115769

Pulled By: teng-li

fbshipit-source-id: 278deba090364544d92fa603621604ce37fa974e
2018-11-18 13:53:41 -08:00
55b25365e9 Add ultra low precision options (#14133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14133

Experiment with ultra low precisions on the Resnext-101 URU trunk model

Reviewed By: jspark1105

Differential Revision: D10108518

fbshipit-source-id: f04d74fbe1c9e75efafcd9845719bdb2efbbfe9c
2018-11-18 12:51:34 -08:00
ef3d7963d8 Adds symbolic diff for THNN Conv2d and aten native BatchNorm (#13888)
Summary:
Adds symbolic diff and tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13888

Differential Revision: D13115548

Pulled By: soumith

fbshipit-source-id: ba75b01a95a5715a7761724dda018168b6188917
2018-11-18 09:22:31 -08:00
07a8a730af Print warning when ROCm memory leaking is detected in pytorch tests (#14151)
Summary:
We keep seeing random failures in CI because of ROCm memory leaking, e.g:

https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3102//console
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/py2-clang7-rocmdeb-ubuntu16.04-test/3080//console

To make the CI more stable, turn it into a warning instead of a failure.

iotamudelta please help investigate the memory leak
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14151

Differential Revision: D13115096

Pulled By: bddppq

fbshipit-source-id: a13b68274ecba363d9d8436aa6a62ac40a77d78c
2018-11-18 00:11:44 -08:00
a5891e6124 Remove debugging code in test_cholesky_batched (#14156)
Summary:
They didn't turn up in my tests because I use pytest, which doesn't
print debug output when the tests pass.

Differential Revision: D13115227

Pulled By: soumith

fbshipit-source-id: 46a7d47da7412d6b071158a23ab21e7fb0c6e11b
2018-11-17 22:28:21 -08:00
1bafa6236f Back out "[reland][codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4" (#14154)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14154

Original commit changeset: e89c2e692178

Reviewed By: amateurcoffee

Differential Revision: D13115023

fbshipit-source-id: 8f9fb55842ae6c8139d5cd88ec6d0abb0c5cc5e7
2018-11-17 19:51:03 -08:00
12bb4742ad CostInference for 1D conv (#14009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14009

As title

Reviewed By: yinghai

Differential Revision: D13078718

fbshipit-source-id: 081e7b13ad6741c635ef413915b555f10f93bd33
2018-11-17 17:28:52 -08:00
a30ade1139 Batched cholesky decomposition (#14017)
Summary:
Implements batching for the Cholesky decomposition.

Performance could be improved with dedicated batched `tril` and `triu` ops, whose absence also impedes autograd operations.

Changes made:
- batching code
- tests in `test_torch.py`, `test_cuda.py` and `test_autograd.py`.
- doc string modification
- autograd modification
- removal of `_batch_potrf` in `MultivariateNormal`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14017

Differential Revision: D13087945

Pulled By: ezyang

fbshipit-source-id: 2386db887140295475ffc247742d5e9562a42f6e
2018-11-17 10:49:15 -08:00
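A hedged usage sketch of the batched factorization described above, assuming it is exposed as `torch.cholesky` (the post-1.0 name; older releases spelled it `potrf`):

```python
import torch

a = torch.randn(4, 3, 3)
spd = a @ a.transpose(-1, -2) + 1e-3 * torch.eye(3)  # batch of SPD matrices
L = torch.cholesky(spd)                              # (4, 3, 3) lower factors
assert torch.allclose(L @ L.transpose(-1, -2), spd, atol=1e-5)
```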
390bf1e779 remove unnecessary file from avx2 list (#14012)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14012

conv_dnnlowp_op.cc doesn't need avx2 anymore.

Reviewed By: dskhudia

Differential Revision: D13079665

fbshipit-source-id: dbfe8d2213de4969b6334d54de81d51149268cbd
2018-11-17 10:29:25 -08:00
505dedf6ad Change from using enum to int to store data_type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14140

Differential Revision: D13112937

Pulled By: bddppq

fbshipit-source-id: 124d9546bfbd1f9c207a21e40eb3646f7739bd58
2018-11-17 09:24:03 -08:00
4f0434d5ab Revert "CircleCI: fix NCCL install (#14124)" (#14146)
Summary:
This reverts commit a1fa9d8cf9b2b0e7373ec420c2487d4dfd0e587c.

[pytorch_linux_trusty_py2_7_9_build](https://circleci.com/gh/pytorch/pytorch/270206?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link/console):
```
Nov 17 07:37:27 + sudo apt-get -qq update
Nov 17 07:37:30 W: Ignoring Provides line with DepCompareOp for package gdb-minimal
Nov 17 07:37:30 W: You may want to run apt-get update to correct these problems
Nov 17 07:37:30 + sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
Nov 17 07:37:30 E: Command line option --allow-downgrades is not understood
Nov 17 07:37:30 + cleanup
Nov 17 07:37:30 + retcode=100
Nov 17 07:37:30 + set +x
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14146

Differential Revision: D13113912

Pulled By: bddppq

fbshipit-source-id: cd9d371cf72159f03d12a8b56ed5bd2060ebbe59
2018-11-17 00:35:31 -08:00
fade36668a Revert D10428917: [Caffe2] Add cost into profile observer
Differential Revision:
D10428917

Original commit changeset: 7c100e551bdd

fbshipit-source-id: 5164d9ba61cc103eccfdeb91a5cc140cea31a819
2018-11-16 23:30:07 -08:00
a43037fa11 Revert D10439558: Add cost for non-linear ops
Differential Revision:
D10439558

Original commit changeset: 9aeb05bac8b5

fbshipit-source-id: f00977b4f95bdd500d254eb44fb5b0c816506ee4
2018-11-16 23:30:05 -08:00
afc91e4900 Update FXdiv submodule (#14128)
Summary:
Use the most recent version that disables inline assembly.
I suspect inline assembly causes miscompilation on some versions of gcc7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14128

Reviewed By: bddppq

Differential Revision: D13112370

Pulled By: Maratyszcza

fbshipit-source-id: 36cc95dc51390a293b72c18ae982c3a515a11981
2018-11-16 22:45:26 -08:00
6d9a7d0e60 Rename neon2sse.h to NEON_2_SSE.h to match upstream repo
Summary:
- NEON2SSE is a header that implements NEON intrinsics on top of SSE intrinsics
- Upstream repo provides NEON_2_SSE.h header, but internally it was imported as neon2sse.h
- This patch fix incompatibilities between internal and upstream versions

Reviewed By: hlu1

Differential Revision: D13096755

fbshipit-source-id: 65e1df9a2a5e74bd52c9aee9be27469ba938cd8c
2018-11-16 21:41:53 -08:00
351478439f Disable QNNPACK for multi-architecture iOS builds (#14125)
Summary:
QNNPACK contains assembly files, and CMake tries to build them for the wrong architectures in multi-arch builds. This patch has two effects:
- Disables QNNPACK in multi-arch iOS builds
- Specifies a single `IOS_ARCH=arm64` by default (covers most iPhones/iPads on the market)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14125

Differential Revision: D13112366

Pulled By: Maratyszcza

fbshipit-source-id: b369083045b440e41d506667a92e41139c11a971
2018-11-16 21:18:01 -08:00
d56b2258f4 Register caffe2 layer norm with c10 dispatcher (#13693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13693

We can't directly call the caffe2::Operator class from c10 yet because that class isn't deprotobuffed yet.
Instead, we factor out the kernel into a reusable static method and call it from the caffe2::Operator and
also register it with c10.

Reviewed By: ezyang

Differential Revision: D12912242

fbshipit-source-id: c57502f14cea7a8be281f9787b175bb6e402d00c
2018-11-16 20:17:47 -08:00
c905a81c92 Add c10/core/ to cmake build (#14111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14111

It was already in TARGETs, but we forgot it in cmake.

Reviewed By: ezyang

Differential Revision: D13105166

fbshipit-source-id: f09549e98ebca751339b5ada1150e00cc4cd9540
2018-11-16 20:17:45 -08:00
bb404e7a32 Update atol scale in dnnlowp test (#14135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14135

Update the atol scale of the dnnlowp test. I can't reproduce the flaky test error locally even after setting the same seed value, but according to the comments in check_quantized_results_close(), atol_scale should be 1/1.9 = 0.526315789473684, which is larger than the current value of 0.51. So increase atol_scale to 0.53.

Reviewed By: jspark1105

Differential Revision: D13108415

fbshipit-source-id: 1e8840659fdf0092f51b439cf499858795f9706a
2018-11-16 19:18:55 -08:00
c784f847de fix sparse_adagrad param_size overflow error (#14049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14049

param_size should be passed as int64_t

Reviewed By: hyuen

Differential Revision: D13090511

fbshipit-source-id: 7892d315d7c82c7d7ca103fb36d30cdf1fe24785
2018-11-16 18:53:32 -08:00
cbc94894fb Add cost for non-linear ops (#13327)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13327

Add a cost inference function to non-linear ops. Since the actual flops of a non-linear operator depend on the implementation, we use the number of non-linear operations as a proxy for the analytical flops.

Reviewed By: jspark1105

Differential Revision: D10439558

fbshipit-source-id: 9aeb05bac8b5c7ae5d351ebf365e0a81cf4fc227
2018-11-16 18:53:30 -08:00
86dc3ab252 Add cost into profile observer (#12793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12793

Add analytical cost into profile observer. It includes the op level cost information for each op run and net level aggregated cost information for each op type.

It outputs the following information:
1. analytical flops
2. analytical bytes_read
3. analytical bytes_written

Example output at op level:
```I1017 14:58:14.245978 3686541 profile_observer_gpu.cc:26] --------- Starting operator FC op#24 ---------
I1017 14:58:14.246049 3686541 profile_observer_gpu.cc:33] Input 0: Tensor model1/embedded_encoder_inputs of type float. Dims: (17,1,256,):
I1017 14:58:14.246109 3686541 profile_observer_gpu.cc:33] Input 1: Tensor model1/encoder/layer0/fw/milstm/i2h_w of type float. Dims: (2048,256,):
I1017 14:58:14.246176 3686541 profile_observer_gpu.cc:33] Input 2: Tensor model1/encoder/layer0/fw/milstm/i2h_b of type float. Dims: (2048,):
I1017 14:58:14.246217 3686541 profile_observer_gpu.cc:44] Argument 0: name: "use_cudnn" i: 1
I1017 14:58:14.246271 3686541 profile_observer_gpu.cc:44] Argument 1: name: "cudnn_exhaustive_search" i: 0
I1017 14:58:14.246338 3686541 profile_observer_gpu.cc:44] Argument 2: name: "order" s: "NHWC"
I1017 14:58:14.246372 3686541 profile_observer_gpu.cc:44] Argument 3: name: "axis" i: 2
I1017 14:58:14.246418 3686541 profile_observer_gpu.cc:44] Argument 4: name: "quantization_scheme" i: 1
I1017 14:58:14.246470 3686541 profile_observer_gpu.cc:53] Output 0: Tensor model1/encoder/layer0/fw/milstm/i2h of type float. Dims: (17,1,2048,):
I1017 14:58:14.246596 3686541 profile_observer_gpu.cc:61] Cost (flops, bytes_read, bytes_written):
I1017 14:58:14.246649 3686541 profile_observer_gpu.cc:62]        17860608 2122752 139264
I1017 14:58:14.246677 3686541 profile_observer_gpu.cc:64] --------- Finished operator FC in 0.764221 ms ---------
```
Example output at net level:
```
I1017 11:13:44.675585 3146691 profile_observer_gpu.cc:165] ================ Detailed stats for net model0/encoder/layer0/bw/milstm ================
I1017 11:13:44.675662 3146691 profile_observer_gpu.cc:167] Cost (flops, bytes_read, bytes_written) per operator type:
I1017 11:13:44.675706 3146691 profile_observer_gpu.cc:169]        20992000 42045440 81920 FC
I1017 11:13:44.675745 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Mul
I1017 11:13:44.675824 3146691 profile_observer_gpu.cc:169]           20480 163840 81920 Sum
I1017 11:13:44.675878 3146691 profile_observer_gpu.cc:169]               0 0 0 ElementwiseLinear
I1017 11:13:44.675909 3146691 profile_observer_gpu.cc:169]               0 0 0 LSTMUnit
I1017 11:13:44.675958 3146691 profile_observer_gpu.cc:169]               0 0 0 rnn_internal_apply_link
```

Reviewed By: mdschatz

Differential Revision: D10428917

fbshipit-source-id: 7c100e551bdd3ac8d7c09be12c72d70a2d67cae1
2018-11-16 18:53:28 -08:00
a1fa9d8cf9 CircleCI: fix NCCL install (#14124)
Summary:
The `$BUILD_ENVIRONMENT` checks work in `test.sh` but not `build.sh`, this PR is trying to figure out why.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14124

Reviewed By: teng-li

Differential Revision: D13112483

Pulled By: yf225

fbshipit-source-id: 5f65997586648805cf52217a261389625b5535e1
2018-11-16 18:53:26 -08:00
eeb3e67eeb Fixed MPI build with higher version of GCC (#14122)
Summary:
This appears as I enabled -Werror in c10d build. Good to catch this and fix it.

Should fix https://github.com/pytorch/pytorch/issues/14078 and https://github.com/pytorch/pytorch/issues/13962
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14122

Differential Revision: D13110678

Pulled By: teng-li

fbshipit-source-id: f4c19e16976d65debbd33ed59e17ddbaa19f765a
2018-11-16 18:53:24 -08:00
778e23606b multiprocessing.spawn python version check (#14039)
Summary:
This will be super helpful to the user
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14039

Differential Revision: D13089200

Pulled By: teng-li

fbshipit-source-id: 29e7507bd8fe5a0c58a85c52f976bfca282b4c1b
2018-11-16 18:53:23 -08:00
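A hedged sketch of the kind of guard the commit above describes (the exact message and call site are assumptions); `spawn` relies on Python 3's multiprocessing start methods:

```python
import sys

def _require_python3():
    # Raise a clear error instead of failing obscurely on Python 2.
    if sys.version_info[0] < 3:
        raise RuntimeError(
            "torch.multiprocessing.spawn is not supported on Python 2")
```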
ce6192a21f Don't python bind _thnn_ functions. (#14101)
Summary:
This is needed for moving nn functions to native functions, but since some functions are already named
this way, I'm going to stop binding pre-emptively so we can check if there are any current dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14101

Differential Revision: D13102219

Pulled By: gchanan

fbshipit-source-id: 6bbcca33a03ab1bf648f1b73cadfe84339fa3050
2018-11-16 17:18:08 -08:00
55e1b1ec3e Fix docs/cpp/requirements.txt (#14121)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14121

Differential Revision: D13108063

Pulled By: goldsborough

fbshipit-source-id: 35cf65ba776e8826c5cab7ae6d3a2d446f87e7cc
2018-11-16 14:56:30 -08:00
8610ff1072 Allow cooperative structured objects to be passed modules in tracing (#13961)
Summary:
Before this patch, the JIT did not allow a Module's forward to take
structured objects.
This patch allows cooperative objects to do so.
Cooperative means:
- It has a method self._jit_unwrap() that returns (a list/tuple of)
  tensors. These are then used in _iter_tensors.
- It has a method self._jit_wrap(flattened_input) that takes
  (a list/tuple?) the flattened_input (potentially more than it needs)
  and returns itself (updated) and the unconsumed flattened_inputs.
  This is then used in the _unflatten mechanism.

This is all it takes to permit maskrcnn-benchmark to use
its structured BoxList/ImageList types and trace it without calling
the .forward directly.
I'll push a model working with this patch in
https://github.com/facebookresearch/maskrcnn-benchmark/pull/138

I must admit I haven't fully checked whether there are ONNX changes needed before it, too, can profit, but I would be hopeful that anything currently usable remains so.

fmassa zdevito

So the main downside that I'm aware of is that people will later want to use more elaborate mechanisms, but I think this could be done by just amending what wrap/unwrap are returning / consuming.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13961

Differential Revision: D13103927

Pulled By: soumith

fbshipit-source-id: 2cbc724cc4b53197388b662f75d9e601a495c087
2018-11-16 14:02:13 -08:00
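A hedged sketch of a "cooperative" container per the protocol described above (`BoxListLike` is a made-up stand-in for types like maskrcnn-benchmark's BoxList):

```python
import torch

class BoxListLike(object):
    def __init__(self, boxes):
        self.boxes = boxes  # a Tensor

    def _jit_unwrap(self):
        # The tensors the tracer should flatten out of this object.
        return [self.boxes]

    def _jit_wrap(self, flattened_inputs):
        # Consume what we need, return (updated self, unconsumed inputs).
        self.boxes = flattened_inputs[0]
        return self, flattened_inputs[1:]
```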
fb6535ec70 Add SharedDataset (#13800)
Summary:
This PR adds a `SharedDataset` to the C++ frontend data API, which allows wrapping a shared_ptr to a dataset into a class that conforms to the `Dataset` interface (with `get_batch`). This enables use cases where a custom dataset is (1) thread-safe and (2) expensive to copy. All workers will reference a single instance of this dataset. No additional copies are incurred.

jaliyae apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13800

Differential Revision: D13075610

Pulled By: goldsborough

fbshipit-source-id: 4ffdfd7959d49b042c0e254110085f62a0bfeb6c
2018-11-16 13:07:10 -08:00
96e5d23bad remove dynamic initialization warning (#13913) (#13967)
Summary:
removed assignment in default constructor.
removed static shared memory and used dynamic shared memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13967

Differential Revision: D13089996

Pulled By: soumith

fbshipit-source-id: 2a218b909c849bed39636b45a02d10ebc279a0b0
2018-11-16 13:04:22 -08:00
5b1b8682a3 Missing .decode() after check_output in cpp_extensions (#13935)
Summary:
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13935

Differential Revision: D13090852

Pulled By: goldsborough

fbshipit-source-id: 47da269d074fd1e7220e90580692d6ee489ec78b
2018-11-16 12:16:29 -08:00
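For reference, the class of fix involved: on Python 3, `subprocess.check_output` returns bytes, so the result must be decoded before string operations (the command below is just an example):

```python
import subprocess

out = subprocess.check_output(["cmake", "--version"]).decode().strip()
print(out.splitlines()[0])
```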
8e91da4cb3 Windows shared build (#13550)
Summary:
Hi guys,

I'd like to build Caffe2 with more supported options in Windows with Microsoft Visual Studios.
This is the first pull request.
Running scripts/build_windows_shared.bat is able to build Caffe2 with both CMAKE_BUILD_TYPE=Debug and CMAKE_BUILD_TYPE=Release with Visual Studio 14 2015.
CUDA is 9.0, cudnn is 7.0.5, glog, gflags and lmdb are supported on my system.
Python is 3.5, Detectron works from python interface as well.
It was even possible to debug detectron code and step into caffe2_gpu.dll with pdbs built.

What is disappointing is that the c10/experimental ops don't build with this Visual Studio generator, so I added a special option INCLUDE_EXPERIMENTAL_C10_OPS (default ON) to build_windows_shared.bat to deal with it.

After this pull request the next step is to add Visual Studio 2017 support in the script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13550

Reviewed By: ezyang

Differential Revision: D13042597

Pulled By: orionr

fbshipit-source-id: f313f909f599cd582a1d000eff766eef3a9fc4fc
2018-11-16 12:16:28 -08:00
2c21de2007 Make JOIN_TIMEOUT longer for ppc64le (#14107)
Summary:
This should resolve the ppc64le issue of getting FAIL: test_proper_exit (__main__.TestDataLoader), which only happens when the CI build machine is very busy and the test fails with a timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14107

Differential Revision: D13103859

Pulled By: soumith

fbshipit-source-id: 268be80b59840853c5025f3211af272f68608fe5
2018-11-16 12:12:58 -08:00
c192788188 Log error from the net's run (#14035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14035

Log the error message in case the net's run fails

Reviewed By: andrewwdye

Differential Revision: D13085431

fbshipit-source-id: d79f76782410cd3a5bd2d8d7f5fb1e535d821051
2018-11-16 12:06:50 -08:00
0d7a986da1 Change hip filename extension to .hip (#14036)
Summary:
xw285cornell

- To give hip files a unique filename extension, we change them from _hip.cc to .hip (it's the only blessed option other than .cu in hipcc 3d51a1fb01/bin/hipcc (L552)).
- Change to use the host compiler to compile .cc|.cpp files. Previously we used hcc to compile them, which is unnecessary.
- Change the hipify script to not replace "gpu" with "hip" in the filenames of the generated hipified files. Previously we did this because hcc has a bug when linking files that have the same filename. We now use the host linker, so this is no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14036

Reviewed By: xw285cornell

Differential Revision: D13091813

Pulled By: bddppq

fbshipit-source-id: ea3d887751d8abb39d75f5d5104aa66ce66b9ee0
2018-11-16 11:55:59 -08:00
30018fcd0b Enable Caffe2 ROCm test on centos (#14090)
Summary:
xw285cornell petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14090

Differential Revision: D13096874

Pulled By: bddppq

fbshipit-source-id: b471c6e4db95cd51567745a2f758d58bba7eafad
2018-11-16 11:51:58 -08:00
5a53861d3a Enable Caffe2 test on centos (#14091)
Summary:
Turns out we don't have any centos test CI job
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14091

Differential Revision: D13104722

Pulled By: bddppq

fbshipit-source-id: 22fe92ad4b7f2c391eea16b8b95658fa1ee605e2
2018-11-16 11:51:56 -08:00
1256cbaa69 Relax limits for gradients in test_jit's checkGraph (#14094)
Summary:
- This should help TestJit.test_lstm_fusion_concat_cuda
  to be less flaky. (Checked on manual_seed 0..99)
  Fixes: #14026
- Revert the renaming of test_fused_abs that was introduced
  to game the order of tests to avoid the flakiness above.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14094

Differential Revision: D13100174

Pulled By: soumith

fbshipit-source-id: 91bb63b07a960a81dddfc0bf25c67696c0f6c46d
2018-11-16 11:43:52 -08:00
2983998bb3 add torch-python target (#12742)
Summary:
This is the next minimal step towards moving _C into cmake. For now,
leave _C in setup.py, but reduce it to an empty stub file. All of its
sources are now part of the new torch-python cmake target.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12742

Reviewed By: soumith

Differential Revision: D13089691

Pulled By: anderspapitto

fbshipit-source-id: 1c746fda33cfebb26e02a7f0781fefa8b0d86385
2018-11-16 11:43:48 -08:00
cb86ae304e alias annotation parsing #2 (#14053)
Summary:
hopefully this one doesn't break master.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14053

Differential Revision: D13093406

Pulled By: suo

fbshipit-source-id: 8fed44f1a3d463748726cb14acac2ea53dedf29b
2018-11-16 11:39:25 -08:00
77c2f4d0d7 Make THPDtype_New error instead of truncate (#14103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14103

Addressing T34828781, we change THPDtype_New so that it throws a RuntimeError if the length of name is greater than buffer size (DTYPE_NAME_LEN) - instead of truncating the string to fit the buffer.

Reviewed By: ezyang

Differential Revision: D13094600

fbshipit-source-id: d0dbf8fdfa342630c31f4d8ca7230d5f24a1254a
2018-11-16 11:35:18 -08:00
7c053b7e64 Add filler for SparseLengthsWeightedSum (#13949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13949

This diff adds support to fillers for `SparseLengthsWeight*` ops. It does 3 things:
1. Add the fillers for `SparseLengthsWeight*` ops
2. Add filling heuristics to consider the path of `LengthsRangeFill` -> `Gather` -> `SparseLengthsWeightedSum`, where the length input is shared by `LengthsRangeFill` and `SparseLengthsWeightedSum`. Therefore, we need to carefully bound the value of that length input so that at `Gather`, it does not index out-of-bound for the weight input of `Gather`.
3. Fix and simplify the logic of `math::RandFixedSum`, where we just keep rejecting the generated value if it violates the invariants.

Reviewed By: highker

Differential Revision: D13048216

fbshipit-source-id: bfe402e07e6421b28548047d18b298c148e0ec87
2018-11-16 11:31:05 -08:00
3c7b575a14 Update ATen doc with optional syntax (#14086)
Summary:
Update the readme to reflect the recent optional syntax change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14086

Differential Revision: D13096114

Pulled By: wanchaol

fbshipit-source-id: 713834d4d92021e1c7a31f3a56a00fb7da58c348
2018-11-16 10:03:24 -08:00
562f61a662 Add missing space in stft doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14092

Reviewed By: soumith

Differential Revision: D13100177

Pulled By: SsnL

fbshipit-source-id: 4eeaa3d0c04212516941d8d5a266aafb53bd9672
2018-11-16 09:57:06 -08:00
e4bb56570c Preemptively test for out-of-order length. (#13933)
Summary:
torch.nn.utils.rnn.pack_padded_sequence segfaults if the lengths are not
in decreasing order (#13324).

We were seeing this segfault on throw; pre-emptively checking avoids
it:

*** Error in `/home/bvaughan/anaconda3/bin/python': double free or corruption (!prev): 0x00005555566e7510 ***
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13933

Differential Revision: D13090389

Pulled By: nairbv

fbshipit-source-id: 6f6b319e74cb55830be799e9c46bc33aa59256d8
2018-11-16 08:39:05 -08:00
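A sketch of the constraint being validated (at the time, lengths had to be sorted in decreasing order; the `enforce_sorted=False` escape hatch came later):

```python
import torch
from torch.nn.utils.rnn import pack_padded_sequence

padded = torch.randn(5, 3, 8)   # (seq_len, batch, features)
lengths = [5, 4, 2]             # decreasing: OK
packed = pack_padded_sequence(padded, lengths)
# lengths = [2, 4, 5] now raises a clear error instead of segfaulting.
```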
c7a247facf nomnigraph - support subgraph visualization (#13795)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13795

Add ability for dot string generation for a single subgraph and python bindings (which is pretty useful for model exploration in Python)
Restructure DotGenerator class a bit to make it easy to implement this feature

Reviewed By: bwasti

Differential Revision: D13010512

fbshipit-source-id: 825665438394b7e6968ab6da167b477af82a7b62
2018-11-16 08:19:20 -08:00
d7b95dda51 nomnigraph - easy - expose hasProduce(NodeRef) to python (#14075)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14075

Expose hasProduce(NodeRef) to python

Reviewed By: bwasti

Differential Revision: D13092930

fbshipit-source-id: f1ec06e73e0f5f6a16ad0cbb7d2e3e499a861d8e
2018-11-16 08:19:18 -08:00
e7f5fceb99 nomnigraph - easy - expose inducesEdges and addNode to python's NNSubgraph (#14074)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14074

expose inducesEdges and addNode to python's NNSubgraph. This make it easy to manually construct a NNSubgraph in python

Reviewed By: bwasti

Differential Revision: D13092885

fbshipit-source-id: a94ed0b318162e27e3a4b5a4954eb6d169da7405
2018-11-16 08:19:16 -08:00
7b0f674367 Two small improvements to TorchConfig.cmake (#13849)
Summary:
- Fix the test for TORCH_INSTALL_PREFIX in the environment.
  The previous version didn't actually work.
- Add a guess path to find_package for Caffe2. I suspect that
  it's close to the Torch one.

I noticed these while compiling PyTorch custom ops, in particular for the C++ side when you don't want to go through Python.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13849

Differential Revision: D13090186

Pulled By: ezyang

fbshipit-source-id: cfe98900ab8695f008506a8d0b072cfd9c673f8f
2018-11-16 07:41:57 -08:00
1b1cdd944c Keep ModuleList consistent with python list in __setitem__ function. (#13102)
Summary:
The `ModuleList` method `__setitem__` has an implicit risk:
```
In [26]: mlist = nn.ModuleList([nn.ReLU(), nn.Conv2d(10, 10, 3, 1)])

In [27]: mlist
Out[27]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
)

In [28]: mlist[-1] = nn.ReLU()

In [29]: mlist
Out[29]:
ModuleList(
  (0): ReLU()
  (1): Conv2d(10, 10, kernel_size=(3, 3), stride=(1, 1))
  (-1): ReLU()
)

In [30]: mlist[-1]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-30-229d1b6823a0> in <module>()
----> 1 mlist[-1]

~/anaconda3/lib/python3.6/site-packages/torch/nn/modules/container.py in __getitem__(self, idx)
    134             return ModuleList(list(self._modules.values())[idx])
    135         else:
--> 136             return self._modules[self._get_abs_string_index(idx)]
    137
    138     def __setitem__(self, idx, module):

KeyError: '2'

```

modified as
```
    def __setitem__(self, idx, module):
        idx = self._get_abs_string_index(idx)
        return setattr(self, str(idx), module)
```
to fix it.

```
In [31]: class NewModuleList(nn.ModuleList):
    ...:     def __setitem__(self, idx, module):
    ...:         idx = self._get_abs_string_index(idx)
    ...:         return setattr(self, str(idx), module)
    ...:

In [32]: mlist = NewModuleList([nn.ReLU(), nn.Conv2d(10, 10, 2, 1)])

In [33]: mlist[-1] = nn.ReLU()

In [34]: mlist
Out[34]:
NewModuleList(
  (0): ReLU()
  (1): ReLU()
)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13102

Differential Revision: D13092480

Pulled By: ezyang

fbshipit-source-id: 7ff7688f66e44bbd263a10d2d09db7bb0df4b749
2018-11-16 07:39:26 -08:00
a3f39f1ebb Fix randint docs (#14083)
Summary: Closes #14079

Differential Revision: D13095904

Pulled By: soumith

fbshipit-source-id: e39319c5326bfdf6f401eaddebe94474349901c3
2018-11-16 03:04:02 -08:00
2fe4711eb4 Revert "Remove OptionsGuard from ATen (#13738)" (#14082)
Summary:
This reverts commit 37cb357d8da3427900b8f72f6de7e77b77dcdbae.

Try to see if it unbreaks master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14082

Differential Revision: D13095888

Pulled By: bddppq

fbshipit-source-id: c728f80f233b4d9daaf65f43202d8104651029a9
2018-11-15 23:47:36 -08:00
45fd77d3b7 Adding GLOO_SOCKET_IFNAME env to allow user set gloo device (#14065)
Summary:
Address https://github.com/pytorch/pytorch/issues/14063

This is a lot easier to use and follows the NCCL convention, since they provide the analogous NCCL_SOCKET_IFNAME.

We can document this better later.

Tested on my two hosts; it works out of the box.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14065

Differential Revision: D13095522

Pulled By: teng-li

fbshipit-source-id: 131dff212626f1aab7e752427f1b684845b909dc
2018-11-15 22:33:56 -08:00
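A hedged usage sketch: select the network interface for the Gloo backend via the new environment variable (the interface name and master address/port below are placeholders):

```python
import os
import torch.distributed as dist

os.environ["GLOO_SOCKET_IFNAME"] = "eth0"
os.environ["MASTER_ADDR"] = "10.0.0.1"
os.environ["MASTER_PORT"] = "29500"
dist.init_process_group("gloo", rank=0, world_size=2)
```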
3808e9fad3 Caffe2: Fix for creating entries of external_input in predic_net (#12979)
Summary:
Currently, after performing export, the predict_net proto ends up with two
entries of external_input for the input data, because external_input is
extended twice: once separately using the input blob, and once by extending
all the entries of external_input from the proto, which already include the
input blob.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12979

Differential Revision: D12916349

Pulled By: soumith

fbshipit-source-id: 4d4a1c68c0936f8de3f4e380aea1393fe193cd2d
2018-11-15 22:33:50 -08:00
1e8aeb0bee fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/14076

Differential Revision: D13095528

Pulled By: suo

fbshipit-source-id: 78d08719ad5579dc0d6bb9563972df393e4286fe
2018-11-15 22:10:06 -08:00
3a15de9e44 Fix CUDA_tensor_apply1 base case (#14056)
Summary:
I got some build errors when modifying the `bernoulli_tensor_cuda_kernel` in my Generator refactor https://github.com/pytorch/pytorch/pull/13070. Turns out the functions signature for `CUDA_tensor_apply1` was a little wrong. This PR fixes it. Following is the code and error I was getting before this patch:

Code:
```
template<typename scalar_t, typename prob_t>
void bernoulli_tensor_cuda_kernel(
    at::Tensor& ret, const at::Tensor& p,
    std::pair<uint64_t, uint64_t> seeds) {
  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(scalar_t& v1, const prob_t& p1) {
      at::cuda::Philox4_32_10 engine(
                                seeds.first,
                                blockIdx.x * blockDim.x + threadIdx.x,
                                seeds.second);
      auto x = at::cuda::standard_uniform_distribution(engine);
      assert(0 <= p1 && p1 <= 1);
      v1 = static_cast<scalar_t>(x <= p1);
    }
  );
}
```

Error:
```
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/cuda/CUDAApplyUtils.cuh(236): error: no suitable conversion function from "const lambda [](uint8_t &)->void" to "int" exists
Nov 15 23:43:03           detected during:
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::ApplyOp1<Op, scalar, IndexType, ADims, remaining_steps, Offsets...>::apply(at::cuda::detail::TensorInfo<scalar, IndexType> &, const Op &, int, IndexType, Offsets...) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, remaining_steps=1, Offsets=<>]"
Nov 15 23:43:03 (282): here
Nov 15 23:43:03             instantiation of "void at::cuda::<unnamed>::kernelPointwiseApply1<Op,scalar,IndexType,ADims,step>(at::cuda::detail::TensorInfo<scalar, IndexType>, IndexType, Op) [with Op=lambda [](uint8_t &)->void, scalar=uint8_t, IndexType=unsigned int, ADims=1, step=1]"
Nov 15 23:43:03 (735): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,step,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, step=1, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 (774): here
Nov 15 23:43:03             instantiation of "__nv_bool at::cuda::CUDA_tensor_apply1<scalar,Op>(at::Tensor, Op, at::cuda::TensorArgType) [with scalar=uint8_t, Op=lambda [](uint8_t &)->void]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(118): here
Nov 15 23:43:03             instantiation of "void <unnamed>::bernoulli_scalar_cuda_kernel<scalar_t>(at::Tensor &, double, std::pair<uint64_t, uint64_t>) [with scalar_t=uint8_t]"
Nov 15 23:43:03 /var/lib/jenkins/workspace/aten/src/ATen/native/cuda/Distributions.cu(227): here
Nov 15 23:43:03
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14056

Differential Revision: D13095362

Pulled By: soumith

fbshipit-source-id: 6416bc91616ec76036479062a66517557a14d1b9
2018-11-15 21:33:07 -08:00
037d6b697b Add ResizeNearest DNNLOWP op (#13940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13940

As in title

Reviewed By: jspark1105

Differential Revision: D13054325

fbshipit-source-id: 81af5f095a1aca92d4b5e1fe0e71ae2f21b43922
2018-11-15 21:03:01 -08:00
f66cb02016 Turn fbgemm off by default for pytorch (#14048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14048

Setting USE_FBGEMM to OFF by default until we figure out how to properly separate the avx2 code. See [this issue](https://github.com/pytorch/pytorch/issues/13993). Pytorch can still be compiled with fbgemm by using USE_FBGEMM=ON.

Reviewed By: jspark1105

Differential Revision: D13090454

fbshipit-source-id: 6e0e92612e4362a306e376df3dc33e8edeb066e9
2018-11-15 18:42:16 -08:00
f17b2fdf1b Fixed THD DistributedDataParallel not picklable (#14051)
Summary:
This fixed https://github.com/pytorch/pytorch/issues/12261
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14051

Differential Revision: D13091703

Pulled By: teng-li

fbshipit-source-id: 16eb85a259c981f3cacd2fbaecc0edbae292e358
2018-11-15 18:10:47 -08:00
37cb357d8d Remove OptionsGuard from ATen (#13738)
Summary:
Deletes the `OptionsGuard` from ATen. This works towards the goal of reworking `DefaultTensorOptions`. `OptionsGuard` is troublesome because it relies on mutating thread local state. This PR fixes those code locations and then deletes the `OptionsGuard`.

ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13738

Differential Revision: D13000962

Pulled By: goldsborough

fbshipit-source-id: c8143ee75070c2280f5fd1d9af86f8ce14279b72
2018-11-15 17:37:27 -08:00
8f4dc192b6 Fix DataLoaderTest.EnforcesOrderingAmongThreadsWhenConfigured (#14038)
Summary:
I think this will be it. So for one, the previous test was bullshit because it was returning the thread id instead of the sample index (which is the thing whose ordering is enforced). Just turning up the number of threads to 10 from 4 made this very obvious. I also think there is a race condition, which may or may not have surfaced, in that there was nothing stopping one worker to get multiple batches, which would screw with the whole ordering logic. I've added a barrier struct such that workers wait for all workers to be in the `get_batch` function before actually doing something.

Fixes https://github.com/pytorch/pytorch/issues/14002

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14038

Differential Revision: D13088132

Pulled By: goldsborough

fbshipit-source-id: 4bded63756c6a49502ee07ef8709a03073e7e05f
2018-11-15 17:30:41 -08:00
f930c4307c Clean up executor's execution flags (#13869)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13869

Remove unused flags and consolidate them into one struct

Reviewed By: yinghai

Differential Revision: D13032207

fbshipit-source-id: 2cef093589036238732099e3851a97e739b5fd55
2018-11-15 17:11:51 -08:00
874a8a321b Fix out of order member fields initializaitons (#14015)
Summary:
xw285cornell

Unfortunately it's not easy to add -Werror=reorder flag since there are out of order initializations in thrust headers as well, and the rocm cmake macro hip_include_directories doesn't offer a way to include headers as external headers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14015

Reviewed By: soumith

Differential Revision: D13081104

Pulled By: bddppq

fbshipit-source-id: 2540421cb29cf556c79f2d86c460bde6ea5a182e
2018-11-15 17:11:50 -08:00
31d41a983a Revert D13088038: [pytorch][PR] [jit] extend alias annotations
Differential Revision:
D13088038

Original commit changeset: 49dc5d0e9cd4

fbshipit-source-id: b77e4607f3cbd9c202c522a436f90e9a98acd4b4
2018-11-15 16:55:11 -08:00
6d378d3740 Updating C++ documentation to PyTorch theme. (#13791)
Summary:
Updates C++ documentation to the PyTorch Sphinx theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13791

Reviewed By: soumith

Differential Revision: D13013908

Pulled By: brianjo

fbshipit-source-id: 253a91c6784ad72aa1c37426cd4a945061a60fec
2018-11-15 16:45:52 -08:00
0d29846d5e Convert more weak functions (#14003)
Summary:
Same deal as #13707
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14003

Differential Revision: D13076403

Pulled By: driazati

fbshipit-source-id: eb3cb3b2c31caf1de591b613bdc4c9a6ed4e1767
2018-11-15 16:45:50 -08:00
c5afad5579 Fix skip logic in caffe_translator_test.py (#13627)
Summary:
Avoid false failure by checking for the presence of the test data in setup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13627

Differential Revision: D13090324

Pulled By: ezyang

fbshipit-source-id: e85571943d168c0007212d7b1a5b99ffa0c39235
2018-11-15 16:45:49 -08:00
0e93500841 Remove async_polling (#13825)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13825

async_polling was an intermediate step towards async_scheduling and is not used

Reviewed By: yinghai

Differential Revision: D13019059

fbshipit-source-id: eee6ba53e7f476ddb481afba3bf1768303864d32
2018-11-15 16:23:15 -08:00
0573169e23 Import a method from an python_print string (#13959)
Summary:
* Add hooks to get a callback whenever a valid graph is produced in the compiler or through tracing. These hooks can be used to pretty_print and then reparse every graph our tests produce to check that the serialization function works correctly. Currently this is guarded by an environment variable since there are a few remaining failures.
* Fix printing bugs: True and False rather than 1 and 0, print 0. for floating point zero
* Change behavior of NoneType. It is now no longer a subtype of Optional but instead implicitly converts to it, returning a prim::Node with an Option[T] type for some specific T. This allows functions like `_unwrap_optional` to correctly match against a None while still deriving the right type.
* Fix a bug where empty blocks did not correctly emit "pass" in printer.
* Fix a bug where prim::Undefine sometimes cannot be printed as None because it is being used in a schema-less op. This should be fixable once Optional[T] always uses the same None object.
* Other minor printing bugs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13959

Reviewed By: jamesr66a

Differential Revision: D13073519

Pulled By: zdevito

fbshipit-source-id: 4167a6b614f2e87b4d21823275a26be5ba4fc3dd
2018-11-15 16:11:37 -08:00
84d464f8f9 Revert "Upgrade mkldnn bridge to reduce overhead of bridge itself (#1… (#14040)
Summary:
…2164)"

This reverts commit 4b7c6150d848d134d1fe850e777dc68321d35465.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14040

Differential Revision: D13089531

Pulled By: yinghai

fbshipit-source-id: 2114b36111dab6f179c02921bbc9bd382ef461bf
2018-11-15 15:34:15 -08:00
90b0c4f43d Tensor construction: combine Resize+mutable_data - 2/4 (#13943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13943

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: ezyang

Differential Revision: D13054815

fbshipit-source-id: e89c2e69217880980187f2befb844c277e51c1e0
2018-11-15 15:34:14 -08:00
136f5c9fe1 Replaced using declaration with explicit constructors 3/3 (#13875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13875

This replaces a using declaration with an explicit constructor

Reviewed By: mnovakovic

Differential Revision: D13033260

fbshipit-source-id: ce4cc5667ee66abdeebd1e49466c3cf3a65ffb96
2018-11-15 14:52:47 -08:00
3fbb753512 Revert D12873145: [pt1][tensor][refactor] FeedTensor returns a Tensor
Differential Revision:
D12873145

Original commit changeset: 653735c20d61

fbshipit-source-id: aa6e40a6a24c6f90acbe87b32b3be0020e2584f8
2018-11-15 14:52:46 -08:00
d91c686c33 extend alias annotations (#13632)
Summary:
Grab bag of additions to alias annotations that were useful when writing the alias analysis pass. Not very organized since these were mostly split off from that PR.
- Switch alias sets to actual sets, since we will want to union them.
- Correctly parse alias set unions `a|b`, and correctly parse wildcards
- Move writes into `AliasInfo`, which cleans up some code that was passing a `writes` vector everywhere and simplifies tracking aliased writes during analysis.
- Change Tensor list extraction ops to return wildcard tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13632

Differential Revision: D13088038

Pulled By: suo

fbshipit-source-id: 49dc5d0e9cd4895427fea3a87b0ec325bd5fe437
2018-11-15 14:23:40 -08:00
c7e0db140e use fabs instead of absf in fuser code for aten::abs (#13985)
Summary:
absf didn't work for CUDA

Fixes: #13971
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13985

Differential Revision: D13084601

Pulled By: soumith

fbshipit-source-id: 0027ee719ae2b6a2bfce9c26f21db9c5e6159686
2018-11-15 13:23:59 -08:00
c3578b561c Skip all builtin functions when importing names from _C._VariableFunctions to torch (#13884)
Summary:
We don't want builtin functions of `_C._VariableFunctions` to replace those of `torch`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13884

Reviewed By: ezyang

Differential Revision: D13044686

Pulled By: yf225

fbshipit-source-id: 23657d47a4e2fd8ee41103cd6a13c639ce107f67
2018-11-15 13:23:57 -08:00
4b7c6150d8 Upgrade mkldnn bridge to reduce overhead of bridge itself (#12164)
Summary:
Upgrade mkldnn bridge to reduce overhead of bridge itself
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12164

Reviewed By: yinghai

Differential Revision: D10159149

Pulled By: wesolwsk

fbshipit-source-id: 5ede1130c00a2cd3afe301dcb94bcb89e01bc5a2
2018-11-15 12:54:06 -08:00
3de0fd846f Fix converter to accept const NetDef&
Summary: convertToNNModule didn't accept `const NetDef&`; fixed this

Reviewed By: duc0

Differential Revision: D13057450

fbshipit-source-id: dc6fa2c86077a56b955f15c369b941a2d32de911
2018-11-15 12:18:11 -08:00
5639332a28 fix the deeptext issue (#14005)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14005

Partial initialization of tensors is no longer supported; we need to fix multiple places.

Reviewed By: hl475

Differential Revision: D13078206

fbshipit-source-id: a1be2bd2a9f573db54e1366a0d7a17cc2e0db0c9
2018-11-15 12:13:45 -08:00
b8de8f6261 Refactor tensor construction in onnxifi_op
Summary: att

Reviewed By: ezyang

Differential Revision: D13028624

fbshipit-source-id: efd8dee5d59f26830a15bb17211eee373f6c8dee
2018-11-15 11:23:21 -08:00
464c0c2204 Use realpath for loaded libraries (#13936)
Summary:
I noticed `CDLL` needs an absolute path (when calling `torch.ops.load_library`)

zdevito soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13936

Differential Revision: D13075605

Pulled By: goldsborough

fbshipit-source-id: 297c490cfa3bfaf540b95a9c2644d9153abe4c32
2018-11-15 11:23:20 -08:00
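A sketch of the fix: resolve the path to an absolute one before handing it to `CDLL`, which is what `torch.ops.load_library` now effectively does (the library path below is a placeholder):

```python
import ctypes
import os.path

path = os.path.realpath("build/libcustom_ops.so")
lib = ctypes.CDLL(path)
```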
17b2d2d373 fix TensorPrinter when tensor have 0 size. (#13986)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13986

If total_count == 0, it crashes on:

  values_stream << tensor_data[total_count - 1];

Reviewed By: jerryzh168

Differential Revision: D13066438

fbshipit-source-id: b7a2d681ca0cf5b68d78872c94fac6de9c5de2dc
2018-11-15 07:51:13 -08:00
4574ea3bec Make RNN operator handle exceptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13997

Reviewed By: dzhulgakov, bddppq

Differential Revision: D13072518

Pulled By: ilia-cher

fbshipit-source-id: c4fd897038b6dca41db652b9e063fc12d98f6d07
2018-11-15 00:48:22 -08:00
6d094224b9 Fix optional import/export, export multi-margin-loss (#13877)
Summary:
This PR does two things:

1. It fixes optional import/export to include any type, including tensor types (previously we only supported base types); this is essential to unblock optional tensor type annotations in our test logic.
2. It exports the multi_margin_loss functional to serve as an example of the optional undefined-tensor use case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13877

Differential Revision: D13076090

Pulled By: wanchaol

fbshipit-source-id: c9597295efc8cf4b6462f99a93709aae8dcc0df8
2018-11-15 00:45:22 -08:00
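A hedged sketch of the kind of optional-tensor annotation this unblocks, written in current `torch.jit` syntax:

```python
from typing import Optional
import torch

@torch.jit.script
def scale(x: torch.Tensor, w: Optional[torch.Tensor]) -> torch.Tensor:
    if w is not None:
        return x * w
    return x
```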
ddbd87e310 Build with -Werror (#13998)
Summary:
Also fixed a warning

As a thought while trying to solve #12854
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13998

Reviewed By: pietern

Differential Revision: D13078615

Pulled By: teng-li

fbshipit-source-id: eb25c429d7dd28b42e4e95740a690d5794a0c716
2018-11-14 22:45:30 -08:00
5390ab1d52 Dont crash on 1d convolution (#13999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13999

Temporary mitigation for SEV3 https://our.intern.facebook.com/intern/sevmanager/view/s/168910/

Reviewed By: yinghai

Differential Revision: D13075307

fbshipit-source-id: 4df2bcc37b91900653443f7766d5bb080ca3f5a9
2018-11-14 22:38:00 -08:00
eb024cd1d0 don't throw in matchTypeVariables (#13989)
Summary:
Avoid throwing on match errors. In general, it's not good to throw when failure is expected.

But the real reason I'm doing this is it makes it annoying to set a breakpoint on exceptions in my debugger 😛
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13989

Differential Revision: D13069980

Pulled By: suo

fbshipit-source-id: 636d4371f8a5be45c935198b73cdea06275b1e9e
2018-11-14 21:45:19 -08:00
20e395a130 Fixed uninitialized warning (#14001)
Summary:
Fixing: https://github.com/pytorch/pytorch/issues/12014
Pull Request resolved: https://github.com/pytorch/pytorch/pull/14001

Differential Revision: D13078583

Pulled By: teng-li

fbshipit-source-id: 6c8d663da81bc3e564f0643926d67260df828dd8
2018-11-14 21:37:11 -08:00
e3bb6ff334 Move c10 dispatcher prototype to c10/
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13690

Reviewed By: dzhulgakov

Differential Revision: D12912235

fbshipit-source-id: 974b85790c23335be8130a50aa4692e3ddcd2bf9
2018-11-14 18:04:36 -08:00
4b0fc5200b Fix include paths for typeid.h (#13689)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13689

Now that typeid.h lives in c10/util, the include paths should reflect that.

Reviewed By: ezyang

Differential Revision: D12912237

fbshipit-source-id: e54225f049f690de77cb6d5f417994b211a6e1fb
2018-11-14 18:04:09 -08:00
72da09bb4d Canonicalize THD includes with .. in them
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13980

Reviewed By: jerryzh168

Differential Revision: D13062706

fbshipit-source-id: 100e10d1bae7efc3e13f029708c2c1dd053ce074
2018-11-14 17:43:56 -08:00
7ea9c674bc migrate subgraph slicing to use moveBefore/moveAfter (#13862)
Summary:
Migrate the `CreateAutodiffSubgraphs` pass to use topologically-safe moves instead of DynamicDAG. This is to unify the interface that we use for determining safe node moves to prepare for mutability.

The pass looks a lot like GraphFuser now, and there's a lot of code duplication. I plan to pull common stuff out into a "subgraph manipulation utils" thing, but didn't want to clutter this PR.

Future steps:
- Get rid of code duplication (see above)
- Use DynamicDAG to back the `moveBefore/After` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13862

Differential Revision: D13072871

Pulled By: suo

fbshipit-source-id: 92e7880ef444e0aefd51df60964bba7feaf42ae0
2018-11-14 17:33:36 -08:00
2356c8d542 device inference for Adam (#13990)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13990

to make sure ITER blob lives on CPU.

Reviewed By: xianjiec

Differential Revision: D13056070

fbshipit-source-id: 148edbf745e50e886da3eb99d4e485d11c1924e2
2018-11-14 17:21:08 -08:00
fed8d8975a Various improvements to hipify_python.py (#13973)
Summary:
- Speed up hipify_python.py by blacklisting useless (and quite large)
  directory trees that it would otherwise recurse into

- Pass around relative paths instead of absolute paths.  This makes it
  easier to do filename matches based on the root of the tree.

- Redo the streaming output to contain more useful information

- Make it handle c10/cuda correctly, rewrite c10::cuda to
  c10::hip, and the header name from CUDAMathCompat.h to
  CUDAHIPCompat.h

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13973

Differential Revision: D13062374

Pulled By: ezyang

fbshipit-source-id: f0858dd18c94d449ff5dbadc22534c695dc0f8fb
2018-11-14 17:11:24 -08:00
02152c515e Ensure nn Losses check scalar vs non-scalar values.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13860

Reviewed By: ezyang

Differential Revision: D13029364

Pulled By: gchanan

fbshipit-source-id: 20f1330fa181e52aea1f879dc655a9a6f62b5f53
2018-11-14 16:46:27 -08:00
6811e32f03 Support exporting Gather and BatchGather to ONNX (#13987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13987

Gather and BatchGather are also used in sparse network.

Reviewed By: bddppq, houseroad

Differential Revision: D13067290

fbshipit-source-id: e09572a5c4544768f9e1af48166f7c8d78127e63
2018-11-14 15:40:17 -08:00
7daa829bce Implement unsqueeze for sparse vectors (this also makes stack work out of the box)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13760

Differential Revision: D13065342

Pulled By: umanwizard

fbshipit-source-id: a5e2e80f87ffbbfdf8759b1b593ef34d290ae907
2018-11-14 15:23:05 -08:00
ff4f4a0a35 Retry test on "Address already in use" error (#13911)
Summary:
This fixes #13907.
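
A minimal sketch of the retry idea (names are hypothetical, not the actual test-harness code):

```python
import errno

def retry_on_address_in_use(fn, attempts=3):
    # Re-run fn when binding fails because the port was already taken.
    for i in range(attempts):
        try:
            return fn()
        except OSError as e:
            if e.errno != errno.EADDRINUSE or i == attempts - 1:
                raise
```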
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13911

Differential Revision: D13046256

Pulled By: pietern

fbshipit-source-id: bab70cd73ef868e23d4857b06e72830ad29ddb4f
2018-11-14 15:23:03 -08:00
61a0df5af0 Canonicalize THC/THCTensorMasked.cuh include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13977

Reviewed By: jerryzh168

Differential Revision: D13062564

fbshipit-source-id: 77d42585198cd75bc8a2625787604552e5369787
2018-11-14 14:56:30 -08:00
01d606e048 Canonicalize TH/THRandom.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13975

Reviewed By: jerryzh168

Differential Revision: D13062526

fbshipit-source-id: 510e0ff5ce68c20c2f46bae71efa8e4355c6ce05
2018-11-14 14:56:27 -08:00
9e1655bb22 Canonicalize THCUNN/linear_upsampling.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13979

Reviewed By: jerryzh168

Differential Revision: D13062649

fbshipit-source-id: 28b2cbe97613b485ab11bf35be60ca6ee668bbef
2018-11-14 13:50:30 -08:00
af6d1ec52c Canonicalize THCUNN/common.h include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13978

Reviewed By: jerryzh168

Differential Revision: D13062631

fbshipit-source-id: 2b1b13c28ee8be603b0cdca46c7ac7f86317c39f
2018-11-14 13:30:27 -08:00
a7d43702d4 Canonicalize THCGenerate*.h includes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13976

Reviewed By: jerryzh168

Differential Revision: D13062604

fbshipit-source-id: 48b7e2a2bdf97c55820036db9a4ff18a1f4dbce2
2018-11-14 13:30:25 -08:00
f446c67e2f submodule update to fix compilation warnings (#13925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13925

Fixing compilation warnings; Already fixed in fbgemm repo so just updating submodule

Reviewed By: jianyuh

Differential Revision: D13048100

fbshipit-source-id: 568f0f90a5499b6f2cab525b2379299d1565bbae
2018-11-14 13:27:32 -08:00
587f769a99 Fix missing symbol linker error when using libtorch generated on windows : (#13672)
Summary:
Libtorch is missing some symbols when generated on Windows, causing linker errors when using it.

It seems there were some issues in the past with enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS to export all symbols during the build.
(See the links below:
    - Enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS: https://github.com/pytorch/pytorch/pull/3617
    - Disabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS: https://github.com/pytorch/pytorch/issues/9092 and https://github.com/pytorch/pytorch/pull/9693)

So enabling CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS is not an option, but some symbols are still missing for Libtorch to work.
We added some functions to TORCH_API in this PR, but we might be missing some.
(We also tried adding the whole Method struct (struct TORCH_API Method { ... }) instead of adding the functions separately, but the build fails with a "one or more multiply defined symbols found" error.)

Do you have any recommendations on how to detect functions that should/shouldn't be in TORCH_API, so that the build succeeds and the generated Libtorch has all the required exported symbols?

I also attached torch_exports_missing.txt, which contains the symbols that are exported with the CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS flag enabled but not in the current Libtorch version
(generated by running "dumpbin /EXPORTS torch.dll" on both torch.dll builds and diffing the outputs).
Any symbol missing from Libtorch should be in this list, but the list has more than 8000 symbols, and I am not sure which ones need to be exported and added to TORCH_API.

This PR currently exports the missing symbols for torch::jit::script::Method that appear in the attached list (with the exception of defaultSchemaFor and emit_call_to, which cause a "multiply defined symbols" error).

[torch_exports_missing.txt](https://github.com/pytorch/pytorch/files/2558466/torch_exports_missing.txt)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13672

Differential Revision: D12959348

Pulled By: soumith

fbshipit-source-id: ef7e85b047b3937dc6aa01ba67e4e01f8eae4eca
2018-11-14 12:00:36 -08:00
0478d32cb8 Move AlignOf, SmallVector and ArrayRef to c10.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13916

Reviewed By: smessmer

Differential Revision: D13046722

fbshipit-source-id: 1583d3170d60e22f0a535cd1fd56bdf928186f5d
2018-11-14 11:13:16 -08:00
4983397c02 Better documentation and warning (#13946)
Summary:
This is to address https://github.com/pytorch/pytorch/issues/12603
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13946

Differential Revision: D13055254

Pulled By: teng-li

fbshipit-source-id: 20a206ebd3456eac9dc50584664c4bca3ee955d1
2018-11-14 10:41:46 -08:00
143ba72264 Move cosine_similarity to ATen (#12199)
Summary:
I'm now traveling and don't have access to a good computer to compile and test by myself. Will see the outcome of CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12199

Differential Revision: D13062326

Pulled By: nairbv

fbshipit-source-id: 85873525caa94906ccaf2c739eb4cd55a72a4ffd
2018-11-14 10:41:44 -08:00
53c3a92a50 consistent rounding (#9)
Summary:
Pull Request resolved: https://github.com/pytorch/FBGEMM/pull/9

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13960

The vectorized code was rounding to even in halfway cases with _mm256_round_ps + (_MM_FROUND_TO_NEAREST_INT |_MM_FROUND_NO_EXC) (see more details in https://software.intel.com/en-us/node/523819), but we were still using std::round in a couple of places which does rounding away from zero in halfway cases.
With this diff, we use std::nearbyint in all scalar code (except a few cases where we don't care exact rounding mode and uses rint which is the fastest in general) to be more consistent. nearbyint is the same as what the vectorized code does only when the current rounding mode is FE_TONEAREST but in practice this is OK because we almost always use the default rounding mode FE_TONEAREST.
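
The distinction, illustrated in Python for clarity (the commit itself changes the C++ scalar paths):

```python
import math

def round_away_from_zero(x):
    # halfway cases away from zero, like std::round
    return math.floor(x + 0.5) if x >= 0 else math.ceil(x - 0.5)

# Python's built-in round() rounds halfway cases to even, matching the
# vectorized _MM_FROUND_TO_NEAREST_INT behavior under FE_TONEAREST.
for v in (0.5, 1.5, 2.5, -0.5):
    print(v, round_away_from_zero(v), round(v))  # e.g. 2.5 -> 3 vs 2
```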

This is inspired by Marat's diff for mobile quantization.

Reviewed By: dskhudia

Differential Revision: D13017719

fbshipit-source-id: 6b8f99db7ea2e233aa2e3bd2adf622e03ed6258e
2018-11-14 10:21:42 -08:00
96663edca6 Remove the hip ignore; it conflicts with real in-tree HIP development. (#13972)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13972

Differential Revision: D13062253

Pulled By: ezyang

fbshipit-source-id: 4442b194bb08e4f718dff844743d23fd3a6dc8e9
2018-11-14 10:03:19 -08:00
35a24a9a94 Example with edge case 0 for torch.sign (#13771)
Summary:
The behavior of the edge case 0 is not self-evident for the `torch.sign` function (I personally expected a result of 1):
```python
>>> a = torch.tensor([0.7, -1.2, 0., 2.3])
>>> a
tensor([ 0.7000, -1.2000,  0.0000,  2.3000])
>>> torch.sign(a)
tensor([ 1., -1.,  0.,  1.])
```
This is not currently documented, I think it is worth it to give a simple example showing this behavior.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13771

Differential Revision: D13044520

Pulled By: ailzhang

fbshipit-source-id: c3011ccbdf1c13348f6c7242b06a9aa52ebc9204
2018-11-14 09:16:09 -08:00
dead6632b3 bug fix for 1D conv in NHWC layout (#13813)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13813

Title says it all.

Reviewed By: hx89

Differential Revision: D13017652

fbshipit-source-id: e3cea6c7dee2878119d154bb9f3efbc329d7c0d5
2018-11-14 09:16:07 -08:00
4341dd2753 Move most scalar checks from nn.yaml into THNN/THCUNN code. (#13906)
Summary:
This includes everything in nn.yaml except for convolutions, multi_margin_loss, multi_label_margin_loss, nll_loss, and nll_loss2d.

Note that scalar_check False just means we don't do any extra scalar checks (we could elide this from the generated code, which I may do in a later commit).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13906

Reviewed By: ezyang

Differential Revision: D13044507

Pulled By: gchanan

fbshipit-source-id: ebd3bdca2bcf512ca44de1ce3be81946f6c0828e
2018-11-14 07:58:35 -08:00
46c0e2c268 Clean up caffe2/tools/build_pytorch_libs.{sh,bat} (#13954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13954

- Remove extra include directories from BASIC_C_FLAGS.  We suspect that
  in some rare cases on Windows, this can cause us to get confused about
  which header to include.  Make this agree with build_pytorch_libs.sh
  Ditto with BASIC_CUDA_FLAGS
- Delete CWRAP_FILES from both places; it's unused in sh, and it's
  dead in CMAKE
- Delete NO_NNPACK in Windows, replace with USE_NNPACK (I'm not sure
  if this actually does anything on Windows lol)
- Delete a bunch of defunct cmake arguments from the build (NOT
  build_caffe2) target.

Reviewed By: soumith

Differential Revision: D13056152

fbshipit-source-id: efcc06c65a9f3606666196f3fe5db268844d44d9
2018-11-14 07:42:11 -08:00
a440629f14 Remove defunct build.sh/THConfig.cmake (#13953)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13953

Differential Revision: D13056128

Pulled By: ezyang

fbshipit-source-id: 9fd17f4fe000ac06144b04be996ef6849de2bafa
2018-11-14 07:42:09 -08:00
fbabe5bf62 Rename c10::detail to c10::impl (#13838)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13838

According to Sebastian, the detail convention is specifically for header-private
functionality.  That's not what c10/detail is; it's general, library private headers
which may be used in multiple places within PyTorch.  Rename it to impl to avoid
the confusion in nomenclature.

Reviewed By: smessmer

Differential Revision: D13024368

fbshipit-source-id: 050f2632d83a69e3ae53ded88e8f938c5d61f0ef
2018-11-14 07:39:37 -08:00
db5aeafa60 Avoid grabbing DeviceGuard in at::empty when possible (#13785)
Summary:
Changed at::empty to allocate the correct amount of memory instead of
"allocate 0 memory and then resize it to the necessary size".

This leads to a 300 ns speedup for at::empty for a cuda tensor of size (64, 2048).
(1790ns -> 1460ns for at::empty).

Also does the following:
Removes DeviceGuards for:
- empty_* functions that end up calling functions that already have a
  DeviceGuard
- t(), which gets called a lot in LSTMs,
- Remove one of the two DeviceGuards that at::empty(...) uses. It only
  needs one for correctness; the other comes from the resize_
  implementation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13785

Reviewed By: ezyang

Differential Revision: D13004938

Pulled By: zou3519

fbshipit-source-id: f45b7e6abe06c05d1f81cc53e190c7bab6d1c116
2018-11-14 07:39:35 -08:00
1e45e7a404 Speed up fusion compiler tensor allocation (#13914)
Summary:
Previously the fusion compiler would allocate an empty tensor and then
resize it to the correct size. This PR changes the fusion compiler to
allocate a tensor of the correct size the first time around. The
difference between these approaches for a single tensor is around 400ns;
for something like LSTMCell's FusionGroup that emits 8 outputs this is
theoretically a 3us win.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13914

Differential Revision: D13046728

Pulled By: zou3519

fbshipit-source-id: e2f28c0dc2ee5bcfee0efe10610039694691415c
2018-11-14 07:26:27 -08:00
109dd5b412 Move typeid to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13688

Reviewed By: ezyang

Differential Revision: D12912240

fbshipit-source-id: 1632172003682f62cea9b8c52596c3c0d8504b23
2018-11-14 02:58:04 -08:00
97036d3c30 FileStore auto deletes file and FileStore::add bug fix (#13708)
Summary:
This addressed: https://github.com/pytorch/pytorch/issues/11874

and we will have the identical file init_method behavior as the previous THD file init.

Also the FileStore::add bug is pretty annoying.

Two bugs:
(1) Add doesn't append to the end of the file.
(2) Cache doesn't get updated.

Both are fixed and tests are covered.

I examined the /tmp to ensure that all temp files are auto deleted after test_c10d.py
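
A minimal sketch of the `add` semantics the fix covers (constructor signature assumed from the c10d Python bindings):

```python
import torch.distributed as dist

store = dist.FileStore("/tmp/filestore_example", 2)  # file path, number of workers
store.add("counter", 1)      # with the fix, the update is appended to the file
v = store.add("counter", 2)  # and the cache is refreshed, so v == 3
```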
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13708

Reviewed By: pietern

Differential Revision: D12972810

Pulled By: teng-li

fbshipit-source-id: 917255390aa52845f6b0ad0f283875a7a704da48
2018-11-14 01:34:22 -08:00
e2a7d43dfd Use the torch.proto to store script module (#13736)
Summary:
Directly operate on protobuf in the serializer/deserializer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13736

Reviewed By: dzhulgakov

Differential Revision: D13028487

Pulled By: houseroad

fbshipit-source-id: e578474008874f00f2a22f0a2ffd85f52643881a
2018-11-14 00:22:09 -08:00
2871d3951f More robust ->match behavior (#13952)
Summary:
Allow schema matching against string literals to work even with
white space and other minor differences.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13952

Differential Revision: D13056043

Pulled By: zdevito

fbshipit-source-id: 0b502ce8311587308370285f7062914fce34faf0
2018-11-13 23:40:42 -08:00
346c418fc9 Add caffe2 clang7 build CI job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13928

Differential Revision: D13053770

Pulled By: bddppq

fbshipit-source-id: 8a015d4d8c3fb6a98b86ce7d7d96c13fc4f0d3f5
2018-11-13 23:12:23 -08:00
5151d33287 Unflake the ordering enforcement test (#13919)
Summary:
Attempts to unflake the dataloader ordering enforcement test. I think the issue was that the `thread_counter` variable was not atomic. I've made it atomic, and also global just to make it a bit clearer.

Fixes https://github.com/pytorch/pytorch/issues/13634

colesbury SsnL ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13919

Differential Revision: D13051718

Pulled By: goldsborough

fbshipit-source-id: b9f7f6317701a8b861a1d5c6a9b2b17b44782561
2018-11-13 21:05:02 -08:00
f4e502a8c5 Added MIOpen conv transpose op (#13938)
Summary:
This pull request contains changes for:
1. Removing ConvTranspose related changes from caffe2/operators/hip/conv_op_miopen.cc
2. Adding the file caffe2/operators/hip/conv_transpose_op_miopen.cc
3. Modifying the tests to run convTranspose op using MIOpen engine

Differential Revision: D13055099

Pulled By: bddppq

fbshipit-source-id: ca284f8f9a073005b22013c375cc958257815865
2018-11-13 21:01:52 -08:00
5059beb644 Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail (#13902)
Summary:
Change assert --> CUDA_ASSERT_KERNEL to avoid hip undefined __assert_fail()

Otherwise crash trace:

```
caffe2/caffe2/operators/hip/top_k_radix_selection_hip.cuh:409:7: error:  '__assert_fail':  no overloaded function has restriction specifiers that are compatible with the ambient context 'gatherTopK'
      assert(writeIndex < outputSliceSize);
      ^
glibc/include/assert.h:88:6: note: expanded from macro 'assert'
   : __assert_fail (#expr, __FILE__, __LINE__, __ASSERT_FUNCTION))
     ^
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13902

Reviewed By: bddppq

Differential Revision: D13042820

Pulled By: xw285cornell

fbshipit-source-id: 5117f6946db8109ae35e644e7423c8456e65e61f
2018-11-13 20:55:50 -08:00
0bedaf9cf6 Update setup.py to support Nvidia TX2 (#13939)
Summary:
add platform.machine() == 'aarch64' to support Nvidia TX2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13939

Differential Revision: D13055834

Pulled By: soumith

fbshipit-source-id: 0fadc87adf9e6b796978ce743e824eb98b006856
2018-11-13 20:10:35 -08:00
79ec5de3fc Add some more files to gitignore. (#13924)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13924

Differential Revision: D13047983

Pulled By: ezyang

fbshipit-source-id: bb2a8aa747d0c8195084c650006518df2a00daab
2018-11-13 19:02:57 -08:00
c3680e2b19 Fix sum() on fp16 (#13926)
Summary:
The sizes of the shared and global memory buffers were incorrect for float16.
They were sized based on float16 elements, but the buffers store intermediate
float32 values.

Fixes #13909
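
A sketch of the class of input that exercises this path (the exact sizes that trigger the bug are an assumption):

```python
import torch

# Large half-precision reduction: intermediates accumulate in float32,
# so the buffers must be sized for float32 elements, not float16.
x = torch.randn(2 ** 20, device="cuda", dtype=torch.float16)
print(x.sum())
```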
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13926

Differential Revision: D13048334

Pulled By: colesbury

fbshipit-source-id: 5a07df53f1152d5920258e91ed3f1e1de89b29e1
2018-11-13 16:50:36 -08:00
3002cb2ad0 Revert D13007266: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 2/4
Differential Revision:
D13007266

Original commit changeset: a9f0427a11db

fbshipit-source-id: c23bb511bb26108405b7e8622377fc18573d4311
2018-11-13 16:44:33 -08:00
76d8979afe Revert D13007287: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 3/4
Differential Revision:
D13007287

Original commit changeset: c89a24458e04

fbshipit-source-id: 74d3fe310f1f551e2f52c6e3d9a744a47767b4b1
2018-11-13 16:41:53 -08:00
fbd50bbfb9 Revert D13007246: [codemod][caffe2] Tensor construction: combine Resize+mutable_data - 1/4
Differential Revision:
D13007246

Original commit changeset: 230de42a3843

fbshipit-source-id: 40ce266826f00d320f7215169188ef4ead232660
2018-11-13 16:41:52 -08:00
30676bdcd3 Finish up TODOs in python printer (#13879)
Summary:
* Correctly adds annotate when needed for lists
* Parser/Emitter handles octal escapes so we do not fail for some strings.
* more complete keyword list in pretty printer
* floating point numbers are always printed with a decimal to ensure
  we never mistake them in parsing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13879

Differential Revision: D13037860

Pulled By: zdevito

fbshipit-source-id: f09ab174fc33402a429b21a5bfaf72e15c802cad
2018-11-13 16:39:46 -08:00
8311bbee7f Fix Windows build and test in CI (#11716)
Summary:
This PR adds Windows support for the C++ frontend. A lot of declarations were missing `TORCH_API` macros, and lots of code just did not compile on MSVC.

ebetica ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11716

Reviewed By: orionr

Differential Revision: D13038253

Pulled By: goldsborough

fbshipit-source-id: c8e5a45efd26117aeb99e768b56fcd5a89fcb9f8
2018-11-13 16:35:54 -08:00
f649d8b3a9 add floordiv and bitwise ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13873

Reviewed By: driazati, wanchaol

Differential Revision: D13033709

Pulled By: eellison

fbshipit-source-id: df7edee0f790038fb2a806d20640ad25c70b50eb
2018-11-13 16:32:22 -08:00
7c1fe17288 fix UnpackSegments cuda op (#13917)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13917

There is a bug in UnpackSegments cuda op when setting "max_length".

  "buck test mode/opt //caffe2/caffe2/python/operator_test:pack_ops_test -- test_pack_with_max_length_ops"
 fails on trunk.

This diff fixed this bug.

Reviewed By: xianjiec

Differential Revision: D13045106

fbshipit-source-id: 4d640d61405bb86326dc33c81145824060cf987e
2018-11-13 15:38:58 -08:00
cd49afce64 Allow attaching additional net info when supplying the benchmark net (#13820)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13820

We would like to provide an option to show additional info of the net to be benchmarked.

Reviewed By: highker, rdzhabarov

Differential Revision: D13018219

fbshipit-source-id: d3ec69901bdae58117a482ddd2c327b0f8cf7cb6
2018-11-13 15:08:25 -08:00
23e19ebfa7 add non-exponential emphasis loss to Lambdarank
Summary: Currently Lambdarank applies exponential emphasis on relevance, i.e., g = 2^rel, when calculating DCG; this diff adds an option that supports g = rel in the loss function.

Reviewed By: itomatik

Differential Revision: D9891514

fbshipit-source-id: 64730d467a665670edd37e6dc1c077987991d1a8
2018-11-13 14:54:04 -08:00
dfa4767754 Update nccl submodule to latest (#13921)
Summary:
This should include fix to the issue: https://github.com/NVIDIA/nccl/issues/153
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13921

Differential Revision: D13048999

Pulled By: teng-li

fbshipit-source-id: a83f3bbb004f4a4137d187a010c7ec6b48f27eeb
2018-11-13 14:22:39 -08:00
c46dd5163f Temporarily disable part of test_spectral_norm (#13908)
Summary:
See #13818 for suggestions about a long-term fix
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13908

Differential Revision: D13047262

Pulled By: colesbury

fbshipit-source-id: 0f29bd5b659bb97826381abbc305fb8a25b131ed
2018-11-13 14:19:16 -08:00
5163a28917 Convert more weak functions (#13707)
Summary:
Convert some more functions to match up with features added. Some
conversions were unsuccessful but the type line was left in for later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13707

Differential Revision: D13030210

Pulled By: driazati

fbshipit-source-id: 02d5712779b83b7f18d0d55539e336321335e0cc
2018-11-13 13:50:57 -08:00
53bc5fb043 Support nn.Sequential in script (#13889)
Summary:
This PR makes weak modules in `nn.Sequential` get properly compiled
when used
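
A minimal sketch of the use case, written with today's `torch.jit.script` entry point (the era's API used `ScriptModule` subclassing with `script_method` instead):

```python
import torch
import torch.nn as nn

# The weak modules inside the Sequential are compiled when the script calls them.
model = torch.jit.script(nn.Sequential(nn.Linear(4, 4), nn.ReLU()))
print(model(torch.randn(2, 4)))
```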
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13889

Differential Revision: D13039559

Pulled By: driazati

fbshipit-source-id: d3266305f0e206b2a19b63230ac2ab8f02faa603
2018-11-13 13:48:58 -08:00
5cfccd76e6 Jit load error msg (#13894)
Summary:
When loading a non-existent / non-openable file, the current error message is
```
Expected to read 8 bytes but got %llu bytes0
```

This
- fixes two ASSERTM formatting calls (including the above),
- throws a more specific error message if the ifstream constructor sets `.fail`.

Here is someone apparently confused by the current message: https://github.com/facebookresearch/maskrcnn-benchmark/pull/138#issuecomment-437848307
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13894

Differential Revision: D13043228

Pulled By: soumith

fbshipit-source-id: b348b482c66d5e420874ae6e101b834106b89e82
2018-11-13 12:33:31 -08:00
283062f574 Tensor construction: combine Resize+mutable_data - 2/4 (#13852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13852

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007266

fbshipit-source-id: a9f0427a11dbe084a30837aa32da67c9302cbc6c
2018-11-13 12:28:35 -08:00
e030ee8197 Tensor construction: combine Resize+mutable_data - 3/4 (#13854)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13854

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007287

fbshipit-source-id: c89a24458e0428485402b3eb23519a92804d768e
2018-11-13 12:28:33 -08:00
9d36c37bdb Tensor construction: combine Resize+mutable_data - 1/4 (#13853)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13853

Codemod generated with clangr shard mode, 25 files per diff,
motivation: https://github.com/pytorch/pytorch/pull/12407

Reviewed By: smessmer

Differential Revision: D13007246

fbshipit-source-id: 230de42a3843d71599e812d5511f52f3af47f59b
2018-11-13 12:26:02 -08:00
96a01f82d1 Remove unnecessary include (#13878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13878

This removes a dependency to another header to simplify moving this header to c10.
Also fix some include paths to prepare that move

Reviewed By: ezyang

Differential Revision: D13036478

fbshipit-source-id: cbddb5281498256fddcbebce61aa606c51b7b8d7
2018-11-13 12:18:28 -08:00
60a85857dd s/CAFFE_ENFORCE_WITH_CALLER/AT_ASSERTM/ (#13829)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC sinkingsugar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13829

Differential Revision: D13019452

Pulled By: ezyang

fbshipit-source-id: cf8b58b25a484720d9a612df6dd591c91af6f45a
2018-11-13 11:24:51 -08:00
561bc09026 Remove CUDNN_BATCHNORM_SPATIAL_PERSISTENT mode for accuracy (#13844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13844

In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

We've fixed this in Caffe2 D9601217, and we should do the same to ATen as well.

Reviewed By: ezyang

Differential Revision: D13025486

fbshipit-source-id: 04f4f0d9af6287b0400ca1842fb2cdac1f8cdb70
2018-11-13 11:17:16 -08:00
0d2762e876 Minor fix to reenable nvtx sequence numbers for the forward methods of custom (Python) autograd functions (#13876)
Summary:
Some of our arch people (mkolod, Aditya Agrawal, kevinstephano) notified me that the sequence number annotations weren't showing up for forward methods of custom autograd functions, which was breaking their nvprof dump parsing.  Two one-line fixes in the appropriate code paths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13876

Differential Revision: D13042381

Pulled By: ezyang

fbshipit-source-id: a114118f5c07ad4ba482e7a4892d08805b23c65b
2018-11-13 11:10:32 -08:00
266bb8bf30 FeedTensor returns a Tensor (#13641)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13641

The FeedTensor function used to take a pointer to Tensor and feed the content using Resize
and mutable_data, but since Tensor is a pointer now, we can just return a Tensor instead.

Reviewed By: ezyang

Differential Revision: D12873145

fbshipit-source-id: 653735c20d611ff6ac9e380d8b3c721cb396a28f
2018-11-13 10:50:32 -08:00
98b450deb9 Clean optional undefined tensor syntax in ATen yaml files and codegen (#13871)
Summary:
Previously, multiple syntaxes for undefined tensors existed in the ATen definition files; this PR makes them all follow the same "?" syntax.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13871

Differential Revision: D13033486

Pulled By: wanchaol

fbshipit-source-id: 7673bc22d08cd6975503deb51fba47ada6bc5156
2018-11-13 10:37:42 -08:00
Jie
bbc7412615 fix cuda native batch norm for small feature planes (#13765)
Summary:
fix cuda native batch norm for small feature planes.
  1. fixed a divergent WARP_SHFL_XOR call in the warp reduction, which causes a hang with CUDA_ARCH > 7.0
  2. split Normalization.cu into two files for code reuse, in preparation for sync BN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13765

Differential Revision: D13043331

Pulled By: soumith

fbshipit-source-id: bf8565bff6ba782475ad0e4be37ea53c8052eadf
2018-11-13 10:14:37 -08:00
8559fcf791 Unpin Sphinx. (#13831)
Summary:
Sphinx 1.8.2 is released, per https://github.com/sphinx-doc/sphinx/issues/5419

Fixes #11618

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13831

Differential Revision: D13020339

Pulled By: ezyang

fbshipit-source-id: 4c7f3aff172efd3aca54ef48ac9052989cce5e4c
2018-11-13 09:45:12 -08:00
f6e4fc071a Fix a bug that causes nvcc to emit an unknown option error (#13904)
Summary:
Using `"-Xcompiler -fPIC"` causes nvcc to emit the following:

    nvcc fatal   : Unknown option 'Xcompiler -fPIC'

As per fixes lower down in the file (see also issue #7126 on GitHub),
the fix is to replace it with `"-Xcompiler" "-fPIC"`. This one was
apparently missed when the original fix was applied.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13904

Differential Revision: D13043189

Pulled By: soumith

fbshipit-source-id: 6dc6d325671e4d08cd8e6242ffc93b3bd1f65351
2018-11-13 09:41:44 -08:00
f112aa746a Fix document about torch.get_default_dtype() (#13890)
Summary:
Minor fix.
```
torch.get_default_dtype() → :class:`torch.dtype`
```
→
```
torch.get_default_dtype() → torch.dtype
```
:class: is not rendered in https://pytorch.org/docs/stable/torch.html
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13890

Differential Revision: D13040704

Pulled By: colesbury

fbshipit-source-id: 5fadb01ad365042d5df2bac058f4ae89b281d3b7
2018-11-13 09:25:32 -08:00
a83a1544b1 Move device_guard from _th_ functions to the wrapper. (#13842)
Summary:
This is what we would want to check in anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13842

Differential Revision: D13025463

Pulled By: gchanan

fbshipit-source-id: d1ff9b10f4adc811bbd3db15b440ed00c16c82d1
2018-11-13 08:03:36 -08:00
e43fb1d26d Fix cuda out of memory test (#13864)
Summary:
torch.randn(big_number_here, dtype=torch.int8) is wrong because randn
isn't implemented for torch.int8. I've changed it to use torch.empty
instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13864

Differential Revision: D13032130

Pulled By: zou3519

fbshipit-source-id: d157b651b47b8bd736f3895cc242f07de4c1ea12
2018-11-13 07:30:30 -08:00
7f002008f1 remove ShouldFp32FallbackToNCHW (#13814)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13814

D10333829 implemented 3D conv in NHWC in fp32 ops so int8 ops don't need special handling anymore.

Reviewed By: hx89

Differential Revision: D13017666

fbshipit-source-id: 41df449f5e21c4c7134cc5c480e559f8c247069b
2018-11-13 00:52:41 -08:00
a7eee0a1e9 Add Reshape if there is add_axis when exporting C2 concat (#13798)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13798

The semantics of C2 and ONNX Concat differ a bit. C2 concat accepts an "add_axis" arg and raises the dim if it is set; that's equivalent to attaching a Reshape after a plain concat in ONNX.

Reviewed By: rdzhabarov

Differential Revision: D13012867

fbshipit-source-id: da23e555bae709fd2a373b04dcb9db4e984ae315
2018-11-12 22:27:49 -08:00
a17c0118a5 fix stability in bce with pos_weight formula (#13863)
Summary:
Fixes #13773
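
For context, one standard numerically stable formulation of BCE-with-logits with pos_weight $p$, target $z$, and logit $x$ (a log-sum-exp rearrangement; whether it matches the PR's exact expression is an assumption):

$$\ell(x, z) = (1 - z)\,x + \bigl(1 + (p - 1)\,z\bigr)\Bigl(\log\bigl(1 + e^{-|x|}\bigr) + \max(-x,\ 0)\Bigr)$$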
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13863

Differential Revision: D13031803

Pulled By: ailzhang

fbshipit-source-id: 6c9e044f0450eebf4555bbc02c125713d9378e2f
2018-11-12 22:04:24 -08:00
0bfbdcac89 fix bug in D13017777
Summary:
Mistakenly created an infinite recursive call.

(Note: this ignores all push blocking failures!)

Reviewed By: jianyuh

Differential Revision: D13038053

fbshipit-source-id: 8b760cb73b5369647d8ef651b8c196ac3f7af04d
2018-11-12 21:57:31 -08:00
ce48958606 enable more unit tests (#13166)
Summary:
This enables the distributions and utils test sets for ROCm.
Individual tests are enabled that now pass due to fixes in HIP/HCC/libraries versions in white rabbit.

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13166

Differential Revision: D12814759

Pulled By: bddppq

fbshipit-source-id: ea70e775c707d7a8d2776fede6154a755adef43e
2018-11-12 18:49:52 -08:00
cec3455a8b Add gitignore item for YCM config
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13805

Reviewed By: yinghai

Differential Revision: D13031332

Pulled By: bddppq

fbshipit-source-id: 279b7bb8879e49eef8abed51dc30b4b7ea0a2fa9
2018-11-12 16:58:56 -08:00
1600649792 Fix for nightly builds (#13779)
Summary:
Being tested on nightlies manually.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13779

Reviewed By: yinghai

Differential Revision: D13001930

Pulled By: pjh5

fbshipit-source-id: 954eaabe052914b7b23c74e922666bf9dbfb630a
2018-11-12 16:38:14 -08:00
b052fe6c2f Upgrade DLPack
Summary: Needed to use TVM

Reviewed By: ajtulloch

Differential Revision: D12994038

fbshipit-source-id: f0b6c48a43a87fac37fcef73b78026d8384cd022
2018-11-12 15:59:46 -08:00
8480fe0105 Fix up creation of unique data nodes
Summary:
There was a bug in the uniqueness check that only made the first run
unique.

Reviewed By: duc0

Differential Revision: D13013504

fbshipit-source-id: ecf7526d0fafd7968f1301734123f93968efef46
2018-11-12 15:37:08 -08:00
03c0f4fbe7 Use RNG mutex for randperm on CPU (#13832)
Summary:
When we added `randperm_cpu` and `THTensor_(randperm)` we forgot to lock the `THGenerator` mutex before calling `THRandom_random`, which causes the segfault mentioned in https://github.com/facebookresearch/maskrcnn-benchmark/pull/93#issuecomment-435479043. This PR fixes the bug.

Closes https://github.com/pytorch/pytorch/issues/1868.
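
A sketch of the kind of concurrent CPU use that could hit the unlocked RNG path (illustrative only, not the original repro):

```python
import torch
from concurrent.futures import ThreadPoolExecutor

# Before the fix, concurrent calls could race on the shared THGenerator state.
with ThreadPoolExecutor(max_workers=4) as ex:
    perms = list(ex.map(lambda _: torch.randperm(1000), range(100)))
```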
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13832

Differential Revision: D13025453

Pulled By: yf225

fbshipit-source-id: 6e363a35c72b4862412eaea6516a154126634c9d
2018-11-12 15:27:41 -08:00
fc79f70f9a CircleCI: Add Linux CUDA 10 build (#13858)
Summary:
Moving CUDA 10 build to CircleCI so that we have one less job running on Jenkins.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13858

Differential Revision: D13031916

Pulled By: yf225

fbshipit-source-id: 57aa54941d7f529e7094c8d037b836ec2fb6191c
2018-11-12 15:07:34 -08:00
8de9564c12 Fix gcc-7 build in caffe2/caffe2/quantization/server/activation_distribution_observer.cc (#13799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13799

Fix broken operator=

Reviewed By: jspark1105

Differential Revision: D13014333

fbshipit-source-id: 6075906ecf0735bd9a74d57108036a33e1575df8
2018-11-12 14:52:51 -08:00
f1a2bc4eae Corrected python lib path on windows to be consistent with Linux (#13848)
Summary:
The python lib path on Windows was set to an incorrect path. This fixes it to be consistent with Linux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13848

Differential Revision: D13030945

Pulled By: soumith

fbshipit-source-id: 7fb9013ffe66cff98018aea25fdb5cda03cbceb1
2018-11-12 14:39:55 -08:00
53a3c46950 Switch to packaged Thrust on Ubuntu, enable CentOS 7.5 as a CI target (#12899)
Summary:
1) Use the hip-thrust version of Thrust as opposed to the GH master. (ROCm 267)

2) CentOS 7.5 docker (ROCm 279)

* Always install the libraries at docker creation for ubuntu.
* Add Dockerfile for CentOS ROCm
* Enable the centos build
* Source devtoolset in bashrc
* Set locales correctly depending on whether we are on Ubuntu or CentOS
* Install a newer cmake for CentOS
* Checkout thrust as there is no package for CentOS yet.

PyTorch/Caffe2 on ROCm passed tests: https://github.com/ROCmSoftwarePlatform/pytorch/pull/280

For attention: bddppq ezyang

Docker rebuild for Ubuntu is not urgent (getting rid of the Thrust checkout and package install is mainly cosmetic). If a docker image for CentOS 7.5 is wanted, a build is necessary. I tested the PyTorch build in the CentOS docker. PyTorch unit tests mostly work; however, a test in test_jit causes a Python recursion error that seems to be due to python2 on CentOS, as we haven't ever seen this on Ubuntu - hence please do not enable unit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12899

Differential Revision: D13029424

Pulled By: bddppq

fbshipit-source-id: 1ca8f4337ec6a603f2742fc81046d5b8f8717c76
2018-11-12 14:39:54 -08:00
1caa341c68 Add torch.multiprocessing.spawn docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13846
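
A minimal usage example of the API these docs cover:

```python
import torch.multiprocessing as mp

def worker(rank):
    # spawn passes each process its index as the first argument
    print(f"hello from process {rank}")

if __name__ == "__main__":
    mp.spawn(worker, nprocs=2)
```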

Differential Revision: D13029595

Pulled By: pietern

fbshipit-source-id: b733b00f7070c18535c31801f20e6e717eec7748
2018-11-12 14:39:52 -08:00
1a0cb08918 allow Node::isAfter to work across blocks (#13855)
Summary:
Extend `isAfter` to work for nodes in different blocks. This is useful if we want to ask a question like "are any of the uses of value `v` after this node", since uses may be inside inner blocks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13855

Differential Revision: D13030528

Pulled By: suo

fbshipit-source-id: f681405396f3ec68eec1a2cb92e40873921a4b78
2018-11-12 14:39:50 -08:00
75bf877534 Preventing error where ninja build files are overwritten when invoking clean and build together (#13698)
Summary:
Prevents ninja build files from being overwritten when invoking clean and build together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13698

Differential Revision: D13030905

Pulled By: soumith

fbshipit-source-id: 234576ac92e0aa8c2d2409958d3cf85eb29ed1f3
2018-11-12 14:39:48 -08:00
686e83223f add ops between float & int, and change list equality output to be a boolean
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13793

Reviewed By: wanchaol

Differential Revision: D13010872

Pulled By: eellison

fbshipit-source-id: 2c8248f30b51eab1a87290711f99b7ceb6df2009
2018-11-12 14:39:47 -08:00
e3839dfc35 Add matplotlib to docs/requirements.txt (#13828)
Summary:
Used in docs/source/scripts/build_activation_images.py.

Don't know if we need a specific version. I installed the latest version (3.0.2) and that works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13828

Differential Revision: D13030294

Pulled By: pietern

fbshipit-source-id: b4e7b381182036645924453a1e2abb719090bbc4
2018-11-12 13:43:07 -08:00
5bf14c23b7 Bump Caffe2 docker images to version 230
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13857

Differential Revision: D13029637

Pulled By: bddppq

fbshipit-source-id: 73c4a0f3d39257a2312b36c9dd55dc001067d9c4
2018-11-12 13:26:23 -08:00
309cc76469 BaseType:: -> this-> (#13817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13817

gcc7 doesn't like BaseType::func<..>() . Should use this->func<...>()

Reviewed By: hx89

Differential Revision: D13017777

fbshipit-source-id: 0cf68d459b44379b1c103cf74382857db9a91bef
2018-11-12 12:51:12 -08:00
6093f29409 Update coverage info (#13788)
Summary:
Right now we don't have coverage info on how many PyTorch operators can be exported to ONNX. This PR adds torch.nn operators to the coverage; functional modules will be added later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13788

Differential Revision: D13010448

Pulled By: zrphercule

fbshipit-source-id: 19349cabaeff42fda3620bb494f7ec4360d96b76
2018-11-12 12:39:12 -08:00
d8f35c42be nomnigraph - easy - support blob renaming (#13845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13845

Support renaming a blob in nomnigraph

Reviewed By: itomatik

Differential Revision: D13026762

fbshipit-source-id: fc8cecb4562a6c618ce5c8e2ff79a2a282a8ff09
2018-11-12 12:32:10 -08:00
0c375571f5 Support OptionalType export and type match (#13647)
Summary:
* Adds `OptionalType` support for import/export
    * Optionals get exported along with their contained type, e.g. 'Optional[int]'
* Allows concrete types and `None` to be passed to an op that takes an optional (see the sketch after this list)
* Converts `softmax`
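
A minimal sketch of the behavior, written with current-style Python 3 annotations (the era's TorchScript often used type comments instead; treat the exact surface syntax as an assumption):

```python
import torch
from typing import Optional

@torch.jit.script
def f(x: torch.Tensor, n: Optional[int]) -> torch.Tensor:
    # Both None and a concrete int are accepted for the optional argument.
    if n is None:
        return x
    return x + n

print(f(torch.ones(2), None), f(torch.ones(2), 3))
```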
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13647

Differential Revision: D12954672

Pulled By: driazati

fbshipit-source-id: 159e9bfb7f3e398bec3912d414c393098cc7455a
2018-11-12 12:15:25 -08:00
bf00008aa1 Use SmallVector for TensorImpl sizes and strides. (#13649)
Summary:
This removes dynamic allocations for sizes/strides for tensors with <= 5
dims. This should cover the most common tensor use cases; we use a lot
of 4D tensors in images (N, C, H, W) and LSTMs use tensors with 3 or fewer dims.

Benchmarking results can be found here:
https://gist.github.com/zou3519/ce4182722ae7e2a228bc8b57ae60b0e9
The quick summary is that this PR:
- makes aten LSTM's forward pass ~1ms faster and improves JIT lstm perf
  as well
- Tensor as_strided is now 200ns faster for dimensions <= 5
- at::empty performance is 200ns slower for dimensions > 5. For dims <= 5,
  there is no noticeable perf change.
- Variable ops are 200-500ns faster because Variables never used their
  sizes/strides fields in the first place.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13649

Differential Revision: D12950409

Pulled By: zou3519

fbshipit-source-id: 0bd87ec9f712ddc0d533a347d781e3a91a954b90
2018-11-12 10:40:32 -08:00
aef9e76283 Get pretty printer ready for use as a serialization format (#13616)
Summary:
Get pretty printer ready for use as a serialization format

This PR adds a bunch of functionality to the pretty printer (now called python_printer to reflect
the fact that it will be used to output valid python source). The idea is to get the printer
ready for use as serialization format.  This PR does not have tests beyond what the pretty
printer already had. PRs stacked on this one will do round-trip export/import to test this functionality more robustly.

Notes:
* PythonPrinter is an evolution of the original pretty printer. However, much of it has changed so it is best just to
  read it as a new implementation. Trying to correlate it to the original implementation is probably not much help.
* The printer tries to get reasonably close to how the original function was likely written, such as
  writing expressions rather than making intermediates when possible. We may decide to turn this off
  for the actual serialization, but it is useful for pretty printing.
* tensor field access was changed so that prim::device and family have schema
* fixed a bug in the compiler where setUniqueName gets called even when a value already has one.
  this sometimes assigned really poor names to graph inputs
* Graph::insert gains an optional range argument to make range-preserving inserts easier.
* prim:: ops that can have schema now have schema. This is because when we parse them back in,
  we will need the schema to correctly set their output types.
* there is code in the python printer to complain if you try to add a prim op and do not update the printer.
* BuiltinModule is generalized to take an operator namespace and a version number for work in future commits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13616

Reviewed By: goldsborough

Differential Revision: D13008252

Pulled By: zdevito

fbshipit-source-id: 32b33bc6410d6ca1c6f02bd6e050f8d5eea32083
2018-11-12 10:21:30 -08:00
b7a7ab364b Improve mm / addmm error message with sparse tensors (#13796)
Summary:
and write derivatives in terms of native functions.

This is the same as https://github.com/pytorch/pytorch/pull/13648 but has a fix for the canonicalize op jit pass to propagate shape information.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13796

Reviewed By: ezyang

Differential Revision: D13012281

Pulled By: gchanan

fbshipit-source-id: 88d0d91e72b5967c51ff865350fcbdd7ffed92ef
2018-11-12 07:16:47 -08:00
8752214fb7 Apply weight-decay before momentum in the SGD optimizer. (#13801)
Summary:
While trying to understand why two implementations of the same model, one in Python, one using the C++ api (via some [ocaml wrappers](https://github.com/LaurentMazare/ocaml-torch)) did not perform equally well, I noticed that the Python and C++ implementation of SGD slightly differ on weight decay.

- In the [Python version](https://github.com/pytorch/pytorch/blob/master/torch/optim/sgd.py#L91-L93) weight decay is applied *before* momentum (and so momentum applies to the weight decay).
- In the C++ implementation the weight decay is applied *after* momentum.

In the couple of computer-vision models I have looked at, the Python version performs a little better, so this PR tweaks the C++ implementation to perform weight decay *before* momentum. This is possibly caused by having more regularization; maybe increasing the weight decay while keeping the current code would yield the same improvement. However, a nice advantage of this change is that it puts the C++ and Python versions in line. After this change, my Python and C++/OCaml models performed similarly when using the same weight-decay parameter.

Maybe there was some real reason to have weight decay after momentum in the C++ version but I haven't found any.
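
A minimal sketch of the two orderings, assuming plain SGD update rules rather than the actual optimizer code:

```python
def step_decay_before_momentum(p, grad, buf, lr, momentum, weight_decay):
    d_p = grad + weight_decay * p   # decay folded into the gradient,
    buf = momentum * buf + d_p      # so momentum smooths it too (Python behavior)
    return p - lr * buf, buf

def step_decay_after_momentum(p, grad, buf, lr, momentum, weight_decay):
    buf = momentum * buf + grad     # momentum on the raw gradient
    d_p = buf + weight_decay * p    # decay applied afterwards (old C++ behavior)
    return p - lr * d_p, buf
```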
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13801

Differential Revision: D13020709

Pulled By: soumith

fbshipit-source-id: 7c2ac245577dd04bc3728aec4af0477120a60f13
2018-11-11 23:54:50 -08:00
7e8572be2d Change method-only _th_ prefix Declarations to functions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13754

Reviewed By: ezyang

Differential Revision: D12988489

Pulled By: gchanan

fbshipit-source-id: b62bb9288f67d72320925c36283f6ce6cbf95d20
2018-11-11 15:47:06 -08:00
003f97cefa fc layer accept axis argument (#13822)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13822

as title

Reviewed By: xianjiec

Differential Revision: D12996338

fbshipit-source-id: 1aa61e71e2d79535325ea7034c82e1cb6bf3a9f6
2018-11-11 13:44:57 -08:00
e35418b3be New implementations of DeviceGuard, StreamGuard and MultiStreamGuard (with CUDA specializations) (#13342)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13342

This PR introduces a few new concepts:

- DeviceGuardImplInterface, and implementations for CPU and CUDA, which
  provide a generic interface to device and stream state,
  without requiring a direct dependency on the code in question.
- InlineDeviceGuard, a general template for generating both specialized
  and dynamically dispatched device guard implementations.  Dynamic
  dispatch is done by specializing it on a VirtualGuardImpl.
- Provide a device-independent DeviceGuard class, which can be used even
  from CPU code. It uses the aforementioned dynamic dispatch.
- CUDA-specialized CUDAGuard class, which doesn't have a dynamic dispatch
  but can only be used from CUDA.
- StreamGuard, which is the same as above, but for streams rather than
  devices.
- Optional variants of all the aforementioned guards, which are a no-op if
  no device/stream is specified
- CUDAMultiStreamGuard, specifically for the case when we want to set
  a device on every guard.

There are some subtle semantic changes, which have been thoroughly documented
in the class definition.

BC-breaking changes:

- Move constructor/assignment have been removed from all device guard
  implementations.
- In some cases where you previously wrote 'set_device' (or 'set_stream'), you now must write
  'reset_device', because if you switch devices/device types, the stream/device on the
  previous device is unset.  This is different from previous behavior.
- CUDAGuard no longer handles streams, or multiple streams.  Use CUDAStreamGuard
  or CUDAMultiStreamGuard as appropriate for your use case.

Reviewed By: dzhulgakov

Differential Revision: D12849620

fbshipit-source-id: f61956256f0b12be754b3234fcc73c2abc1be04e
2018-11-11 12:11:10 -08:00
4b86a215ca moving simd adagrad code to perfkernels (#13549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13549

caffe2/perfkernels has a nice framework for switching at runtime between implementations optimized for different instruction sets.
This is good preparation for implementing avx512 adagrad kernels.

Reviewed By: hyuen

Differential Revision: D12882872

fbshipit-source-id: a8f0419f6a9fd4e9b864c454dad0a80db267190c
2018-11-11 00:20:39 -08:00
d97ac82bf5 Back out "Revert D12967258: Support more data types in ONNXIFI transform" (#13812)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13812

Original commit changeset: 2cf95bdc5ed8

Looks like in iOS, `uint64_t` is not the same as `size_t`. :( Fixed it here.

Reviewed By: houseroad

Differential Revision: D13017390

fbshipit-source-id: d33854ce341225aba372fb945c3704edc14f9411
2018-11-10 20:00:34 -08:00
786f9ba6ea Remove potential infinite loop from test_c10d.py (#13816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13816

If common.find_free_port() returns the same port over and over again,
and the TCPStore fails to bind to it over and over again, this
function has the potential to loop forever. If we can't find a free
port after 10 tries, we are safe to assume something is wrong...

Differential Revision: D13017700

fbshipit-source-id: 2139a0ea0f30ce08b5571f80ae0551f1fa7ba4a2
2018-11-10 17:58:13 -08:00
c3603301d7 Fix race condition in TCPStoreDaemon initialization (#13815)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13815

If the TCPStoreDaemon was constructed and destructed shortly after, it
was possible for the controlPipeFd_ to get initialized by the
background thread after the stop() function was already called. Then,
the destructor hangs on waiting for the thread to terminate, when the
termination signal (closing the write side of the control pipe) will
never happen.

Differential Revision: D13017697

fbshipit-source-id: 9528286fbfc773237990f1a666605d27bac2c0e5
2018-11-10 17:54:21 -08:00
4c3b76c402 Add std::string to the getTypePtr for JIT inference of custom op types (#13683)
Summary:
This allows custom ops to take string parameters.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13683

Differential Revision: D13017010

Pulled By: soumith

fbshipit-source-id: 7c40aca7f57ba3f8812d34bc55828ff362c69bd2
2018-11-10 12:58:53 -08:00
7c02f285dc Revert D12967258: Support more data types in ONNXIFI transform
Differential Revision:
D12967258

Original commit changeset: 688076e6f504

fbshipit-source-id: 2cf95bdc5ed8f1e13646bc5cf8139bdc516861d7
2018-11-10 12:34:31 -08:00
5923d76f96 Support more data types in ONNXIFI transform (#13745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13745

We need to support types beside `int64` and `float`.

Reviewed By: bddppq, rdzhabarov

Differential Revision: D12967258

fbshipit-source-id: 688076e6f504b2bf24bba89714df87a678c5638a
2018-11-10 10:41:01 -08:00
c85463fc74 Allow Gather to handle empty data (#13781)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13781

allow Gather Op to handle empty data.

Reviewed By: intermilan

Differential Revision: D13001267

fbshipit-source-id: 633c8471b637c56be8f6574f9bf9430785073977
2018-11-10 10:00:47 -08:00
4f622c26b9 fix ffs intrinsic for long long (ROCm 290) (#13804)
Summary:
* Switch to __ffsll in Embedding which is the correct intrinsic here.
* Fix WARP_BALLOT and ffsll in LookupTable as well.

Fix comes from iotamudelta

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13804

Differential Revision: D13016184

Pulled By: bddppq

fbshipit-source-id: 2287a78ee9e592630336a073ad1e55a90e1f946d
2018-11-10 02:02:43 -08:00
d02781a2ef Make InterpreterStateImpl an intrusive_ptr_target (#13784)
Summary:
InterpreterStateImpl can continue its lifecycle by incrementing the ref
count itself. This patch also removes the InterpreterState::clone()
interface, which conflicts with intrusive_ptr_target's prohibition on copying.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13784

Differential Revision: D13015451

Pulled By: highker

fbshipit-source-id: a05f1ea6549d52ec693ccffefaa4d520b2474b8c
2018-11-09 23:39:18 -08:00
079e86a915 schematize some prim ops (#13790)
Summary:
We're relying on the default function schema (which contains no argument information) in places where we don't need to. This is bad because alias analysis will be very conservative when it doesn't have schema information present.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13790

Differential Revision: D13009185

Pulled By: suo

fbshipit-source-id: 023516937bd3dcae8a969185a89c55f38d691ba5
2018-11-09 15:50:29 -08:00
e552c04d53 Add proper comment for dispatch_to (#13783)
Summary:
Add proper comment to the fix in https://github.com/pytorch/pytorch/pull/13700
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13783

Differential Revision: D13009956

Pulled By: wanchaol

fbshipit-source-id: 34f5259204dab12f4159ab191e7b08e2f5226292
2018-11-09 15:48:15 -08:00
7b2fb012a8 Make potrs batched (#13453)
Summary:
- This is a straightforward PR, building up on the batch inverse PR, except for one change:
  - The GENERATE_LINALG_HELPER_n_ARGS macro has been removed, since it is not very general and the resulting code is actually not very copy-pasty.

Billing of changes:
- Add batching for `potrs` (see the sketch after this list)
- Add relevant tests
- Modify doc string

Minor changes:
- Remove `_gesv_single`, `_getri_single` from `aten_interned_strings.h`.
- Add test for CUDA `potrs` (2D Tensor op)
- Move the batched shape checking to `LinearAlgebraUtils.h`
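
A sketch of batched `potrs` use with the era's `potrf`/`potrs` names (an assumption about the then-current API; these later became `cholesky`/`cholesky_solve`):

```python
import torch

A = torch.randn(4, 3, 3)
A = A.matmul(A.transpose(-1, -2)) + 3 * torch.eye(3)  # batch of SPD matrices
b = torch.randn(4, 3, 1)
u = torch.stack([torch.potrf(a) for a in A])  # per-matrix Cholesky factors
x = torch.potrs(b, u)                         # batched solve added by this PR
```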
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13453

Reviewed By: soumith

Differential Revision: D12942039

Pulled By: zou3519

fbshipit-source-id: 1b8007f00218e61593fc415865b51c1dac0b6a35
2018-11-09 15:16:26 -08:00
e3e6ca1102 operator serialized test coverage summary document (#13703)
Summary:
Add a markdown document summarizing the coverage of serialized operator tests. This currently only takes into account what has been covered by the tests with respect to the entire registry of c2 operators.

Next, we will break down the coverage by which operators have unit tests associated with them, which have hypothesis tests, and which have tests more specifically calling assertReferenceChecks.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13703

Reviewed By: dzhulgakov

Differential Revision: D12970810

Pulled By: ajyu

fbshipit-source-id: 4f0cd057b1cf734371333e24d26cbab630a170e1
2018-11-09 15:04:08 -08:00
014ea1e1f8 Improve CUDA out-of-memory error message (#13751)
Summary:
```
The new error message now looks like (from Python):

  RuntimeError: CUDA out of memory. Tried to allocate 16.00 GiB (GPU 0; 11.93 GiB total capacity; 4.00 GiB already allocated; 7.33 GiB free; 179.00 KiB cached)

Summary of terms:

  "total capacity": total global memory on GPU
  "already allocated": memory allocated by the program using the
                       caching allocator
  "free": free memory as reported by the CUDA API
  "cached": memory held by the allocator but not used by the program

  The "allocated" amount  does not include memory allocated outside
  of the caching allocator, such as memory allocated by other programs
  or memory held by the driver.

  The sum of "allocated" + "free" + "cached" may be less than the
  total capacity due to memory held by the driver and usage by other
  programs.

  Note that at this point cuda_malloc_retry has already returned all
  possible "cached" memory to the driver. The only remaining "cached"
  memory is split from a larger block that is partially in-use.
```

This also fixes an issue where on out-of-memory could cause an unrelated subsequent CUDA kernel launch to fail because `cudaGetLastError()` was not cleared.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13751

Differential Revision: D13007177

Pulled By: colesbury

fbshipit-source-id: ea7121461b3f2a34646102959b45bde19f2fabab
2018-11-09 14:33:28 -08:00
ae7c6bcfcf Make c10 buildable by itself. (#13742)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13742

Along the way, I switch us to globbing directories by hand,
so we don't actually pick up generated cpp files in c10/build
(if you're using the normal idiom for a CMake build).

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: dzhulgakov

Differential Revision: D12988039

fbshipit-source-id: 08b7ec50cfef82b767b4ca9972e5ba65bc45bcbb
2018-11-09 13:40:39 -08:00
09369fa9d7 Fix clang_tidy.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13776

Differential Revision: D13002845

Pulled By: goldsborough

fbshipit-source-id: 7b019a032680796cbb04f733b31749ef7c6abe54
2018-11-09 11:46:50 -08:00
79ceecec8e Optional undefined tensor support (#13650)
Summary:
This PR is a part of task to unblock standard library export.
* We treat None differently for Tensor than for other types: when None is passed as a Tensor, it becomes an undefined tensor rather than the None IValue.
* Refine the type system so that we have a correct tensor type hierarchy (Dynamic/Tensor/CompleteTensor), with Dynamic at the top of the inheritance hierarchy.
* It also tries to export bilinear as an example of undefined-tensor (None) input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13650

Differential Revision: D12967026

Pulled By: wanchaol

fbshipit-source-id: 6aedccc7ce2a12fadd13d9e620c03e1260103a5a
2018-11-09 11:29:57 -08:00
607094c4bf fix null-pointer-use in reshape_op.h
Summary:
```
UndefinedBehaviorSanitizer: null-pointer-use ../fbcode/third-party-buck/gcc-5-glibc-2.23/build/libgcc/include/c++/5.5.0/bits/stl_vector.h:794:16
```
Here we take the address of the first element of an empty vector. Fix the error by guarding against an empty source.

Reviewed By: pixelb

Differential Revision: D12989957

fbshipit-source-id: ac5ec366385df835b546bd1756e30cd762f13a7a
2018-11-09 10:07:04 -08:00
107e067654 Move IdWrapper to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13687

Reviewed By: ezyang

Differential Revision: D12912238

fbshipit-source-id: f7a37de52cd3b3c45b3b0e9eeb29dff624fa0258
2018-11-09 10:02:45 -08:00
332a7db35e Use MNIST dataset in C++ integration test (#13737)
Summary:
We have an MNIST reader in the C++ data API, so we can get rid of the custom one currently implemented in the integration tests.

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13737

Differential Revision: D12990936

Pulled By: goldsborough

fbshipit-source-id: 125a1910ec91d53dbf121570fc9eec6ccfba0477
2018-11-09 09:55:02 -08:00
a63ef1d605 Suggest git submodule update --init --recursive (#13769)
Summary:
We now have submodules that have submodules
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13769

Reviewed By: soumith

Differential Revision: D13000203

Pulled By: SsnL

fbshipit-source-id: 63c0c19c6c9d25ae3bf255a2421a82ca68278866
2018-11-09 08:41:44 -08:00
a1b2f1710d Remove _th_is_contiguous, make is_set_to a function, not a method.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13725

Differential Revision: D12980246

Pulled By: gchanan

fbshipit-source-id: e5c5742a67e5a25062df736e28b44c133a635ca8
2018-11-09 07:02:38 -08:00
10a1534c43 Remove _th methods that also have a function. (#13721)
Summary:
There's no reason we need these, as the native function wrapper calls into the function anyway.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13721

Differential Revision: D12977449

Pulled By: gchanan

fbshipit-source-id: 54701ebe2f0bb2b55484cb437501c626e6471347
2018-11-09 06:57:20 -08:00
9ffabcfcaa Use nested variant of getValueTrace to allow more flexible tracing script modules (#13597)
Summary:
When tracing scripted functions, we used to allow only Tensor arguments.
This enables tracing script modules with List[Tensor] or Tuple[Tensor, Tensor] arguments (passing tuples).

Fixes: #13566
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13597

Differential Revision: D12990464

Pulled By: soumith

fbshipit-source-id: fdce3afcb1e09f3c26d6ce834c01bf18d261f47c
2018-11-09 06:24:02 -08:00
dca3c2c60f Save and execute futures in a task queue (#13212)
Summary:
Upon calling wait(), save the forked thread and the current thread to a
task queue. An idle thread (of which there is currently only one) should
pick a ready task and run until there is nothing left in the task queue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13212

Differential Revision: D12884522

Pulled By: highker

fbshipit-source-id: b3942a0ee63c148e05f5f41bdc73007fa3c3368e
2018-11-09 01:46:35 -08:00
4484f67b47 Revert D10203439: [pytorch][PR] Fix batch norm multiplier init
Differential Revision:
D10203439

Original commit changeset: 999cc134a45e

fbshipit-source-id: 7871e384063db2f3788169338e9c965d5f8ac351
2018-11-09 00:37:05 -08:00
26751ce300 Fix the improper use of windows-native slashes (#13220)
Summary:
Trying to fix #12510.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13220

Differential Revision: D12994483

Pulled By: soumith

fbshipit-source-id: adbaf7e7a0a7cd1fc3ec947ddb209b55a9cda2a6
2018-11-08 21:09:44 -08:00
44fb23a2f5 Add ability to annotate jit types inside function (#13752)
Summary:
This adds torch.jit.annotate for annotating the type of an intermediate.
This is Py2/3 compatible, e.g.:

```
from torch.jit import annotate
from typing import List

@torch.jit.script
def foo():
  a = annotate(List[int], [])
```

This is needed to output valid python programs from our IR. It removes
the need for the empty list constructors.

A future patch can add support to the C++ parser and Python 3,
via desugaring:

```
a : int = b
a = annotate(int, b)
```

But this functionality is not required for serialization so is not added in this patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13752

Differential Revision: D12989885

Pulled By: zdevito

fbshipit-source-id: 161573a7352094543dc0d33a892f2a3b9103d847
2018-11-08 20:25:00 -08:00
5ae3b44255 Added HIP top_k operator (#13747)
Summary:
This PR contains changes for:
1. Adding HIP top_k operator in Caffe2
2. Added HIP equivalent definitions of GPUDefs and GPUScanUtils
3. Removing the top_k operator test from ROCm test ignore list
4. Bug fixes in related code in THC/THCAsmUtils.cuh

Differential Revision: D12986451

Pulled By: bddppq

fbshipit-source-id: 6d5241fb674eaeb7cde42166426ac88043b83504
2018-11-08 20:14:53 -08:00
32b3fe8ce6 CircleCI: enable OSX jobs again (#13731)
Summary:
CircleCI now offers 60x OSX concurrency, which is 2x of what we currently have in Jenkins. This should help alleviate the OSX CI wait time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13731

Differential Revision: D12993737

Pulled By: yf225

fbshipit-source-id: f475ad9a1d031eda95b7cacdaf52f31fbb2f4f93
2018-11-08 20:09:05 -08:00
2ee4ef5290 Change all namespace fbgemm2 in the new fbgemm2 to namespace fbgemm (#13740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13740

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm2 to namespace fbgemm.

The purpose is to avoid confusion over the name "fbgemm2" when we release FBGEMM as open source.

Reviewed By: jspark1105

Differential Revision: D12850449

fbshipit-source-id: 08cc47864b157e36fbceddb7a10bf26218c67bd8
2018-11-08 19:59:12 -08:00
55964abb11 Change all namespace fbgemm in the old fbgemm to namespace fbgemm0 (#13701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13701

We would like to rename the old fbgemm to “fbgemm0”, and the new fbgemm2 to “fbgemm”:

This DIFF changes all namespace fbgemm to namespace fbgemm0.

Reviewed By: jspark1105

Differential Revision: D12848727

fbshipit-source-id: 47935e9e2c4714a7ce1bfc3f7e4d6a334130132e
2018-11-08 19:59:10 -08:00
a8e303dc46 change USE_MKLDNN default from ON (from #13303) to OFF for ppc64le (#13759)
Summary:
MKLDNN is not supported on ppc64le, so change the USE_MKLDNN default to OFF for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13759

Differential Revision: D12993121

Pulled By: soumith

fbshipit-source-id: 539d5cfcff2c03b59fa71e10b52fac333a64c381
2018-11-08 19:33:39 -08:00
dd3f52fbe6 Remove _th_ndimension, which doesn't actually do anything. (#13723)
Summary:
Tensor.ndimension is hardcoded.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13723

Reviewed By: ezyang

Differential Revision: D12979461

Pulled By: gchanan

fbshipit-source-id: b95251b74a7b96ebcce2331f847873216968124d
2018-11-08 19:29:59 -08:00
c9be135bb9 Fix batch norm multiplier init (#12325)
Summary:
Fixes #12259
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12325

Differential Revision: D10203439

Pulled By: SsnL

fbshipit-source-id: 999cc134a45e2554313adb7eb93ee98e1f84335f
2018-11-08 19:00:00 -08:00
42001e7c17 Fix clang-tidy for Python2 (#13735)
Summary:
`clang_tidy.py` doesn't run with Python2 right now. Needs a minor fix

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13735

Differential Revision: D12990613

Pulled By: goldsborough

fbshipit-source-id: ad19b229a14188fd048dde198a7f4c3483aeff95
2018-11-08 17:57:08 -08:00
89b54229b1 Make _th_unfold and _th_view into functions, from methods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13724

Reviewed By: ezyang

Differential Revision: D12979865

Pulled By: gchanan

fbshipit-source-id: 92462198f3c51664f7973c142956774d88d831ca
2018-11-08 16:36:55 -08:00
00e752a46e Move cpu copy to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13347

Reviewed By: ezyang

Differential Revision: D12850691

fbshipit-source-id: d72577efb0ccb6df69e33f0c0a94c9f71937ccf8
2018-11-08 15:56:41 -08:00
51f58f0990 Fix typo in CTC loss doc comments. (#13727)
Summary:
`target_lenghts` -> `target_lengths`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13727

Differential Revision: D12981582

Pulled By: zou3519

fbshipit-source-id: e5e02b26cf3030a91494655ff863273333cc4133
2018-11-08 14:50:48 -08:00
bff931a10d implement concatenation of sparse tensors (#13577)
Summary:
With this change applied, `torch.cat` works for sparse tensors.

The algorithm is just to concatenate the values and give the new values the proper indices: the same as their old indices in every dimension except the catted dimension, where each index is shifted by the total size of all preceding tensors along that dimension.

This is my first time contributing to PyTorch so please feel free to tell me if this approach seems totally wrong.

Coming next: `torch.stack` for sparse tensors.
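
A minimal usage sketch (shapes and values made up for illustration):

```python
import torch

i = torch.tensor([[0, 1],
                  [2, 0]])                   # 2 x nnz indices
v = torch.tensor([3.0, 4.0])
a = torch.sparse_coo_tensor(i, v, (2, 3))
b = torch.sparse_coo_tensor(i, v, (2, 3))

c = torch.cat([a, b], dim=0)                 # sparse tensor of size (4, 3)
# b's entries keep their indices except in dim 0, where they are
# shifted by a.size(0) == 2.
print(c.to_dense())
```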
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13577

Differential Revision: D12980948

Pulled By: umanwizard

fbshipit-source-id: 51ebdafee7fcd56d9762dcae9ebe5b4ab8e1dd6b
2018-11-08 14:15:30 -08:00
65ff84b49e Catch error by reference in module.cpp (#13743)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13743

"catch by reference, throw by value"

Catching the polymorphic type std::bad_weak_ptr by value was an error earlier.

Reviewed By: goldsborough

Differential Revision: D12982626

fbshipit-source-id: 0ff22c0352acc7a94078ce6d5b2a4e56fee75be5
2018-11-08 13:49:21 -08:00
8a5869a3f7 Move function_schema to aten/core (#13729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13729

final move to expose function_schema to caffe2

Differential Revision: D12981563

fbshipit-source-id: e4f7fa611a2498a96c27dfa8bfd18e10ad781c10
2018-11-08 13:28:37 -08:00
85bde3801b Tracer now records Python variable names (#13441)
Summary:
This is probably slow, but it should make the traces more understandable and make debugging easier. Any suggestions for how to make it faster (i.e. so we don't have to traverse all of locals() and globals()) would be appreciated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13441

Differential Revision: D12879763

Pulled By: jamesr66a

fbshipit-source-id: b84133dc2ef9ca6cfbfaf2e3f9106784cc42951e
2018-11-08 13:08:42 -08:00
64a910bac7 Remove unnecessary tools/ qualification. (#13706)
Summary:
H/t kalisp for pointing it out

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13706

Differential Revision: D12983983

Pulled By: ezyang

fbshipit-source-id: 6a43cdde142fe64550121b16716f206e7c4d68d6
2018-11-08 12:55:19 -08:00
4fadf571fd handle flat rolling (no dim specified) T36264909 (#13588)
Summary:
Update roll to behave as numpy.roll does when the dimension to roll is not specified.
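
A small sketch of the resulting behavior (matching `numpy.roll` when no dims are given):

```python
import torch

t = torch.tensor([[1, 2, 3],
                  [4, 5, 6]])

# With no dims specified, the tensor is flattened, rolled, and
# restored to its original shape:
torch.roll(t, 1)
# tensor([[6, 1, 2],
#         [3, 4, 5]])
```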
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13588

Differential Revision: D12964295

Pulled By: nairbv

fbshipit-source-id: de9cdea1a937773033f081f8c1505a40e4e08bc1
2018-11-08 12:39:35 -08:00
59d021b63a Fix nn threshold test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13734

Differential Revision: D12983358

Pulled By: driazati

fbshipit-source-id: 6db30b8bbc8e34c6e01f678724dfca9555a86177
2018-11-08 12:31:39 -08:00
0a090fe60a Fix torch.dist for infinity, zero and minus infinity norms (#13713)
Summary: Fixes #13559
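
A sketch of the cases this covers, using the usual p-norm conventions:

```python
import torch

a = torch.tensor([1.0, -3.0])
b = torch.tensor([4.0,  1.0])   # |a - b| == [3.0, 4.0]

torch.dist(a, b, float('inf'))   # max |a - b|        -> 4.0
torch.dist(a, b, float('-inf'))  # min |a - b|        -> 3.0
torch.dist(a, b, 0)              # number of nonzeros -> 2.0
```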

Differential Revision: D12981556

Pulled By: zou3519

fbshipit-source-id: 99e86abab3ca045257374a9212ca24e7ca59fe9d
2018-11-08 12:03:07 -08:00
a92ff57a4d update range doc (#13730)
Summary:
Update the range documentation to show that we don't support the start or increment parameters.
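
For example, only the single-argument form works in script (a minimal sketch, using the type-comment annotation style of the time):

```python
import torch

@torch.jit.script
def triangle(n):
    # type: (int) -> int
    total = 0
    for i in range(n):   # range(start, stop[, step]) forms are unsupported
        total += i
    return total
```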
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13730

Differential Revision: D12982016

Pulled By: eellison

fbshipit-source-id: cc1462fc1af547ae80c6d3b87999b7528bade8af
2018-11-08 11:40:52 -08:00
869ef71343 AsyncNet: option for time based tracing and trace path (#13440)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13440

Time-based tracing is easier to look at when multiple nets are running asynchronously.
This diff also adds an option to change the path to dump trace files.

Reviewed By: aazzolini, ilia-cher

Differential Revision: D12479259

fbshipit-source-id: 94d379634ba7b90c111c92b1136ffa4226b8bb8c
2018-11-08 11:34:34 -08:00
556ff8e7b7 Add builtins for size() and list with defaults (#13639)
Summary:
* `aten::size()` to match `torch.Tensor.size` (usage sketch after this list)
* `aten::list_with_default` for semantics of `torch.nn.modules.utils.list_with_default`
* converts `adaptive_avg_pool2d` and `adaptive_avg_pool3d`
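
A rough sketch of the first builtin in use (illustrative only):

```python
import torch

@torch.jit.script
def shape_of(x):
    # aten::size() with no argument matches torch.Tensor.size,
    # returning the full shape.
    return x.size()
```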
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13639

Differential Revision: D12954670

Pulled By: driazati

fbshipit-source-id: 68c30af0efc02c60af5fb8c9715b2435cc01a0d9
2018-11-08 11:26:35 -08:00
d01cb70497 build with mkl-dnn by default (#13303)
Summary:
build with mkl-dnn by default
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13303

Reviewed By: yinghai

Differential Revision: D12979633

Pulled By: orionr

fbshipit-source-id: 00d23fa27c0d13e82f7e5acb3ebd00ed7ba1d5dc
2018-11-08 11:18:27 -08:00
8581d3ec67 Allow blacklist ops in onnxifi transform
Differential Revision: D12945523

fbshipit-source-id: cf5055652591bd1dd8d4be92b7fd6a40a0764536
2018-11-08 09:59:03 -08:00
fd9aaa6b79 Fix linking errors on Windows (#13100)
Summary:
1. Remove the flag "/FORCE:UNRESOLVED", which shouldn't be used.
2. Fix the code logic for ONNX_BUILD_MAIN_LIBS on Windows.
3. Add a patch for protobuf using CMake.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13100

Differential Revision: D12978950

Pulled By: orionr

fbshipit-source-id: db9eb8136acf5712cfb5a24ed228b7934d873331
2018-11-08 09:54:09 -08:00
3e877a70e3 Enable unused-private-field warning (#13450)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13450

Pull Request resolved: https://github.com/facebook/react-native/pull/22065

This diff enables -Wunused-private-field clang warning for Android builds and fixes all broken targets.

Reviewed By: gkmhub

Differential Revision: D12881793

fbshipit-source-id: 515555661e137be9e7b20eac9b5bdcb549d6a094
2018-11-08 09:23:11 -08:00
df022f8078 Disable CopyFrom src with uninitialized storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12692

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10392295

fbshipit-source-id: 3a37173b03e76862ec421e0b6d0b0e322b2749b5
2018-11-08 07:45:42 -08:00
4472ad3b2f Move functional _Reduction to its own module (#13401)
Summary:
To support `_Reduction` in the jit this PR moves it out to a new file so that it goes through the paths for python modules in the script compiler and converts `F.ctc_loss` to weak script

Depends on #13484 for saving rng state
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13401

Differential Revision: D12868501

Pulled By: driazati

fbshipit-source-id: 23cec0fb135744578c73e31ac825e238db495d27
2018-11-08 01:04:10 -08:00
de41d1ae0b Enable junk fill for the default CPU allocator (#13377)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13377

* Enable junk fill for the default CPU allocator. The first diff only enables this for the tests. A second diff will change the default of zero-fill to false.
* Fix tests to use 64-bit counters that IterOp and LearningRateOp demands.
* Fix kernels that uses uninitialized memory.

Reviewed By: salexspb

Differential Revision: D10866512

fbshipit-source-id: 17860e77e63a203edf46d0da0335608f77884821
2018-11-08 00:02:37 -08:00
21991c05a9 Support assignment to subscripted lhs expr (#13486)
Summary:
Support things like `foo[0] = bar` in script.
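
A small sketch of the kind of script this enables (the helper name and the type-comment style are illustrative):

```python
import torch
from typing import List

@torch.jit.script
def set_first(xs, v):
    # type: (List[int], int) -> List[int]
    xs[0] = v   # assignment to a subscripted lhs
    return xs
```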
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13486

Differential Revision: D12964550

Pulled By: suo

fbshipit-source-id: 3dda8ffd683d1b045787c65bfa0c7d43b0455658
2018-11-07 23:07:57 -08:00
411d89ca64 Fix the bug in dispatch_to when calling cpu() (#13700)
Summary:
When we added to in #13146, we did not emit the cast correctly in one of the dispatch overloads, then when we call .cpu(), the dtype will always be the default float type, which is wrong.

CC jamesr66a eellison
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13700

Differential Revision: D12968699

Pulled By: wanchaol

fbshipit-source-id: c1aaf2bf6a163643ce5360797da61c68271d8bf8
2018-11-07 22:57:35 -08:00
90ea61800f operators/quantized/server -> quantization/server (#13660)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13660

Any change to a server-side quantized operator was triggering ios-sanity-check, with more than 5 hours of testing time. I suspect this was because the operator code was synced with the xplat directory. This diff moves the server-side quantized operators to caffe2/caffe2/quantization/server to avoid this issue.

Reviewed By: hx89

Differential Revision: D12955420

fbshipit-source-id: b6c824b9de5e2a696f8c748e1b2c77d81d46746b
2018-11-07 22:54:13 -08:00
2448a83d30 Give broadcast_coalesced tensors different version counters (#13594)
Summary:
In `broadcast_coalesced`, since multiple variables can be "views" of a big flattened tensor, they can share the same version counter. However, this base flat tensor is not exposed and the variables don't share any memory locations, so sharing a version counter is unnecessary. Furthermore, it can cause problems: e.g., when two buffers are broadcast together in `DataParallel` and one of them is modified in-place during `forward` while the other is needed in backward, the autograd engine will complain.

Fixing the bug discovered at https://github.com/pytorch/pytorch/pull/13350#issuecomment-436011370

edit: This is a very real problem. E.g., consider using Spectral Norm + Batch Norm together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13594

Differential Revision: D12967311

Pulled By: SsnL

fbshipit-source-id: 52998dbabe149f575cf0fb79e7016f0b95e4b9e5
2018-11-07 21:49:35 -08:00
5dd153b1c2 speed up torch.sparse_mask() cpu kernel (#13290)
Summary:
- `sparse_mask(D, S)` is useful for implementing backward for `sparse_addmm()`
- the previous `sparse_mask(D, S)` cpu kernel is not parallelized
- this PR speeds up the cpu kernel for two separate cases:
  - `D.dim == S.sparse_dim`: simply parallelize the kernel
  - `D.dim > S.sparse_dim`: simply reuse the CUDA kernel implementation
- performance:

`D.dim == S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz)
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)

>>> %timeit D.sparse_mask(S)

======= before change =======
6.4 ms ± 684 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

======= after change =======
333 µs ± 89.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```

`D.dim > S.sparse_dim`
```
>>> nnz = 100000
>>> dims = [1000, 1000, 2, 2]
>>> I = torch.cat([torch.randint(0, dims[0], size=(nnz,)),
               torch.randint(0, dims[1], size=(nnz,))], 0).reshape(2, nnz)
>>> V = torch.randn(nnz, dims[2], dims[3])
>>> size = torch.Size(dims)

>>> S = torch.sparse_coo_tensor(I, V, size).coalesce()
>>> D = torch.randn(dims)
%timeit D.sparse_mask(S)

======= before change =======
495 ms ± 41.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

======= after change =======
594 µs ± 68.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13290

Differential Revision: D12878336

Pulled By: weiyangfb

fbshipit-source-id: 10b5981af382f7c6095a42c0fee7297d6438ce37
2018-11-07 20:02:17 -08:00
6bfce16873 fix flip() shape bug in CPU (#13344)
Summary:
- a workaround for #13292; a complete fix requires investigating the root cause of the issue with advanced indexing
- this PR ports the `flip()` CUDA implementation to the CPU kernel
- with this change:
```
>>> t = torch.randn(1, 3, 4, 5)
>>> t.flip(1, 3).shape
torch.Size([1, 3, 4, 5])
```
- performance:
```
====== with this PR ======
>>> a = torch.randn(1000, 1000)
>>> %timeit -r 100 a.flip(0, 1)
1.98 ms ± 579 µs per loop (mean ± std. dev. of 100 runs, 1000 loops each)

====== Perf at previous PR #7873 ======
100 loops, best of 3: 11 ms per loop
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13344

Differential Revision: D12968003

Pulled By: weiyangfb

fbshipit-source-id: 66f434049d143a0575a35b5c983b3e0577a1a28d
2018-11-07 19:53:49 -08:00
1616587540 Redo jit/type and utils/functional to ATen/core (#13455)
Summary:
This is a redo of the previous move which broke OS X and Windows tests -- RTTI seemed to be broken
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13455

Differential Revision: D12883775

Pulled By: bwasti

fbshipit-source-id: 2b6c65e8150e6f89624c6ee99c389335c6fb4bb8
2018-11-07 18:11:29 -08:00
87b47ff850 Remove .data() use in C++ frontend (#13675)
Summary:
Removes the last uses of `.data()` in implementation code of the C++ frontend.

CC yf225

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13675

Differential Revision: D12966061

Pulled By: goldsborough

fbshipit-source-id: fbc0c83c3ba56598ff853bc7b1ddf9005fdd9c41
2018-11-07 17:30:29 -08:00
eb88098e11 Kill c10d/private/CUDAUtils.hpp (#13681)
Summary:
Use AT_CUDA_CHECK instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13681

Differential Revision: D12966607

Pulled By: teng-li

fbshipit-source-id: da0431f588969791a19519368edb909b9c3dc5ab
2018-11-07 17:09:08 -08:00
c8bb665b5d Fix a bug in tuple assignment (#13656)
Summary:
Previously, we did not distinguish between `a = b` (simple assignment)
and `a, = b` (tuple destructuring of a singleton tuple).

The second case would fail in the string frontend, and would not unpack
in the python frontend. This patch fixes both issues and also cleans up
the error reporting for unexpected expressions on the LHS.
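
In plain Python terms, the distinction is:

```python
b = (1,)   # a singleton tuple
a = b      # simple assignment: a is the tuple (1,)
a, = b     # destructuring:     a is the element 1
```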

Will likely conflict with #13486
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13656

Differential Revision: D12964566

Pulled By: zdevito

fbshipit-source-id: 992b19e5068aef59a78cd23cb0e59a9eeb7755d1
2018-11-07 16:44:22 -08:00
9900a8dd89 Remove outdated css and font files in html docs (#13699)
Summary:
The stylesheet at docs/source/_static/css/pytorch_theme.css is no longer necessary for the html docs build. The new html docs theme styles are located at https://github.com/pytorch/pytorch_sphinx_theme.

The Lato font is also no longer used in the new theme.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13699

Differential Revision: D12967448

Pulled By: soumith

fbshipit-source-id: 7de205162a61e3acacfd8b499660d328ff3812ec
2018-11-07 16:31:28 -08:00
7978ba45ba Update path in CI script to access ninja (#13646)
Summary:
We weren't running C++ extensions tests in CI.
Also, let's error hard when `ninja` is not available instead of skipping C++ extensions tests.

Fixes https://github.com/pytorch/pytorch/issues/13622

ezyang soumith yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13646

Differential Revision: D12961468

Pulled By: goldsborough

fbshipit-source-id: 917c8a14063dc40e6ab79a0f7d345ae2d3566ba4
2018-11-07 14:31:29 -08:00
bf9b5dffbf ensure flake8 ignores non-conforming python files generated by build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13680

Differential Revision: D12964332

Pulled By: nairbv

fbshipit-source-id: a28358c265fd305f5f8cf893d25d34d6b5929210
2018-11-07 14:27:41 -08:00
d4f9dbfa66 Remove catch check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13677

Differential Revision: D12961992

Pulled By: goldsborough

fbshipit-source-id: 1f0207704d05ac67ed1ec1502bec617c845d9f79
2018-11-07 12:27:15 -08:00
dceec1de30 Distributed Data Parallel documentation for PT1 release (#13657)
Summary:
This should fix https://github.com/pytorch/pytorch/issues/12604

Make html and look through the html pages to make sure that everything looks good
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13657

Reviewed By: calebho

Differential Revision: D12954250

Pulled By: teng-li

fbshipit-source-id: 40e1925ec0cdce5e6a1d8ba29537937da8ef9194
2018-11-07 12:11:57 -08:00
216c5d0bdc caching packed matrix (#13626)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13626

Reuse the packed matrix of weights.

Reviewed By: dskhudia

Differential Revision: D12916630

fbshipit-source-id: f0ec5734f5506134a79d9c0601146488e15c3afe
2018-11-07 12:03:39 -08:00
94fe8faa00 new QNNPACK dwconv support and tests (#13652)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13652

New dwconv 3x3 and 5x5 tests provided.

Reviewed By: Maratyszcza

Differential Revision: D12951866

fbshipit-source-id: f853bb7412a724de594ed36c6b2b69ec268d6464
2018-11-07 12:03:35 -08:00
1413dd4bfc Added the finer bucketing option for DDP (#13607)
Summary:
We only need this for the backward pass; for the forward cast, the non-fine-grained bucketing should be better since it's sequential anyway.

Tests are all covered by the c10d test; the bucket size was reduced there to make bucketing happen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13607

Differential Revision: D12944515

Pulled By: teng-li

fbshipit-source-id: d982e8dca2874c91d39b30b73a85bfbeb768c508
2018-11-07 12:00:55 -08:00
044d00516c Rename DistBackend -> Backend (#11830)
Summary:
Also add docs for get_backend, Backend, and reduce_op

fixes #11803

cc pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11830

Differential Revision: D9927991

Pulled By: SsnL

fbshipit-source-id: a2ffb70826241ba84264f36f2cb173e00b19af48
2018-11-07 11:58:12 -08:00
afc7dbd586 Hipify caffe2/utils/math_gpu.cu (#13521)
Summary:
This PR adds caffe2/utils/math_gpu.cu to pyHipify

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13521

Differential Revision: D12954843

Pulled By: bddppq

fbshipit-source-id: a2bf367da07e49cb7807ba6876b42d0733fc8205
2018-11-07 11:34:15 -08:00
0f59dcb317 Remove partially initialized Tensor + CopyFrom (#13629)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13629

Previously we had a Tensor with an initialized storage (and therefore a known device_type), and
we would then call CopyFrom on it to initialize the sizes and data.

We want to eliminate partially initialized Tensor by replacing the pattern of calling CopyFrom with a partially initialized Tensor with either splitting that to undefined Tensor + initialization API(1)(3) or combine all the initialization in the same step(2).

1. member variable initialization + CopyFrom
Previously we had a tensor initialized with a device_type and then used CopyFrom to populate its content; now we remove the partial initialization by making the original member variable an undefined Tensor and using ReinitializeFrom to copy from another Tensor.

2. Output + CopyFrom
Previously, we first got a tensor with a device_type and then called CopyFrom with another Tensor.
We changed this by combining the two operations into OperatorBase::OutputTensor.

3. Output + custom functions
Example can be found in TransformGPU function.
In this case we move the part that initializes the tensor outside of the function, and do that explicitly outside so that we could reuse the Output functions to make a fully initialized Tensor.

Note that to keep the original semantics, both of the APIs have a caching effect based on device_type, which means we only create a Tensor object when the device_type does not match or the Tensor is undefined; otherwise, we reuse the original Tensor object.

Reviewed By: dzhulgakov

Differential Revision: D12848855

fbshipit-source-id: 37bb4ddc1698ebea533b73006eeb1218faa8ddf8
2018-11-07 11:31:03 -08:00
6c8ac50753 Fix exception catching to catch c10::Error properly (#13665)
Summary:
In particular, this was breaking the logic that lets the cudnn algorithm selection fall back to a less memory-hungry algorithm if the selected one OOMs when creating the workspace.
c10::Error is a subclass of `std::exception`, not of `std::runtime_error`.

I removed `runtime_error` in all such places in our code and replaced it with `const exception`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13665

Differential Revision: D12958396

Pulled By: soumith

fbshipit-source-id: af557efd9887b013140113d3067de157ffcf8465
2018-11-07 11:22:48 -08:00
674e23bbab Fixed a small error in docstrings for ConvTranspose3d (#13668)
Summary:
In the example for ConvTranspose3d, the docstring had "Conv3d" instead of "ConvTranspose3d" in one instance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13668

Differential Revision: D12958372

Pulled By: soumith

fbshipit-source-id: 5ec901e20b90f4eed2bf04c5b417183ec2096447
2018-11-07 11:22:46 -08:00
2fe9e3a207 Remove catch from caffe2/.gitmodules
Summary: Step 3 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12959020

fbshipit-source-id: 49347de8b027433d422b653dd854ad76349d0e25
2018-11-07 11:10:09 -08:00
e7652cfb40 Remove caffe2/submodules/catch-rev.txt
Summary: Step 1 to remove catch submodule from PyTorch

Reviewed By: ezyang

Differential Revision: D12958997

fbshipit-source-id: ab4b9e103ac83ad490375440722f95247eb1ac7f
2018-11-07 11:10:07 -08:00
ab0c72ab6f Replace cursors with OrderedDict (#13427)
Summary:
This is a precursor diff to Python <-> C++ frontend integration -- I have a follow-up PR coming for that. This PR changes the C++ frontend module interface to replace the custom "cursor"s I introduced some time ago with `OrderedDict`. I introduced cursors at the time as a convenient way of applying functions and query operations to a module's parameters, buffers and submodules, allowing things like `module.parameters().map(my_func)`. However, I noticed that (1) this functionality is easily implementable on top of a regular data structure and (2) more importantly, using OrderedDicts is much, much easier for Python integration. This is especially true given that ScriptModule today also uses OrderedDict. Since C++ frontend modules and ScriptModules will soon share as many implementation details as possible, it is overall the best move to ditch the custom cursor data structure and pervasively use OrderedDict everywhere.

For this I did:

1. Changed the C++ frontend module interface to more closely match the Python one by providing `parameters()`, `named_parameters()` and other methods Python provides. This is very important for the following diff which binds these into Python for inter-op with Python modules.
2. In lieu of the `Cursor::apply()` method I added `nn::Module::apply`. This again is one more unifying step between Python and C++, since Python modules have an apply function too.
3. Deleted all uses of Cursor.
4. Tidied and beefed up the `OrderedDict` class. In particular, I made `OrderedDict::Item` store an `std::pair` under the hood, because that is trivial to bind into Python and saved me a lot of headaches. `key` and `value` become methods instead of fields, which they should have been from the very start anyway because it allows exactly these kinds of changes, as per usual good software engineering principle of encapsulation.
5. Added many tests for the OrderedDict use in `nn::Module`.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13427

Differential Revision: D12894092

Pulled By: goldsborough

fbshipit-source-id: 715770c95a9643753a1db26d7f9da9a78619a15d
2018-11-07 11:10:05 -08:00
b652c2de50 Rename dim(i) -> size(i)
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim(i)->size(i)): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935287

fbshipit-source-id: 700050640c756d7064c8db4fd50fe6a1421a61ef
2018-11-07 11:07:26 -08:00
4326873330 Skip std and var tests in pytorch rocm CI (#13662)
Summary:
https://github.com/pytorch/pytorch/pull/13435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13662

Reviewed By: soumith

Differential Revision: D12958408

Pulled By: bddppq

fbshipit-source-id: 170b59769fbed149c9246b6549c62160e27d2404
2018-11-07 10:10:25 -08:00
9403eddce4 Fix tracing bug for custom ops (#13654)
Summary:
Due to a logic bug, tracing is broken for custom ops. Unfortunately, there also weren't any tests for tracing custom ops.

The fix is a single line change of moving `pop(stack, std::get<Is>(arguments)...);` before `node = getTracedNode<Is...>(schema, arguments);`. Other changes are added tests and improved commenting/formatting.

Fixes https://github.com/pytorch/pytorch/issues/13564

CC fmassa

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13654

Differential Revision: D12952887

Pulled By: goldsborough

fbshipit-source-id: 87d256576f787c58e8d8f5c13a0fecd0ec62a602
2018-11-07 09:22:44 -08:00
edd2e38023 Clean up a couple of items in the C2 test scaffolding (WIP) (#7847)
Summary:
- Py3 compatibility
- utility functions refactoring
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7847

Reviewed By: pietern

Differential Revision: D9355096

Pulled By: huitseeker

fbshipit-source-id: 8e78faa937488c5299714f78075d7cadb1b2490c
2018-11-07 09:16:13 -08:00
10fdcf748a swap with empty vector to force deallocation (#13625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13625

v.clear() doesn't guarantee deallocation, and it was causing memory capacity issues.

Reviewed By: jianyuh

Differential Revision: D12941938

fbshipit-source-id: b9c80828b122a44e883b32f43b5d8dfb36065773
2018-11-07 08:33:34 -08:00
398d310bac changes for cumsum/cumprod backward not depending on TH. (#13570)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13467 which is failing with ASAN errors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13570

Differential Revision: D12922619

Pulled By: gchanan

fbshipit-source-id: 007470243d8aee719ab9441abf29f06b4c84d59f
2018-11-07 07:45:33 -08:00
a228a95b94 Rename ndim() -> dim() - 1/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935693

fbshipit-source-id: f24f1c10cd5bbb9e63cda0a0da989e6e3766380a
2018-11-07 07:30:11 -08:00
4794da03f8 Rename ndim() -> dim() - 4/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935774

fbshipit-source-id: 2a7cb7da534da73b61f01eb0ff124abf193309ee
2018-11-07 07:30:09 -08:00
57ec8f111f Rename ndim() -> dim() - 6/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935827

fbshipit-source-id: 80ecb034c243dbfd267b9f131cee9d7afd5ef063
2018-11-07 07:27:45 -08:00
e60a7c2c88 codemod tensor.type().is_cuda(), tensor.type().is_sparse() (#13590)
Summary:
Followup to #12841

Changed these to not require type dispatch:
tensor.type().is_cuda() -> tensor.is_cuda()
tensor.type().is_sparse() -> tensor.is_sparse()
isVariable(tensor.type()) -> tensor.is_variable()

This probably does not affect performance
very much in most cases but it is nice to have.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13590

Reviewed By: ezyang

Differential Revision: D12929301

Pulled By: zou3519

fbshipit-source-id: 8ac5c6200c579dd7a44fb4ee58fc9bb170feb1d7
2018-11-07 07:27:42 -08:00
e70321ed9e Remove unnecessary type dispatches from Variable::Impl ctor (#13630)
Summary:
This should improve the performance of wrapping a tensor in a Variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13630

Reviewed By: ezyang

Differential Revision: D12944960

Pulled By: zou3519

fbshipit-source-id: 89fa78a563e46a747d851a90ffd1b5cf3cd2d0d7
2018-11-07 07:27:40 -08:00
2ae8e46105 Rename ndim() -> dim() - 2/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12935727

fbshipit-source-id: a0c306c8f451a671b80db54fef5aa091ed58bfe5
2018-11-07 07:25:20 -08:00
7341ab0a33 Fix range of target examples and JIT test case for CTC loss.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13644

Differential Revision: D12949733

Pulled By: gchanan

fbshipit-source-id: 1c4cacbb6a50d5002165bdd0a7881883db5c8249
2018-11-07 07:04:31 -08:00
a132a7d9ce Add autodiff support for a few additional operators (#13288)
Summary:
Added aten::{avg_pool2d, log_softmax, max_pool2d_with_indices, threshold},
enabled aten::{expand, view}.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13288

Differential Revision: D12954929

Pulled By: soumith

fbshipit-source-id: 6fba58af82cafbc7446705d8c8145cdeaf4954ca
2018-11-06 23:24:12 -08:00
a1ba29a2c0 Change to use json format to store disabled_features in hipify (#13595)
Summary:
Since json is a builtin module in Python (>= 2.6), this means pyhipify
can be invoked without installing any extra dependencies.

petrex iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13595

Differential Revision: D12931045

Pulled By: bddppq

fbshipit-source-id: 31d68fb6e730fd9d11593550ca531423cb0596e9
2018-11-06 22:06:10 -08:00
7d64c9df39 Remove C2GEMMContext (#13443)
Summary:
C2GEMMContext is a remnant of old times when Int8 ops used gemmlowp.
It is no longer needed: the formerly gemmlowp-based ops now use QNNPACK with the pthreadpool interface, and other ops (Int8Add, Int8ChannelShuffle) use the Caffe2 thread pool interface directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13443

Differential Revision: D12887773

Pulled By: Maratyszcza

fbshipit-source-id: bd2732e2c187b399c8a82efebdd244457720256b
2018-11-06 21:50:53 -08:00
dbc467545f Update weak script modules to match fns (#13631)
Summary:
Add weak modules for those that use weak script functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13631

Differential Revision: D12945328

Pulled By: driazati

fbshipit-source-id: 6cb235763bf5ab35c7b32e0f734f08d22418594f
2018-11-06 21:22:52 -08:00
14004cbef6 Native batch norm (#13263)
Summary:
- Move batch norm from TH(CU)NN to native
- Speedups in many cases (e.g. #12006) for CUDA due to new block/grid layout and Welford-type mean/variance calculations (the latter for training mode)
- It splits the forward kernel in two pieces and reuses the evaluation kernel for the transformation.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to maintain reasonable precision.

Compared to the ill-fated #12368
- I changed the CPU kernel to not call `.sum()` from within parallel for. This seemed to have caused the breakage (NaN-results) in TestModels.test_dcgan_netG (thank you houseroad for the repro, errors in assessment of the fix are my own)
- I updated the Half->Float upcasting in tensors to go through `t.type().scalarType()` instead of `t.dtype()`.
- I have merged master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13263

Differential Revision: D12946254

Pulled By: SsnL

fbshipit-source-id: 3bb717ee250fbccaf10afe73722996aa4713d10d
2018-11-06 20:05:54 -08:00
392ca1e59f Remove compileFunction (#13640)
Summary:
This finishes a TODO to get torch.jit.script to go through the same
pathway as methods, removing the need for forward_schema and
for compileFunction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13640

Differential Revision: D12949713

Pulled By: zdevito

fbshipit-source-id: 3d1a5f14910d97a68670a3fd416bdbfe457f621d
2018-11-06 19:37:06 -08:00
ce6edbfbd9 Fixed NCCL backend not being built (#13653)
Summary:
A regression caused by the earlier NCCL build refactoring.

CC fmassa

Fixing: https://github.com/facebookresearch/maskrcnn-benchmark/issues/122
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13653

Differential Revision: D12952555

Pulled By: teng-li

fbshipit-source-id: b42e2a88fff83c9ddd58eeb33e933f1f59f51c52
2018-11-06 19:33:49 -08:00
2cd912bcc2 Fix more spectral norm bugs (#13350)
Summary:
Problems with SN and DP after #12671 :
1. in eval mode, `weight_orig` is not getting correct gradient #12737 .

    Fix: keep `v` vector around as a buffer and always calculate `W = W_orig / (u @ W_orig @ v)` even in eval.

2. in training mode, the `weight` buffer of the parallelized module is never updated, if someone touches `weight_orig` and/or `weight` and makes them not sharing storage. So in `eval` the weight used is wrong.

    Fix: Make `weight` not a buffer anymore and always calculate it as above.

3. #12671 changed SN to update `u` in-place to make DP work correctly, but then it breaks backward through two forwards (e.g., the common GAN loss `D(real) - D(fake)`) because the vectors needed to backprop the 1st forward is changed in the 2nd forward.

    Fix: This PR clones `u` and `v` before using them.

To maintain BC, I added a hook interface for producing and loading state_dict. This is ugly, and we should really have a better interface for spectral_norm, but for the purpose of fixing this issue I made this patch. Even with a better interface, a BC mechanism for loading legacy state_dicts would still be needed.

cc crcrpar
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13350

Differential Revision: D12931044

Pulled By: SsnL

fbshipit-source-id: 8be6f934eaa62414d76d2c644dedd7e1b7eb31ef
2018-11-06 19:16:13 -08:00
eb29485ed8 Support custimzed timeout when fetching blob from KVStore (#13582)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13582

Worker nodes sometimes hit timeout failures when getting the session_id blob from Zeus, due to delays in the master node setting the blob.
This diff adds the flexibility to specify a longer timeout for getting blobs from Zeus.

Reviewed By: pietern

Differential Revision: D12926156

fbshipit-source-id: b1a4d1d9cf7de084785bfa4a8a0cd3cfd095ba5c
2018-11-06 18:54:56 -08:00
bc1de6ae7d CircleCI: disable output buffering to better locate test timeout (#13516)
Summary:
An ASAN test timeout such as https://circleci.com/gh/pytorch/pytorch/165649?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link doesn't actually show where the timeout happened, because of the bash output buffering. This PR turns off the buffering to better surface the error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13516

Differential Revision: D12952513

Pulled By: yf225

fbshipit-source-id: 48058c021470e5aa7a2246e1fcd974cfabf5df54
2018-11-06 18:14:26 -08:00
619c2f8b44 small fixes regarding docu of torch tensors (#13635)
Summary:
Removed duplicate doc args block.
Made statements involving 'each element' more precise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13635

Differential Revision: D12946987

Pulled By: soumith

fbshipit-source-id: a17da92f69086b530ff769cf4662ae29843fd188
2018-11-06 17:24:42 -08:00
508f676c50 Rename ndim() -> dim() - 5/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12935787

fbshipit-source-id: 303d71d3eb050789af2ab9575e5dcc48f6037086
2018-11-06 16:38:35 -08:00
6cf450744f propagate python op error msg (#13624)
Summary:
Correctly propagate the error message from a python op to the JIT interpreter. In the interpreter we wrap the exception and re-throw it as a runtime error. Potentially, in a future diff, we can throw the same type of python exception as was originally thrown.

Fix for https://github.com/pytorch/pytorch/issues/13560
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13624

Differential Revision: D12948756

Pulled By: eellison

fbshipit-source-id: 94cdf4c376143c5e40dcb9716aefb3c1e2d957db
2018-11-06 16:28:39 -08:00
feff7be294 Remove RTTI from jit/type.h (#13591)
Summary:
RTTI can't be used on Android, so this is needed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13591

Differential Revision: D12914402

Pulled By: bwasti

fbshipit-source-id: be8c8c679bb20c7faaa7e62cd92854cedc19cb3a
2018-11-06 16:19:52 -08:00
18de330e86 CMake integration for int8 server operators
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13558

Reviewed By: Maratyszcza

Differential Revision: D12945460

Pulled By: dskhudia

fbshipit-source-id: 1a91027b305fd6af77eebd9a4fad092a12f54712
2018-11-06 15:45:15 -08:00
76c1b5cd79 Fix overflow error in stats_put_ops
Summary:
I was hitting this error:

caffe2/caffe2/operators/stats_put_ops.h:66:25: runtime error: 9.22337e+18 is outside the range of representable values of type 'long'

So, the assignment from int64_t to float loses some precision, and because of that we overflow.

Reproduced this issue with this diff D12945013

Reviewed By: mlappelbaum, jdshi-fb

Differential Revision: D12927086

fbshipit-source-id: 7eae7fe25ab49d5ac15279335bd5b1fa89d6e683
2018-11-06 15:41:51 -08:00
e73943e488 Remove partially initialized Tensor + ShareData (#13522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13522

Currently Tensor is a shared pointer to the underlying implementation rather than a value; copying
the pointer shares the underlying TensorImpl, so ShareData probably doesn't make sense anymore.

Reviewed By: dzhulgakov

Differential Revision: D12871708

fbshipit-source-id: d3773c66b7ed0bf1c37e886f69f59aec158b216b
2018-11-06 15:23:41 -08:00
Jie fbe3c3f57f (#13435)
Summary:
Moved torch.var torch.std to use THC reduction kernel, this greatly improves performance for computing variance over non-contiguous dimensions.

Resolving #13192
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13435

Differential Revision: D12947137

Pulled By: soumith

fbshipit-source-id: c0a22cb799fa57e8fbed81c7dcb880666f461883
2018-11-06 14:42:26 -08:00
393ad6582d Use torch:: instead of at:: in all C++ APIs (#13523)
Summary:
In TorchScript and C++ extensions we currently advocate a mix of `torch::` and `at::` namespace usage. In the C++ frontend I had instead exported all symbols from `at::` and some from `c10::` into the `torch::` namespace. This is far, far easier for users to understand, and also avoid bugs around creating tensors vs. variables. The same should from now on be true for the TorchScript C++ API (for running and loading models) and all C++ extensions.

Note that since we're just talking about typedefs, this change does not break any existing code.

Once this lands I will update stuff in `pytorch/tutorials` too.

zdevito ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13523

Differential Revision: D12942787

Pulled By: goldsborough

fbshipit-source-id: 76058936bd8707b33d9e5bbc2d0705fc3d820763
2018-11-06 14:32:25 -08:00
be424de869 Add torch.multiprocessing.spawn helper (#13518)
Summary:
This helper addresses a common pattern where one spawns N processes to
work on some common task (e.g. parallel preprocessing or multiple
training loops).

A straightforward approach is to use the multiprocessing API directly
and then consecutively call join on the resulting processes.

This pattern breaks down in the face of errors. If one of the
processes terminates with an exception or via some signal, and it is
not the first process that was launched, the join call on the first
process won't be affected. This helper seeks to solve this by waiting
on termination from any of the spawned processes. When any process
terminates with a non-zero exit status, it terminates the remaining
processes, and raises an exception in the parent process. If the
process terminated with an exception, it is propagated to the parent.
If the process terminated via a signal (e.g. SIGINT, SIGSEGV), this is
mentioned in the exception as well.

Requires Python >= 3.4.
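
A minimal usage sketch (the exact call is inferred from the description above; `worker` and its arguments are illustrative):

```python
import torch.multiprocessing as mp

def worker(rank, world_size):
    print("worker %d of %d" % (rank, world_size))

if __name__ == "__main__":
    # Spawns 4 processes running worker(rank, 4). If any process dies
    # with an exception or a signal, the others are terminated and an
    # error is raised in the parent.
    mp.spawn(worker, args=(4,), nprocs=4)
```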
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13518

Reviewed By: orionr

Differential Revision: D12929045

Pulled By: pietern

fbshipit-source-id: 00df19fa16a568d1e22f37a2ba65677ab0cce3fd
2018-11-06 14:08:37 -08:00
056f2cd238 ATen/test/basic.cpp: Catch2Gtest (#12142)
Summary:
In #11846, we migrated all Catch tests in ATen/test/ to gtest, except basic.cpp, due to a GPU bug (valgrind related).
In this PR, we figure out what the bug is and migrate the last piece of ATen Catch tests to gtest.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12142

Differential Revision: D12946980

Pulled By: zrphercule

fbshipit-source-id: cf3b21f23ddec3e363ac8ec4bdeb4bc4fe35f83b
2018-11-06 14:00:18 -08:00
06bfabf1f5 add tests to no-gtest
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13637

Differential Revision: D12946644

Pulled By: suo

fbshipit-source-id: 161ddab275d5315fc053030d0f4956a4529602b1
2018-11-06 13:46:07 -08:00
137150be88 add unwrap optional operator (#13599)
Summary:
Add a builtin to refine the type of Optional[T] -> T. This is a short-term solution to unblock porting of the standard library.
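
A sketch of the intended use, assuming the helper is exposed as `torch.jit._unwrap_optional` (the name is an assumption; the type-comment style matches the period):

```python
import torch
from typing import Optional

@torch.jit.script
def bump(x):
    # type: (Optional[int]) -> int
    # Refines Optional[int] to int; raises at runtime if x is None.
    return torch.jit._unwrap_optional(x) + 1
```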
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13599

Reviewed By: driazati, wanchaol

Differential Revision: D12943193

Pulled By: eellison

fbshipit-source-id: 31c893a78d813313bbbc1d8212b5c04e403cfb4d
2018-11-06 11:54:56 -08:00
1906305c07 Consolidate argument checkers (#13623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13623

Moves the bulk of shared argument checkers in the gloo backend to Utils.hpp.

Reviewed By: teng-li

Differential Revision: D12934598

fbshipit-source-id: 7b80e67ccc3425f21498c30fbe7837af314f96f2
2018-11-06 11:52:38 -08:00
7ffa864953 Speed up tensor.options() by avoiding type dispatch (#13330)
Summary:
Also speeds up tensor.is_variable(), tensor.layout(), and
tensor.device(). This PR speeds up tensor.options() from 54ns to 17ns,
resulting in a comparable speedup in torch.as_strided performance:
https://gist.github.com/zou3519/7645262a4f89e237405857925bb872c3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13330

Differential Revision: D12847695

Pulled By: zou3519

fbshipit-source-id: 60b303671b0cce7b6140068c7f90c31d512643be
2018-11-06 11:39:28 -08:00
464dc31532 Add README to tools, delete defunct scripts. (#13621)
Summary:
Some extra documentation for other bits too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13621

Differential Revision: D12943416

Pulled By: ezyang

fbshipit-source-id: c922995e420d38c2698ce59c5bf4ffa9eb68da83
2018-11-06 11:20:53 -08:00
6aee5488b5 correct omp dependency for mkl-dnn (#13449)
Summary:
The motivation of this PR is to make mkldnn use the same OMP version as the caffe2 framework,
while not changing other assumptions within mkldnn.

Previously, MKL_cmake_included was set in caffe2 in order to disable OMP detection in mkldnn.
But with that change, mkldnn had no chance to adapt to the MKL found by caffe2,
so some MKL build flags were not set in mkldnn.
For example, USE_MKL, USE_CBLAS, etc.

In this PR, we explicitly set MKLIOMP5LIB for mkldnn according to caffe2 and pass the MKL root path to mkldnn via MKLROOT. Then, mkldnn is built as expected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13449

Differential Revision: D12899504

Pulled By: yinghai

fbshipit-source-id: 22a196bd00b4ef0a11d350a32c049304613edf52
2018-11-06 10:48:09 -08:00
a7ee632dff Various Test and build fixes (#13556)
Summary:
- fix the weights-contiguous requirement for THCUNN Convolutions
- add tests that the conv backward pass works for non-contiguous weights
- fix RNN tests / error messages to be consistent and pass
- relax weight-grad precision for fp16 for a particular test
- fix a regression of CMAKE_PREFIX_PATH not passing through
- add missing skipIfNoLapack annotations where needed

Differential Revision: D12918456

Pulled By: soumith

fbshipit-source-id: 8642d36bffcc6f2957800d6afa1e10bef2a91d05
2018-11-06 07:13:47 -08:00
9ca9469de6 mm backwards to not depend on TH. (#13575)
Summary:
This is a subset of https://github.com/pytorch/pytorch/pull/13476.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13575

Differential Revision: D12923473

Pulled By: gchanan

fbshipit-source-id: 8787808d2ab377cc535f69c3c63dcd671c72b7db
2018-11-06 06:47:44 -08:00
3c1d593a27 cumsum/cumprod derivatives not depending on TH. (#13579)
Summary:
This is identical to https://github.com/pytorch/pytorch/pull/13467 but doesn't include the tests in common_invocations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13579

Differential Revision: D12925404

Pulled By: gchanan

fbshipit-source-id: 0a52fd26b15c7e0bbdfec03948f3e6c849e65091
2018-11-06 06:42:01 -08:00
95ca66763d Add math functions overloaded over different numeric types for cuda and hip (#13602)
Summary:
petrex ashishfarmer rohithkrn iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13602

Reviewed By: dzhulgakov

Differential Revision: D12935797

Pulled By: bddppq

fbshipit-source-id: a49ec66fb60bfd947c63dd2133d431884df62235
2018-11-06 01:40:31 -08:00
d03c6ba50d Adding Fetching Real number representation
Summary: Adding fetching of the real-number representation for int8 tensors in workspace.py

Reviewed By: harouwu

Differential Revision: D12936556

fbshipit-source-id: f8756a37bce21c93d44d52faf5da9c9bd6473f4a
2018-11-05 23:35:24 -08:00
3c32f897ca Rename ndim() -> dim() - 3/6
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(ndim()->dim()): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: dzhulgakov

Differential Revision: D12935748

fbshipit-source-id: fccec04e28ec049789f772e70d691382cb8927e0
2018-11-05 23:21:40 -08:00
Jie
bbacd859ab Updating heuristics for cudnn persistent RNN (#13612)
Summary:
Modifying RNN heuristics to exclude GPUs with sm == 7.5 from using persistent RNN
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13612

Differential Revision: D12937455

Pulled By: soumith

fbshipit-source-id: 5cdaea083d55383b85dbe6e5443f1b36e578e4f5
2018-11-05 21:35:44 -08:00
fc6a9a19ea Add torch._C._nn built-in, more weak fns (#13322)
Summary:
This PR adds functions defined in `torch._C._nn` as builtin functions (including in-place variants). This allows for the conversion of more functions to weak script (see the sketch after the list below)

NB: many `torch.nn.functional` functions will have to be slightly rewritten to avoid early returns (as with `threshold` in this PR)

Converts these functions to weak script:
* `threshold`
* `relu`
* `hardtanh`
* `relu6`
* `elu`
* `selu`
* `celu`
* `leaky_relu`
* `rrelu`
* `tanh`
* `sigmoid`
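
A minimal, hedged sketch of the kind of code this unblocks (illustrative only; it assumes the listed functionals resolve as weak-script functions when called from a scripted function):

```python
import torch
import torch.nn.functional as F

@torch.jit.script
def clipped_activation(x):
    # relu/hardtanh now resolve to torch._C._nn builtins instead of
    # being opaque Python calls that fail to compile
    return F.hardtanh(F.relu(x), -1.0, 1.0)
```
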
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13322

Differential Revision: D12852203

Pulled By: driazati

fbshipit-source-id: 220670df32cb1ff39d120bdc04aa1bd41209c809
2018-11-05 21:02:18 -08:00
10d67716db bump docker image to 262 (#13581)
Summary:
We updated the valgrind version in our recent docker image.
https://github.com/pietern/pytorch-dockerfiles/pull/23
https://github.com/pytorch/ossci-job-dsl/pull/31
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13581

Reviewed By: goldsborough

Differential Revision: D12936485

Pulled By: zrphercule

fbshipit-source-id: 981532394b23e8d8ecfd6b2458ddf03710d5ac67
2018-11-05 20:43:39 -08:00
bad8235a3a Disabling NCCL coalesced bcast test since it hangs in CI (#13606)
Summary:
Functionality test shouldn't be affected since we have both backends testing for the same thing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13606

Differential Revision: D12937185

Pulled By: teng-li

fbshipit-source-id: 03d897b6690f7932654fdb7d11a07016dfffa751
2018-11-05 20:34:15 -08:00
9ef98624b3 Don't allocate empty Storage/StorageImpl for Variable. (#13580)
Summary:
Variable owns a Tensor which already has a Storage/StorageImpl
if necessary. The Variable ctor was unnecessarily allocating *another*
Storage/StorageImpl, which costs around 200ns.

This PR gets rid of that behavior and cuts the `as_variable` time from
670ns to 475ns, reducing Variable overhead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13580

Differential Revision: D12925495

Pulled By: zou3519

fbshipit-source-id: 4f5ec33776baa848d1c318abcf40b57125b3bed7
2018-11-05 19:24:14 -08:00
02d3787a19 Support new upsample in symbolic, caffe2 backend & caffe2 frontend (#13272)
Summary:
We updated the description of upsample_op in onnx: https://github.com/onnx/onnx/pull/1467
Therefore, we need to support the new upsample_op in the caffe2-onnx backend as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13272

Reviewed By: houseroad

Differential Revision: D12833656

Pulled By: zrphercule

fbshipit-source-id: 21af5282abaae12d2d044e4018a2b152aff79917
2018-11-05 19:13:57 -08:00
ebaabfbbd5 ReinitializeTensor function for refactoring Tensor as member variable (#13147)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13147

We want to refactor
```
class A {

void func() {
  x_.Resize(dims);
  auto* data = x_.mutable_data<T>();
}

Tensor x_{CPU};
};
```

to
```
class A {
void func() {
  ReinitializeTensor(&x_, dims, at::dtype<T>().device(CPU));
  auto* data = x_.mutable_data<T>();
}

Tensor x_; // Undefined Tensor
};
```

This diff adds the ReinitializeTensor function.

Reviewed By: dzhulgakov

Differential Revision: D10861298

fbshipit-source-id: 9f432297d07a4890e29bb68436364e0b2e2545e7
2018-11-05 19:13:55 -08:00
a340dce133 Replaces c10d's CUDAEvent with ATen's (#13464)
Summary:
This PR:

- Replaces c10d's CUDAEvent with ATen's, removing the two associated c10d files
- Updates c10d's usage of CUDAEvent to reflect the ATen API
- Updates c10d's usage of streams to reflect the ATen API
- Removes use of historic THCState in the touched c10d files
- (EDIT) Fixes a bug in CUDAEvent.h where events could be recorded on the wrong device. Now adds a device guard for this case.

pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13464

Reviewed By: teng-li

Differential Revision: D12924291

Pulled By: pietern

fbshipit-source-id: b8ebe3e01e53d74e527ad199cca3aa11915c1fc0
2018-11-05 19:13:52 -08:00
e2272dd312 Remove ATen/README.md in favor of cppdocs/notes/tensor_basics.rst (#13601)
Summary:
Removes aten/README.md (and some other files dating from when aten was its own repo), and moves the not outdated documentation into a note called "Tensor Basics". I updated the text lightly but did not overhaul the content.

CC zdevito

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13601

Differential Revision: D12934480

Pulled By: goldsborough

fbshipit-source-id: 012a4267b4d6f27e4d5d55d6fc66363ddca10b41
2018-11-05 19:13:50 -08:00
af4a228426 Fix erase_number_type pass, negative indices in c2 and some onnx symbolics (#12888)
Summary:
The PR did two things:

1. fix the bug in erase_number_type on node inputs
2. handle negative indices for dim-reduce in caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12888

Reviewed By: houseroad

Differential Revision: D12833486

Pulled By: wanchaol

fbshipit-source-id: c3ceb400d91f0173b73ad95e392b010c3c14db7d
2018-11-05 19:13:49 -08:00
2398a3255e fbgemm submodule update (#13592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13592

submodule update for fbgemm

Reviewed By: jspark1105

Differential Revision: D12929740

fbshipit-source-id: 546e4d7042696ffc5b0ee7cabd236ec944d218e7
2018-11-05 17:39:20 -08:00
b1c57caaf9 Move flat_hash_map to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13527

Reviewed By: ezyang

Differential Revision: D12912239

fbshipit-source-id: bb44d3ff87c4ca94943ec2667acf1e7ce2b3c914
2018-11-05 17:39:18 -08:00
b7c9575c93 Move LeftRight to c10/util
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13526

Reviewed By: ezyang

Differential Revision: D12912241

fbshipit-source-id: 70525a9b20daa8aae623d0cb4002acecc34b1932
2018-11-05 17:39:16 -08:00
8fafa7b6ac Remove size() from BatchDataset and templatize IndexType (#12960)
Summary:
This PR brings to changes to the recently landed C++ Frontend dataloader:

1. Removes the `size()` method from `BatchDataset`. This makes it cleaner to implement unsized ("infinite stream") datasets. The method was not used much beyond initial configuration.
2. Makes the index type of a dataset a template parameter of `BatchDataset` and `Sampler`. This essentially allows custom index types instead of only `vector<size_t>`. This greatly improves flexibility.

See the `InfiniteStreamDataset` and `TestIndex` datasets in the tests for what this enables.

Some additional minor updates and code movements too.

apaszke SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12960

Differential Revision: D12893342

Pulled By: goldsborough

fbshipit-source-id: ef03ea0f11a93319e81fba7d52a0ef1a125d3108
2018-11-05 17:13:09 -08:00
1969898647 Convert functional dropouts to weak script (#13484)
Summary:
To convert `nn.functional.dropout`
* `_VF` had to be exposed as a Python module so this PR adds a module class to forward to `torch._C._VariableFunctions`
* rng state between calls in the tests needed to be made consistent
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13484

Differential Revision: D12929622

Pulled By: driazati

fbshipit-source-id: 78b455db9c8856b94d2dda573fb7dc74d5784f56
2018-11-05 17:13:07 -08:00
23e3a12d5e Add pass support to script (#13535)
Summary:
This PR adds basic support for `pass` statements
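
A minimal sketch:

```python
import torch

@torch.jit.script
def noop(x):
    pass  # previously a parse error in TorchScript
    return x
```
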
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13535

Differential Revision: D12929529

Pulled By: driazati

fbshipit-source-id: 70c7c52630d46e76366c4caa875d6c5419a1e03f
2018-11-05 17:13:06 -08:00
df67d4180a Validate schema with no returns (#13525)
Summary:
If there is no return type then the returns of the schema are not
checked against the returns in the graph, so this PR adds an error if
that case is detected.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13525

Differential Revision: D12929524

Pulled By: driazati

fbshipit-source-id: da562e979482393098830bbded26729a2499152a
2018-11-05 16:51:55 -08:00
7b9d755d88 Restructure torch/torch.h and extension.h (#13482)
Summary:
This PR restructures the public-facing C++ headers in a backwards compatible way. The problem right now is that the C++ extension header `torch/extension.h` does not include the C++ frontend headers from `torch/torch.h`. However, those C++ frontend headers can be convenient. Further, including the C++ frontend main header `torch/torch.h` in a C++ extension currently raises a warning because we want to move people away from exclusively including `torch/torch.h` in extensions (which was the correct thing 6 months ago), since that *used* to be the main C++ extension header but is now the main C++ frontend header. In short: it should be possible to include the C++ frontend functionality from `torch/torch.h`, but without including that header directly because it's deprecated for extensions.

For clarification: why is `torch/torch.h` deprecated for extensions? Because for extensions we need to include Python stuff, but for the C++ frontend we don't want this Python stuff. For now the python stuff is included in `torch/torch.h` whenever the header is used from a C++ extension (enabled by a macro passed by `cpp_extensions.py`) to not break existing users, but this should change in the future.

The overall fix is simple:

1. C++ frontend sub-headers move from `torch/torch.h` into `torch/all.h`.
2. `torch/all.h` is included in:
    1. `torch/torch.h`, as is.
    2. `torch/extension.h`, to now also give C++ extension users this functionality.

With the next release we can then:
1. Remove the Python includes from `torch/torch.h`
2. Move C++-only sub-headers from `all.h` back into `torch.h`
3. Make `extension.h` include `torch.h` and `Python.h`

This will then break old C++ extensions that include `torch/torch.h`, since the correct header for C++ extensions is `torch/extension.h`.

I've also gone ahead and deprecated `torch::CPU` et al. since those are long due to die.

ezyang soumith apaszke fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13482

Differential Revision: D12924999

Pulled By: goldsborough

fbshipit-source-id: 5bb7bdc005fcb7b525195b769065176514efad8a
2018-11-05 16:46:52 -08:00
1b64c0f8fe Error msg on TCP backend (#13596)
Summary:
Clean it up from my queue:

https://github.com/pytorch/pytorch/issues/12721

```
>>> torch.distributed.init_process_group(backend="tcp")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 275, in init_process_group
    backend = DistBackend(backend)
  File "/private/home/tengli/pytorch/torch/distributed/distributed_c10d.py", line 55, in __new__
    raise ValueError("TCP backend has been deprecated. Please use "
ValueError: TCP backend has been deprecated. Please use Gloo or MPI backends for collective operations on CPU tensors.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13596

Differential Revision: D12931196

Pulled By: teng-li

fbshipit-source-id: bb739b107ad7454e2e0a17430087161fedd4c392
2018-11-05 16:40:02 -08:00
74819087de Mixed precision DDP hang fix and fine-grained option for DDP perf (#13496)
Summary:
When going to mixed-precision fp16 training, DDP randomly hangs. Initially, I thought this smelled like a similar NCCL bug I filed a while ago. It turns out it's not. Again, I was seeing different rank processes with different bucket sizes. How could this even happen?

It turns out that take_tensors generates the list of bucketed tensors in a non-deterministic order, because the key to the map is a pointer. An interesting bug to dig into and fix.

fp16 DDP training should be fully working now.

Also added another fine-grained take_tensors helper that aims to improve DDP performance; replacing DDP's use of take_tensors with it is left as a TODO.
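
To make the failure mode concrete, here is a hedged, illustrative Python sketch (not the actual c10d code): ordering buckets by pointer-like keys depends on allocation addresses, which differ between rank processes, so ranks can disagree on bucket order and sizes.

```python
def bucket_by_identity(tensors):
    # Mirrors the bug: keys are id(...) values (addresses), so the sorted
    # bucket order is allocation-dependent and varies across processes.
    buckets = {}
    for t in tensors:
        buckets.setdefault(id(t.dtype), []).append(t)
    return [buckets[k] for k in sorted(buckets)]

def bucket_by_dtype(tensors):
    # Deterministic variant: key on a stable, comparable property so
    # every rank produces the same bucket order.
    buckets = {}
    for t in tensors:
        buckets.setdefault(str(t.dtype), []).append(t)
    return [buckets[k] for k in sorted(buckets)]
```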

Fixed: https://github.com/pytorch/pytorch/issues/12150
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13496

Differential Revision: D12920985

Pulled By: teng-li

fbshipit-source-id: 26f3edae7be45a80fa7b2410a2e5a1baab212d9c
2018-11-05 16:22:15 -08:00
84cfc28f23 Note on Tensor Creation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13517

Differential Revision: D12914271

Pulled By: goldsborough

fbshipit-source-id: df64fca6652525bc814f6fd3e486c87bff29b5b5
2018-11-05 16:10:58 -08:00
f6ff5d8934 Append parameters when checking graphs for TorchScript Methods (#13553)
Summary:
Also, add an assertion in the GraphExecutor to make sure we don't
access memory out of bounds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13553

Differential Revision: D12924796

Pulled By: soumith

fbshipit-source-id: ea2a134084538484178b8ebad33d6716a8e1d633
2018-11-05 16:07:36 -08:00
f3c197d6fa Add explicit c10:: namespace to converter (#13593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13593

Should fix up master

Reviewed By: orionr

Differential Revision: D12929779

fbshipit-source-id: 23119f5bf1d9f1e37e8ed01bfa2cc40647725390
2018-11-05 14:52:16 -08:00
7faca2a217 Add new style broadcast support in c10d/gloo (#13497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13497

This replaces the existing broadcast implementation with the new style collective call in the gloo backend. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU broadcast implementation.

Reviewed By: teng-li

Differential Revision: D12890013

fbshipit-source-id: 43f346fb2814f421bedc7babf89169703a46bb9c
2018-11-05 13:52:07 -08:00
d2f26a450e Add new style allreduce support in c10d/gloo (#13426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13426

This replaces the existing allreduce implementation with the new style collective call in the gloo backend. This is the first one to include both a CPU and a CUDA path. The CUDA path copies CUDA tensors to CPU tensors and then runs the CPU allreduce implementation. This is not much different from the current situation in the case where there is a single input tensor per call (which is the case when called from DistributedDataParallel).

Reviewed By: teng-li

Differential Revision: D12855689

fbshipit-source-id: 574281d762dd29149fa7f634fb71f8f6a9787598
2018-11-05 13:52:05 -08:00
d50dd47ccd Add reduce support in c10d/gloo (#13425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13425

This adds support for the new style reduce collective call in the gloo backend.

Reviewed By: teng-li

Differential Revision: D12869404

fbshipit-source-id: 93c641e6aba3b03c796bda80737547c565cfa571
2018-11-05 13:52:02 -08:00
8f0f97749c Add allgather support in c10d/gloo (#13424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13424

This adds support for the allgather collective call in the gloo backend. The gloo implementation does not support multiple inputs per rank (nor more than one output per rank), so we use a temporary flattened buffer and unflatten once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12832009

fbshipit-source-id: 2f5c1934a338589cef1d3192bd92ada135fecd7a
2018-11-05 13:52:01 -08:00
75c2b34c86 Add gather support in c10d/gloo (#13423)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13423

This adds support for the gather collective call in the gloo backend. The gloo implementation does not yet support the mode where the root has multiple output tensors (one per rank), so we use a temporary flattened buffer and unflatten on the root once the collective finishes.

Reviewed By: teng-li

Differential Revision: D12811647

fbshipit-source-id: 90fe8af8c390090b7d4ef43aa74f4e3e67ab9d0b
2018-11-05 13:51:59 -08:00
9cfe9418e6 Add scatter support in c10d/gloo (#13422)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13422

This adds support for the scatter collective call in the gloo backend. This is the first of the new style collectives that do not expect to be created once and used many times. This commit contains some shortcuts to make this new style work side by side with the existing implementations (such as the std::tuple with nullptr's). These shortcuts are temporary until we have moved over all collectives to this new style.

Reviewed By: teng-li

Differential Revision: D12310219

fbshipit-source-id: 32e68717f819d5980f0e469d297204948351cefc
2018-11-05 13:51:57 -08:00
98f5c005da Speed up CPU threshold and relu implementation (#13182)
Summary:
```
The previous threshold implementation was not vectorized or parallelized.
This speeds up ResNet-50 CPU inference [1] from ~88 ms to ~67 ms

CPU timings:
https://gist.github.com/colesbury/d0d1be6974841d62696dbde329a8fde8

1 thread (before vs. after)
10240:  17.4 us vs. 6.9 µs per loop
102400: 141 us vs. 39.8 µs per loop

16 threads (before vs. after)
10240:  17.4 us vs. 6.7 µs per loop
102400: 141 us vs. 14.3 µs per loop

CUDA timings are not measurably different.

[1]: compiled with MKL-DNN, 8 threads, batch norm merged into convolutions
https://gist.github.com/colesbury/8a64897dae97558b3b82da665048c782
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13182

Reviewed By: soumith

Differential Revision: D12825105

Pulled By: colesbury

fbshipit-source-id: 557da608ebb87db8a04adbb0d2882af4f2eb3c15
2018-11-05 12:51:29 -08:00
b2127cfa9a Make the inception onnx test more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13563

Differential Revision: D12924968

Pulled By: houseroad

fbshipit-source-id: ba43c88aabee749cb1e1307a412eacda4b8870b0
2018-11-05 12:39:00 -08:00
5f514a483c Move Half.{h, cpp} and Half-inl.h to c10 (#13361)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13361

att

Reviewed By: Yangqing

Differential Revision: D12853472

fbshipit-source-id: ad3b96cbc6904435553a6c9e58aa158ec77a2961
2018-11-05 12:32:12 -08:00
e06f92785c Move ATen/core/Macros.h to c10/macros/Macros.h
Summary:
EXT=h,cc,cpp,hpp,cxx,cu,cuh
d=caffe2/aten/
codemod -m -d $d --extensions $EXT 'AT_HOST_DEVICE' 'C10_HOST_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_DEVICE' 'C10_DEVICE'
codemod -m -d $d --extensions $EXT 'AT_HOST' 'C10_HOST'
codemod -m -d $d --extensions $EXT 'AT_ANDROID' 'C10_ANDROID'
codemod -m -d $d --extensions $EXT 'AT_IOS' 'C10_IOS'
codemod -m -d $d --extensions $EXT 'AT_MOBILE' 'C10_MOBILE'
codemod -m -d $d --extensions $EXT 'ATen/core/Macros.h' 'c10/macros/Macros.h'
codemod -m -d $d --extensions $EXT 'HIP_HOST_DEVICE' 'C10_HIP_HOST_DEVICE'

Reviewed By: dzhulgakov

Differential Revision: D12851341

fbshipit-source-id: 7d540530ef779e16ddf2b4cdda9dcc85a61410c3
2018-11-05 12:32:11 -08:00
8c182cd89e Add overload of ProcessGroup.allreduce with list of tensors (#13576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13576

TSIA

Reviewed By: SsnL

Differential Revision: D12923457

fbshipit-source-id: 7824490548edbacac3cda81c7500bd1f851c6093
2018-11-05 11:56:49 -08:00
482b1366e6 Remove half_support.* (#13534)
Summary:
These two files are unused. I think at the time I moved the code into an inline extension (https://github.com/pytorch/pytorch/blob/master/test/test_cpp_extensions.py#L288) and forgot to delete the files.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13534

Differential Revision: D12924365

Pulled By: goldsborough

fbshipit-source-id: 050dd7da267008ea58a5dcc8febee7d7e443bc3d
2018-11-05 10:04:21 -08:00
f0ed927b62 Add diag_embed to ATen and torch (#12447)
Summary:
Fixes: #12160
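
A quick usage sketch of the new function:

```python
import torch

v = torch.randn(2, 3)
m = torch.diag_embed(v)  # shape (2, 3, 3)
# The last input dimension becomes the diagonal of the last two output
# dimensions, i.e. m[i] equals torch.diag(v[i]) for each batch index i.
```
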
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12447

Differential Revision: D12916234

Pulled By: SsnL

fbshipit-source-id: 512a04efb0c2e0a54295b857a61be66c3aae13da
2018-11-05 08:55:28 -08:00
07f8b61cc6 Roll operator t32802531 (#13261)
Summary:
Adding a roll operator
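
A quick usage sketch:

```python
import torch

x = torch.arange(6).view(2, 3)
y = torch.roll(x, shifts=1, dims=1)
# Elements shift cyclically along dim 1:
# [[0, 1, 2],      [[2, 0, 1],
#  [3, 4, 5]]  ->   [5, 3, 4]]
```
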
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13261

Differential Revision: D12922575

Pulled By: nairbv

fbshipit-source-id: ff05c075d9c484a615011192b023debf47da4017
2018-11-05 08:33:36 -08:00
e7242cbaf2 Rename dim(i) -> size(i) - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12896712

fbshipit-source-id: 909731691fab7799efbcfc3b5dcc9e531831c2d4
2018-11-05 07:27:04 -08:00
3ea64bd80b fbgemm submodule update (#13562)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13562

Submodule update for fbgemm. This version of fbgemm has the same cmake minimum required version as pytorch; without this, the OSS build fails.

Reviewed By: jianyuh

Differential Revision: D12920951

fbshipit-source-id: 9ef532e715e3f7612fecc8430736633cf6b17f34
2018-11-05 07:22:34 -08:00
e988dc621b Stop depending on static analysis of tensor types in graph fuser (#13387)
Summary:
Built on top of #13108, so please review only the last commit.

This makes the graph fuser ignore input types (device/scalar type) when considering graphs for fusion, making it much more robust to shape-prop failures. Those properties are now checked at run time, as part of the kernel validation. This should enable graph fusions in `jit_premul` and `jit_multilayer` timelines in our benchmarks.

One regression is that I've disabled fusions of comparison ops (and `type_as`). That's because there's really no good way to ensure that those are valid, and they have been a source of bugs (I filed #13384).

cc ngimel mruberry zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13387

Differential Revision: D12888104

Pulled By: zou3519

fbshipit-source-id: c233ea599679c34ac70fb4d8b8497c60aad9e480
2018-11-05 06:32:08 -08:00
505f9b4d63 Add Int8BatchPermutation op in DNNLOWP (#13539)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13539

This is used by OCR's FPN model to detect small/dense text. It is just a simple permutation along the batch dim based on the input indices, and we can avoid the unnecessary quantize/dequantize ops.

Reviewed By: csummersea

Differential Revision: D12894055

fbshipit-source-id: d25639a5ffc2c490a0ee7ef307302eb2953c307e
2018-11-05 01:57:50 -08:00
54e8623d26 3D Conv in NHWC layout (#12733)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12733

Conv in NHWC layout previously only worked for 2D images. This has been a pain point when implementing quantized 3D convolution, because we need NHWC layout for best performance (note that NHWC layout in general gives better CPU performance, not just for quantized operators). For example, our quantized ops can measure quantization error operator by operator, but this requires running a shadow fp32 operator, which is not easy when no 3D conv in NHWC layout is available (currently we're doing layout conversion on the fly for the shadow fp32 operator, which is error prone). Some Caffe2 frameworks like brew raise an error when we try to create a 3D conv op in NHWC layout. This was also a blocker for using aibench, because aibench uses brew.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10333829

fbshipit-source-id: 2d203ee1db833cd3f9d39353219e3894b46c4389
2018-11-04 21:50:09 -08:00
274f3c0951 add explicit fpga context (#13318)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13318

Add a context to describe FPGA.

This removes the need for having OpenCL with an FPGA engine.

The next step is to change the OpenCL implementation to explicitly use the FPGA context.

Reviewed By: soumith

Differential Revision: D12828795

fbshipit-source-id: 0700a83672d117d7aa3d941cd39c2ae627cb6e5f
2018-11-04 21:47:45 -08:00
246d5282b3 fix handling of single input in gradcheck (#13543)
Summary:
Now gradcheck properly accepts a single Tensor as input. It was almost supported already, but not completely.
Should fix the confusion from #13540
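
A minimal sketch of the now-accepted call pattern (previously `inputs` effectively had to be a tuple of Tensors):

```python
import torch
from torch.autograd import gradcheck

x = torch.randn(4, dtype=torch.double, requires_grad=True)
# A bare Tensor is now accepted as the `inputs` argument
assert gradcheck(torch.sin, x)
```
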
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13543

Differential Revision: D12918526

Pulled By: soumith

fbshipit-source-id: a5bad69af0aea48c146f58df2482cabf91e24a01
2018-11-04 20:28:34 -08:00
fdf34c8da8 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13433

Reviewed By: jerryzh168

Differential Revision: D12874599

fbshipit-source-id: 0c262fda72cbc4f3ea80df790cc8e95140bdc7e0
2018-11-04 16:54:49 -08:00
f000101b81 add a few comments on layout after im2col (#12429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12429

Comments to clarify layout after NHWC im2col for group convolution.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D10233284

fbshipit-source-id: 996a69f2f932e02c978abaade7571b00741b6ae8
2018-11-04 11:02:58 -08:00
6b578cd388 update fbgemm submodule (#13547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13547

update fbgemm submodule

Reviewed By: jspark1105, jianyuh

Differential Revision: D12917297

fbshipit-source-id: ad9b2c7f119ca159af3826266b59ec26fc54911c
2018-11-04 09:15:17 -08:00
c1ed1b4779 Duplicate bias blobs shared by different conv ops to handle scale correctly (#13538)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13538

In architectures such as FPN (https://arxiv.org/abs/1612.03144), a few Conv ops share the same weight and bias and are run at different scales of the input. Since 'bias_scale = input_scale * weight_scale', sharing the same bias blob among multiple Conv ops means that we need a different bias scale for each of the ops. To achieve this, we simply duplicate those bias blobs that are used by multiple Conv ops before performing the int8 rewrite.

Reviewed By: csummersea

Differential Revision: D12854062

fbshipit-source-id: 42a2951877819339b117f13f01816291a4fa6596
2018-11-04 04:15:28 -08:00
2a6850bf73 remove unnecessary files (#13537)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13537

plot_hist.py and dnnlowp_fc_perf_comparison.py were not supposed to be in operators/quantized/server

Reviewed By: hx89

Differential Revision: D12916259

fbshipit-source-id: f5bc0c01a4924cad6f82eff624ba5f79becbea33
2018-11-04 01:01:28 -07:00
8be0efaa8c omit group conv NHWC test for HIP (#13554)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13554

D10233252 broke the ROCm test.
We don't have group conv in NHWC for HIP yet, so this diff omits the related tests.

Reviewed By: hyuen

Differential Revision: D12917880

fbshipit-source-id: 9baf36a8cb061ee8cf393b2c438a2d1460ce5cd8
2018-11-03 21:18:23 -07:00
9e432b593d Include caffe2 proto headers in pytorch package data (#13217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13217

Caffe2 proto headers are not included in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1180). However, they are required for building custom Caffe2 ops living outside PyTorch/Caffe2 repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815881

fbshipit-source-id: 4d1aaa6a69a2193247586e85e4244fbbdb3e8192
2018-11-03 16:19:39 -07:00
149afef5c4 Include lib subdir in caffe2 include dirs path (#13216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13216

Caffe2 headers are placed under `lib/include` in pytorch package data (https://github.com/pytorch/pytorch/blob/master/setup.py#L1201). However, `CAFFE2_INCLUDE_DIRS` path is set to `"${_INSTALL_PREFIX}/include"` which does not exist in package data. This results in issues when trying to build custom Caffe2 ops living outside Caffe2/PyTorch repo (e.g. custom Detectron ops).

Reviewed By: pjh5

Differential Revision: D12815878

fbshipit-source-id: 7cb1b4a729f8242b7437e3f30dace3b9cf044144
2018-11-03 16:19:38 -07:00
d40b23e750 remove unused use_scratch argument from batch_matmul (#11745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11745

use_scratch was introduced in D5834868, but D8944686 refactored GemmStridedBatched; as far as I can tell, use_scratch is no longer used anywhere and is not documented.

Reviewed By: BIT-silence

Differential Revision: D9846488

fbshipit-source-id: 915d92aa57bc211888dfb09ad657f7c2b4f4b71c
2018-11-03 15:31:24 -07:00
2bc6a7a260 enable group conv test in NHWC layout in CPU (#12428)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12428

Group conv in NHWC layout was enabled on CPU in D7547497.
In D7547497, the unit test for group conv in NHWC layout on CPU was enabled in group_conv_test.py but not in conv_test.py. This diff enables it in conv_test.py as well.

Reviewed By: BIT-silence

Differential Revision: D10233252

fbshipit-source-id: aeeaf3eedc60e1cf6321b5a1dbe6a561e3aacbde
2018-11-03 11:58:51 -07:00
2b280c6b74 minor build fixes for incremental builds (#13293)
Summary:
Work around a cmake-ninja bug, which doesn't track the dependency between xxx-generated-xxx.cu and updating the timestamp of build.ninja (the consequence being that cmake is rerun on the next rebuild).
This was surfaced after analyzing the outputs of `ninja -d explain install`

Now, compared to https://github.com/pytorch/pytorch/pull/11487#issue-214450604 we're seeing:

```
python setup.py rebuild develop    # first time - ~1m 42s
python setup.py rebuild develop    # second time - ~12 s
```

This gets even faster if we replace the default linker with multithreaded linkers like `lld` or `gold`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13293

Differential Revision: D12916346

Pulled By: soumith

fbshipit-source-id: 3817c09a9a687fa2273f90444e5071ce1bb47260
2018-11-03 09:53:04 -07:00
0479517325 Add modernize-* checks to clang-tidy (#13196)
Summary:
Enables almost all `modernize-*` checks in clang-tidy. This warns against things such as:

- Use of `const std::string&` instead of new-style `std::string` + move,
- Using old-style loops instead of range-for loops,
- Use of raw `new`
- Use of `push_back` instead of `emplace_back`
- Use of `virtual` together with `override` (`override` is sufficient)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13196

Differential Revision: D12891837

Pulled By: goldsborough

fbshipit-source-id: 4d0f782a09eb391ee718d3d66f74c095ee121c09
2018-11-02 20:30:40 -07:00
4bca51e3e7 unify BLAS check between Caffe2 and ATen (#13514)
Summary:
This PR unifies the BLAS check between Caffe2 and ATen. It skips the redundant BLAS check for ATen under certain conditions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13514

Reviewed By: orionr

Differential Revision: D12905272

Pulled By: mingzhe09088

fbshipit-source-id: 05163704f363c97a762ff034f88a67bd32ac01d0
2018-11-02 18:40:10 -07:00
8fc63e523e Resolve lint and infer warnings (#13520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13520

Resolve the lint and infer warnings shown in the dnnlowp migration diff.

Reviewed By: dskhudia

Differential Revision: D12905972

fbshipit-source-id: b07400e25b80ea656795b005b91ac1438abe2695
2018-11-02 17:43:49 -07:00
f74fa91b8e Fix EraseListConstruct pass during ONNX export (#13195)
Summary:
There should really be a single place to erase or specially treat prim::ListConstruct during ONNX export; that makes it consistent across different calls. E.g., it gives a correct output graph in the following case:
```python
class Test(torch.nn.Module):
    def forward(self, input):
        return torch.cat([input, torch.zeros(input.size(0), 1).type_as(input)], dim=1)
```
Before this PR, we have the onnx graph as:

```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : int[] = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}

```
This is wrong since ONNX does not have a concept of `int[]`. Here is the ONNX graph after this PR:
```
graph(%0 : Byte(2, 3)) {
  %1 : Long() = onnx::Constant[value={0}](), scope: Test
  %2 : Dynamic = onnx::Shape(%0), scope: Test
  %3 : Long() = onnx::Gather[axis=0](%2, %1), scope: Test
  %4 : Long() = onnx::Constant[value={1}](), scope: Test
  %5 : Dynamic = onnx::Unsqueeze[axes=[0]](%3)
  %6 : Dynamic = onnx::Unsqueeze[axes=[0]](%4)
  %7 : Dynamic = onnx::Concat[axis=0](%5, %6)
  %8 : Float(2, 1) = onnx::ConstantFill[dtype=1, input_as_shape=1, value=0](%7), scope: Test
  %9 : Byte(2, 1) = onnx::Cast[to=2](%8), scope: Test
  %10 : Byte(2, 4) = onnx::Concat[axis=1](%0, %9), scope: Test
  return (%10);
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13195

Differential Revision: D12812541

Pulled By: wanchaol

fbshipit-source-id: db6be8bf0cdc85c426d5cbe09a28c5e5d860eb3e
2018-11-02 15:09:06 -07:00
519570def8 Rename dim(i) -> size(i) - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: salexspb

Differential Revision: D12896721

fbshipit-source-id: deb0290354a1ffd69d080f0f126479844bf04e3c
2018-11-02 14:29:06 -07:00
7b48a7c3f6 Bump gloo (#13513)
Summary:
Included math.h changes needed in #13422 and later PRs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13513

Differential Revision: D12906653

Pulled By: pietern

fbshipit-source-id: 4d4ec7566bf07925b4ce86eb0c63d784cb6b9992
2018-11-02 12:16:17 -07:00
da029ca042 Skip Conv1D tests for MIOPEN (#13512)
Summary:
MIOpen currently only supports 2D convolutions.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13512

Differential Revision: D12903307

Pulled By: bddppq

fbshipit-source-id: a8b0f0580a1859f1e0c1518907406abf013c4c8c
2018-11-02 11:38:26 -07:00
34dd831dc2 Revert MKL rowwise moments (#13480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13480

Revert D12845220 since the MKL functions use multiple threads, while the single-threaded run is slower than the Eigen version.

i-am-not-moving-c2-to-c10

Reviewed By: houseroad

Differential Revision: D12891751

fbshipit-source-id: 2a61727b269a304daeee2af6ff7fee7820cb5344
2018-11-02 11:31:43 -07:00
cc3cecdba0 Fix the bug when compile using nvcc compiler. (#13509)
Summary:
I found a bug when compiling the CUDA files while installing the maskrcnn-benchmark lib.

`python setup.py build develop` will throw the error:
```
  File "/usr/local/lib/python2.7/dist-packages/torch/utils/cpp_extension.py", line 214, in unix_wrap_compile
    original_compile(obj, src, ext, cc_args, cflags, pp_opts)
  File "/usr/lib/python2.7/distutils/unixccompiler.py", line 125, in _compile
    self.spawn(compiler_so + cc_args + [src, '-o', obj] +
TypeError: coercing to Unicode: need string or buffer, list found
```

For more information, please see [issue](https://github.com/facebookresearch/maskrcnn-benchmark/issues/99).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13509

Differential Revision: D12902675

Pulled By: soumith

fbshipit-source-id: b9149f5de21ae29f94670cb2bbc93fa368f4e0f7
2018-11-02 11:09:43 -07:00
2827fc7681 Add native wrappers for inplace bitwise operators.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13490

Differential Revision: D12894826

Pulled By: gchanan

fbshipit-source-id: bd7a0a50e824d92f8ad39e159c1c10318741191d
2018-11-02 11:03:24 -07:00
9f2b2cac37 Fix handling all empty bags in CUDA embedding bag (#13483)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13483

Differential Revision: D12902914

Pulled By: SsnL

fbshipit-source-id: 577a53e815231e988da716b1ee5667e1f36408ca
2018-11-02 10:21:14 -07:00
3d392cc5ec Migrate dnnlowp code to open source directory (#13500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13500

This diff migrate dnnlowp related files and operators from deeplearning/quantization/caffe2 and deeplearning/quantization/dnnlowp to the open source directory.

Reviewed By: jspark1105

Differential Revision: D10842192

fbshipit-source-id: 53d0666d0ae47a01db9c48114345d746b0a4f11f
2018-11-02 09:36:59 -07:00
bcb851a3d6 Write gesv derivatives in terms of native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13469

Reviewed By: ezyang

Differential Revision: D12889116

Pulled By: gchanan

fbshipit-source-id: 1a25dd6ec3fda5897c5cabbb9a62423b50bfda36
2018-11-02 08:30:24 -07:00
1e1dd88c4a Add Linux ppc64le CPU/GPU CI build status
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13507

Differential Revision: D12902281

Pulled By: soumith

fbshipit-source-id: d2c89dcf08dcbe1e451ae52e85256f658155a0e1
2018-11-02 07:51:40 -07:00
2f82a06826 Fix half_tensor.bernoulli_(double) (#13474)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12431
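
A minimal repro/usage sketch (assuming a CUDA device, since half is primarily a GPU dtype):

```python
import torch

x = torch.empty(8, dtype=torch.half, device='cuda')
x.bernoulli_(0.25)  # in-place sampling with a float probability now works on half tensors
```
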
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13474

Differential Revision: D12897834

Pulled By: SsnL

fbshipit-source-id: 598250fd7b9f1d2509ec0e5012724d7895a62daf
2018-11-02 07:46:46 -07:00
61a2d47ec6 Special handling for 1D convolutional kernels in cuDNN flavor of conv_op. (#12902)
Summary:
Essentially makes cuDNN treat those kernels as Nx1 ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12902

Reviewed By: BIT-silence

Differential Revision: D10852862

Pulled By: soumith

fbshipit-source-id: 7416cf6d131177340d21cbf1d42c1daa6c7cad8c
2018-11-02 07:08:23 -07:00
86192301b3 Fix a few bugs in format and vararg handling (#13492)
Summary:
There are a couple subtle bugs in the way varargs is implemented:

1. it fails if you pass 0 arguments, because it doesn't handle the case when there are 0 varargs, and because Operator::matches was not updated.
2. it breaks all the named-based lookups on nodes. For instance node->get<int>(attr::value)
   will return a single entry of the varargs if you look it up by name.

Furthermore, it complicates some assumptions about the positional arguments (e.g. they used to be 1-to-1 with node inputs, but with varargs they are not).

Because varargs are only being used for format, this diff instead just allows format to take any value as input, regardless of type. It just provides a way to set is_vararg from the schema but does not restrict the types of the vararg values. This is in line with the pre-existing behavior for is_vararg, so it doesn't require Operator::matches changes.

This also keeps format in line with how print works, and is closer to the Python implementation of format. Note that the implementation of format already worked with arbitrary IValues, so restricting it to strings just made it more conservative than needed.

This also fixes the implementation of format to work when there are 0 arguments, or when there is text before and after a format string, cases where it previously printed nothing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13492

Differential Revision: D12896989

Pulled By: zdevito

fbshipit-source-id: 21425bac8edc81709030a7408180494edea0a54b
2018-11-02 00:07:00 -07:00
5fbaf0eaf8 add augmented assignment ops (#13364)
Summary:
This PR changes the compiler to correctly emit in-place operators for augmented assignments (`+=` and friends).
- To better match the Python AST structure, add an `AugAssign` tree view and make `Assign` apply only to `=` assignments.
- Emit those `AugAssign` exprs in the compiler, dispatching to in-place aten ops for tensors and lowering to simple assignments for scalar types (a sketch follows this list).
- In order to preserve (suspect) ONNX export semantics, add a pass to lower the in-place operators to out-of-place operators.
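
A minimal sketch of what now compiles (and dispatches in place for tensors):

```python
import torch

@torch.jit.script
def accumulate(x, y):
    x += y  # emitted as the in-place aten op (add_) rather than x = x + y
    return x
```
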
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13364

Differential Revision: D12899734

Pulled By: suo

fbshipit-source-id: bec83be0062cb0235eb129aed78d6110a9e2c146
2018-11-02 00:01:07 -07:00
a0e783768f Do not fill in new data in every iteration if the input data only has one entry (#13495)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13495

If the user has only one data file to put in, the data is filled in on every iteration, which actually flushes the caches. The measured latency is then larger than the latency when the caches are warm. Instead of doing that, we should rely only on the wipe_cache variable to wipe the caches.

The change is to skip filling in the data if the input only has one size and it is not the first iteration.

Reviewed By: hl475

Differential Revision: D12897946

fbshipit-source-id: ee54ed09b8ec85fcefe930858420b90d494ad972
2018-11-01 22:06:09 -07:00
57e162da56 Switch mutable lists to new mutable schema (#13406)
Summary:
Goodbye, World! This PR removes the world tokens and associated pass and switches lists over to the new mutability/aliasing annotations.

Should resolve #12780 since we are disabling optimization pending alias analysis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13406

Differential Revision: D12886463

Pulled By: suo

fbshipit-source-id: e64e55905aebdcad273b39862df3209f823f5408
2018-11-01 19:41:04 -07:00
6d2b3cc869 Fix pytest, make it work with run_test.py (#13416)
Summary:
Fixes #13326

Also now you can use `run_test.py` with `pytest`. E.g.,
```
python run_test.py -vci distributed -pt
```

Yes it works with `distributed` and `cpp_extension`.

cc zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13416

Differential Revision: D12895622

Pulled By: SsnL

fbshipit-source-id: 2d18106f3a118d642a666bfb1318f41c859c3df7
2018-11-01 19:08:06 -07:00
0fd176fea4 Add operator is, not, is not to script (#13336)
Summary:
As titled, this PR is part of the tasks to unblock exporting the standard library.
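
A minimal, illustrative sketch of the newly supported operators in script:

```python
import torch

@torch.jit.script
def same_tensor(x, y):
    # `is` / `is not` compare object identity, as in Python
    return x is y
```
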
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13336

Differential Revision: D12888912

Pulled By: wanchaol

fbshipit-source-id: 6213a17a75a593ae45999994fd9562f29b7d42df
2018-11-01 16:55:28 -07:00
24839aac59 Link libgloo.a after libc10d.a to resolve remaining symbols (#13462)
Summary:
libcaffe2.so depends on libgloo.a for the ops in caffe2/contrib/gloo.
Symbols in libgloo.a that are not used are ignored and don't end up in
libcaffe2.so. libc10d.a depends on the caffe2 target, which in turn
depends on the gloo target, and it expects all libgloo.a symbols to be
part of libcaffe2.so. Symbols from libgloo.a that are not used in
libcaffe2.so remain undefined in libc10d.a.

To fix this, we link to libgloo.a when linking _C.so, such that any
gloo symbols in libc10d.a are resolved when linking _C.so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13462

Differential Revision: D12892830

Pulled By: pietern

fbshipit-source-id: 7560b3899b62f76081b394498480e513a84cefab
2018-11-01 16:03:33 -07:00
e6b6cc06ee caffe2/core hipify (#13457)
Summary:
Small edits to caffe2/core hipify to make it compile in fbcode.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13457

Reviewed By: bddppq

Differential Revision: D12883472

Pulled By: xw285cornell

fbshipit-source-id: 1da231d721311d105892db13ed726240398ba49e
2018-11-01 15:49:56 -07:00
421f3f3e52 add npair builtins (#13473)
Summary:
Add npair builtins to unblock the standard library. As with broadcasting lists, the only occurrences are with ints/floats.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13473

Differential Revision: D12890844

Pulled By: eellison

fbshipit-source-id: c360bb581d0f967cb51b858b6f964c300992d62a
2018-11-01 15:42:52 -07:00
27002e3fd5 Enable a few hicpp (#13189)
Summary:
Enabling three checks from ["High Integrity C++"](https://www.perforce.com/blog/qac/high-integrity-cpp-hicpp)

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13189

Differential Revision: D12859779

Pulled By: goldsborough

fbshipit-source-id: 8ec22370dcf88618dae749a8dae0e82678e68b0e
2018-11-01 15:19:17 -07:00
d843f63f2a optimization on cpu conv3d (#11884)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11884

In CPU mode, the current convNd uses Im2ColNdNCHWImpl, a generic implementation that handles convolutional layers for an arbitrary number of dimensions. In video modeling, we use convNd with filter dimension = 3.

The problem with the current convNd is that Im2ColNdNCHWImpl is much slower than the Im2Col used by conv2d for filters with the same FLOPs. For example, a (1, 7, 7) 3d filter takes 5 times longer than a (7, 7) 2d filter at inference time.

This diff extends Im2Col to the 3d case (Im2Col3dNCHWImpl); this optimization gives 4-5x faster inference time on CPU for various video models.

i-am-not-moving-c2-to-c10

Reviewed By: BIT-silence

Differential Revision: D8245940

fbshipit-source-id: 75231d65c9dd56059dfe31701e26021fd1ff2a85
2018-11-01 15:13:26 -07:00
d714ecf879 Rename potrf to cholesky (#12699)
Summary:
This PR renames the function `potrf`, responsible for the Cholesky decomposition of positive definite matrices, to `cholesky`, matching NumPy and TF (a usage sketch follows the list below).

Billing of changes
- make potrf cname for cholesky in Declarations.cwrap
- modify the function names in ATen/core
- modify the function names in Python frontend
- issue warnings when potrf is called to notify users of the change
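
A minimal usage sketch of the rename:

```python
import torch

a = torch.randn(3, 3, dtype=torch.double)
spd = a @ a.t() + 3 * torch.eye(3, dtype=torch.double)  # positive definite
l = torch.cholesky(spd)  # new name; returns the lower-triangular factor by default
assert torch.allclose(l @ l.t(), spd)
# torch.potrf(spd) still runs, but warns about the rename
```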

Reviewed By: soumith

Differential Revision: D10528361

Pulled By: zou3519

fbshipit-source-id: 19d9bcf8ffb38def698ae5acf30743884dda0d88
2018-11-01 15:10:55 -07:00
26a8bb62ee Re-enabled mm+add tree batching in the JIT (#13228)
Summary:
I've had to generously increase the range of the CreateADSubgraphs pass, because even though it collapses the RNN loop into a single differentiable subgraph and a few other nodes, the range uses the distances in the original graph...

cc zdevito zou3519
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13228

Differential Revision: D12871316

Pulled By: zou3519

fbshipit-source-id: 32da6f30f7821e4339034f1a4dec41ed0849abfb
2018-11-01 14:50:17 -07:00
81438f1220 Add transpose network pass (#13437)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13437

Transform the NCHW Convolution operators to NHWC, and the tensors around these operators.

Reviewed By: bwasti

Differential Revision: D12871789

fbshipit-source-id: 6509a29fa1654424d22904df0d3e60f8cd9c0ec7
2018-11-01 14:27:07 -07:00
a1728602da Convert Arguments to dictionary (#13436)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13436

Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D12871811

fbshipit-source-id: 486ad09f3f37723c92a946c486ce3e24a649b4e6
2018-11-01 14:27:05 -07:00
469c6b0539 Replace tmpnam usage (#13289)
Summary:
Fix
```
/torch_shm_manager#compile-manager.cpp.oc089dac2,gcc-5-glibc-2.23-clang/manager.cpp.o:manager.cpp:function main:
warning: the use of `tmpnam' is dangerous, better use `mkstemp`
```

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13289

Differential Revision: D12873282

Pulled By: goldsborough

fbshipit-source-id: fc64b59403d52eb271744378ef4ee8338c79312c
2018-11-01 13:50:43 -07:00
edc6d721e0 fix flake (#13463)
Summary:
fix flake on test/test_jit.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13463

Differential Revision: D12886532

Pulled By: eellison

fbshipit-source-id: 1cd2a736663d5037bb4bdcd1d8ca1f201cf6a1cf
2018-11-01 13:39:39 -07:00
99ce499bfe Revert D12852205: [pytorch][PR] [jit] Add str() builtin
Differential Revision:
D12852205

Original commit changeset: 3e0e9218afdf

fbshipit-source-id: 114b4873504109394fe9d489200d39764ecc638e
2018-11-01 12:48:48 -07:00
e2e560d9c8 Improved the caffe2 to ONNX export (#13429)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13429

Made the SSA transformation idempotent. This ensures that if a caffe2 graph is already in SSA form, the names of the ONNX model's inputs/outputs match those of the caffe2 graph.
Avoided evaluating the model by running it when the shapes of all the blobs are present in the value_info map. This speeds up the conversion and decreases its memory usage for medium to large nets.

Reviewed By: abadams

Differential Revision: D12873354

fbshipit-source-id: d695b28e610562afa9a41c2d4da05be212ccb488
2018-11-01 12:40:24 -07:00
54d63c5752 added fbgemm as submodule (#13354) 2018-11-01 15:35:02 -04:00
c2dd0b9fad Put torch/csrc/jit/fuser/config.h in gitignore
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13461

Differential Revision: D12886222

Pulled By: goldsborough

fbshipit-source-id: f7cfb65f671129f46b5eafd75a6b00fa996371ac
2018-11-01 12:27:57 -07:00
de0d85ba98 Remove getTHCudaHostAllocator in favor of getPinnedMemoryAllocator (#13451)
Summary:
```
Both allocate "pinned" memory on the host (CPU). The allocator returned
by at::cuda::getPinnedMemoryAllocator caches allocations, while
getTHCudaHostAllocator would synchronize on frees.
```

This is super minor, but I want to avoid people grabbing getTHCudaHostAllocator by accident. (It's not currently used anywhere).

We still need a better API for allocating pinned memory from both C++ and Python. (See https://github.com/pytorch/pytorch/issues/2206)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13451

Differential Revision: D12883037

Pulled By: colesbury

fbshipit-source-id: 5d327e715acc1ded9b19660f84ecd23c8334d1c1
2018-11-01 12:18:29 -07:00
8f2bc1bc56 Add str() builtin (#13278)
Summary:
Allow casting to string from any IValue type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13278

Differential Revision: D12852205

Pulled By: driazati

fbshipit-source-id: 3e0e9218afdf27569da3ebf155f25e77e9f12984
2018-11-01 12:01:50 -07:00
70db53661b expose fixed length list argument (#13142)
Summary:
Arguments have an optional fixed length list field which allows either a list or a single element that will be broadcast to a fixed length.

This PR exposes that as a denotable argument, mostly to cover the many instances in which it is used in the standard library. It appears in the standard library with ints & floats. Since this is not really a pattern we want to promote moving forward, I did not expose this for booleans or tensors.

We could consider making the optional static length part of the list type, instead of the argument, which would make some of this code much nicer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13142

Differential Revision: D12876047

Pulled By: eellison

fbshipit-source-id: e7359d2a878b4627fc2b9ebc090f9849ee524693
2018-11-01 10:34:52 -07:00
99a5d19591 Rename elementwise_mean to mean (#13419)
Summary:
Closes #12459
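
A usage sketch with the new spelling:

```python
import torch
import torch.nn.functional as F

inp = torch.randn(3, 5)
tgt = torch.randn(3, 5)
loss = F.mse_loss(inp, tgt, reduction='mean')  # formerly 'elementwise_mean'
```
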
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13419

Differential Revision: D12883299

Pulled By: SsnL

fbshipit-source-id: 8b4512ff73b66fdc674412904dbb3bf497ba70a7
2018-11-01 10:31:26 -07:00
a5b627a0bf add assert statements (#13408)
Summary:
Adding assert statements to unblock the standard library.

The same limitations that apply to the existing implementation of Exceptions apply to this as well (no control-flow logic, and we ignore the specific Exception thrown).
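
A minimal sketch of an assert in script (assuming the message form parses as in Python):

```python
import torch

@torch.jit.script
def checked_add(x, y):
    assert x.size(0) == y.size(0), "batch dimensions must match"
    return x + y
```
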
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13408

Reviewed By: driazati

Differential Revision: D12876451

Pulled By: eellison

fbshipit-source-id: 767ba5a50ba7c5dd6a857ed4845ac076a81cf305
2018-11-01 10:01:07 -07:00
004fc2f430 Stop unnecessarily setting storage in as_strided. (#13411)
Summary:
As per ezyang's suggestion

Previously, tensor.as_strided would:
- allocate a tensor `result` and a storage
- throw away that storage in favor of the input tensor's storage.

This PR makes tensor.as_strided not allocate a storage just to throw it
away. This speeds up as_strided from 770ns to 344ns.
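
A usage sketch of the storage-sharing behavior this relies on:

```python
import torch

x = torch.randn(4, 4)
y = x.as_strided((2, 2), (4, 1))  # a view into x's existing storage
assert y.data_ptr() == x.data_ptr()  # no new storage is allocated for y
```
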
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13411

Reviewed By: ezyang

Differential Revision: D12870309

Pulled By: zou3519

fbshipit-source-id: 1415e656f4d1931585c9a6006dcd4670123352d0
2018-11-01 08:32:53 -07:00
c0e24443f7 Revert D10459665: [c10] Redo jit/type and utils/functional to ATen/core
Differential Revision:
D10459665

Original commit changeset: 563dec9987aa

fbshipit-source-id: bea1dac93ebe73c9e09753d641f04f722d80aef7
2018-11-01 07:26:54 -07:00
8444ed951d add sleep time between runs (#12347)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12347

add sleep time between net and operator runs, and between each iteration.

Reviewed By: sf-wind

Differential Revision: D10209308

fbshipit-source-id: 9a42b47e1fdc14b42dba6bb3ff048fe8e2934615
2018-11-01 00:25:22 -07:00
86e1009497 Make ATen core HIP compatible (#13343)
Summary:
So caffe2 can include aten core files without hipifying aten

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13343

Reviewed By: xw285cornell

Differential Revision: D12853162

Pulled By: bddppq

fbshipit-source-id: f9402691292180dde110a58ea3b1cedc62aab0ba
2018-10-31 21:08:54 -07:00
10a6a3e404 Redo jit/type and utils/functional to ATen/core (#12862)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12862

This is a redo of the previous move in a way that doesn't migrate the namespace -- also will check for the windows cudnn build failure

Reviewed By: Yangqing

Differential Revision: D10459665

fbshipit-source-id: 563dec9987aa979702e6d71072ee2f4b2d969d69
2018-10-31 19:57:43 -07:00
c76fc75292 Implementation copy operator for mkl-dnn (#12820)
Summary:
It is an operator to copy a blob from one ideep device to another.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12820

Reviewed By: ezyang

Differential Revision: D10850956

Pulled By: yinghai

fbshipit-source-id: f25bff6238cefe847eb98277979fa59139bff843
2018-10-31 19:35:53 -07:00
96ab7cbe5c Make gels error message nicer (#13421)
Summary:
cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13421

Differential Revision: D12875237

Pulled By: SsnL

fbshipit-source-id: 889a9820be77bb8055d41e395d7bf55d092b35d7
2018-10-31 19:25:57 -07:00
6fe089c6ea Hierarchical device independent -> device specific architecture (#13108)
Summary:
This PR principally redesigns the fuser's logical flow to be hierarchical, with device-independent logic directing (relatively little) device-specific logic. This design is based on reviews of XLA, TVM, internal design review at NVIDIA and discussions with fuser owners at Facebook. To further vet the design I have begun developing the next significant PR (extended fusion logic) on top of this architecture and it has made the work significantly easier. This PR also improves fuser modularity, which should make it easier for others to contribute to. Unfortunately, this PR is large and its nature has made breaking it into smaller pieces challenging. Future PRs should be smaller.

The fusion flow is now:

- Fusions are "registered" and "upfront compilation" occurs. The fusion specifications, which includes the graph, go into a thread-safe device-independent cache. Upfront compilation generates some information used later during shape inference.
- Fusions are run, which passes them to an executor that performs shape inference, requests an instantiated fusion from the specification's thread-safe store, and launches them. Launch logic eventually defers to device-specific logic.
- Fusions not previously instantiated are compiled. Compilation is device-specific and arg-specific. Compilation logic eventually defers to device-specific logic.
- If the fusion could not be run because fusion on the requested device is disabled or shape inference fails, a fallback is invoked.

This flow can be thought of as PyTorch IR -> Device-Independent Fusion Logic -> Device-Specific Fusion Logic. The current upstream logic is, by contrast, PyTorch IR -> Device-Specific Logic -> Device-Independent Logic, which results in needless code duplication and lack of conceptual clarity. That was my mistake when splitting the fuser off from the rest of the jit and our reviews since then have been incredibly helpful in understanding why the approach in this PR is better.

This PR does not only move code around. It also fixes a few bugs and makes some logical/code changes.

Bug fixes:
- thread-safety is improved with caches preventing concurrent access
- the nvrtc version is now reviewed to determine the appropriate compute architecture to compile for, fixing a bug that would cause runtime errors if a user's nvrtc didn't support the compute architecture their gpu reported
- an issue with DeviceGuard not setting the device properly and failing silently is worked-around (ezyang mentioned he was reviewing the dynamic registration DeviceGuard uses, which may resolve the issue)

Code/Logical changes:
- "const" now appears many more places (note: I cast const away in operator.h because of some obscure build issues -- I think we should be able to fix this and will take a look while this goes through testing)
- The new flow allowed some redundant code to be removed (AnnotatedGraph is gone, for example, and the more straightforward flow eliminated duplication of effort elsewhere)
- Fallback logic is now also invoked if a fusion is requested on a device that cannot handle fusions
- Use of macros to determine which files are compiled is reduced (though they may come back if the Windows build is unhappy)
- There is no more "common" code or folder, the device-independent logic being at the forefront of the fuser replaces and improves upon the goal of sharing code

apaszke who I promised naming rights to
zdevito who correctly pointed out that the device-independent logic should be the bulk of what the fuser is doing
ngimel who contributed to the design of this architecture
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13108

Reviewed By: gchanan, fmassa

Differential Revision: D12850608

Pulled By: soumith

fbshipit-source-id: 24e2df6dfa97591ee36aeca8944519678c301fa3
2018-10-31 18:13:00 -07:00
2df6d3e3c7 Fix allocator handling in raw_mutable_data (#13349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13349

When we get a Tensor that was created in ATen, it will have an allocator set.
Such tensors, before, crashed when you called raw_mutable_data on them.
This diff fixes that.

Reviewed By: ezyang, teng-li

Differential Revision: D12850833

fbshipit-source-id: 51a5f7030afc4854b439cb3698d0ccd8dd101e2c
2018-10-31 18:04:41 -07:00
a682ce9144 Add back HIP support to async net (#13400)
Summary:
We lost HIP support in the last refactoring (620ece2668)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13400

Differential Revision: D12868211

Pulled By: bddppq

fbshipit-source-id: 72dbfda105b826bee28ddf480e88fca7d63f93d8
2018-10-31 17:52:36 -07:00
eaf141dd64 Enable opencv and lmdb in ROCM CI (#13430)
Summary:
They are needed to run resnet50_trainer when using datasets from https://download.caffe2.ai/databases/resnet_trainer.zip

cc xw285cornell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13430

Differential Revision: D12876593

Pulled By: bddppq

fbshipit-source-id: 912943d1d84d165ad396c8a99d2b948d933e12f2
2018-10-31 17:50:33 -07:00
2e1b7a6f4f Renaming dim() to size() - 1/3 (#13434)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13434

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(dim->size): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12867223

fbshipit-source-id: 3e05be1a370ebd1a273bd4c70499d019fd056ac4
2018-10-31 17:43:52 -07:00
edd902594a Renaming meta() to dtype() - 1/2 (#13333)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13333

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12845168

fbshipit-source-id: 492091963d2211ea80215200e981965767566135
2018-10-31 17:14:08 -07:00
470bfaa586 int8 sigmoid op (#13298)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13298

Int8 sigmoid ops, test provided. Only supports first axis now

Reviewed By: newstzpz

Differential Revision: D12837824

fbshipit-source-id: 2a9f1739813fe7b48f841ae15e0206768e57cd3e
2018-10-31 16:22:45 -07:00
48db74ea03 net_simple_refcount type to help experimentation with dynamic allocation. (#13370)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13370

This diff adds a new net type (simple_refcount) that does one thing: for all
intermediate results produced by a net, it keeps a refcount of their internal
usage, and once the last consumer has run, the net deletes the blob content to
mimic the case of dynamic allocation. In fact, this would also be
the behavior when we go functional: anything that is not explicitly marked as
input or output will be up to the executor for lifetime management.

See the comments in net_simple_refcount.cc for details.

Reviewed By: dzhulgakov

Differential Revision: D12855489

fbshipit-source-id: 594a47a786305d595fd505b6700864dd1d9c72aa
2018-10-31 15:59:16 -07:00
479b8266bf Back out "[pytorch][PR] Support upsample" (#13413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13413

Original commit changeset: d5db200365f1

Reviewed By: houseroad

Differential Revision: D12870356

fbshipit-source-id: be115d2370636786901c822895664ccace2a9bc2
2018-10-31 15:51:41 -07:00
a4778862c7 Docs/cpp misc features and fixes (#12914)
Differential Revision: D10502199

Pulled By: ezyang

fbshipit-source-id: ec7523caf37d2c92a0e7a2981e1badf51b93dd05
2018-10-31 15:22:45 -07:00
7b47262936 Use names instead of indices in format (#13266)
Summary:
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13266

Differential Revision: D12841054

Pulled By: goldsborough

fbshipit-source-id: 7ce9f942367f82484cdae6ece419ed5c0dc1de2c
2018-10-31 15:17:47 -07:00
a376f3a53f Revert "Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL" (#13407)
Summary:
This reverts commit b1fe541de35381e3a31a9e71db2be4b3af59dbcc.

some CI confusion made it look like this diff needed to be reverted; however the actual issue was elsewhere
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13407

Differential Revision: D12869650

Pulled By: anderspapitto

fbshipit-source-id: 3a436d41fc8434f9aa79b145f20904c99093eef4
2018-10-31 14:02:25 -07:00
f9c0a08eed Fix len() for tensors (#13398)
Summary:
Fixes #13376: `len(tensor)` was converting the tensor to a 1-element list and returning 1 every time.
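A minimal sketch of the intended behavior (assuming the usual `torch.jit.script` decorator; this is illustrative, not the PR's own test):

```
import torch

@torch.jit.script
def first_dim(x):
    # After the fix, len() on a tensor matches the size of its
    # first dimension instead of always returning 1.
    return len(x)

assert first_dim(torch.zeros(5, 3)) == 5
```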
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13398

Differential Revision: D12867630

Pulled By: driazati

fbshipit-source-id: 28f3580a072d763df0980b3149c49d1894842ec9
2018-10-31 13:13:21 -07:00
9577811908 Using pip --user in test.sh script breaks ppc64le builds (#13388)
Summary:
Recent PR #13366, which added --user to pip install, breaks ppc64le testing when using test.sh. This fix skips --user for ppc64le builds/tests, as both ninja and hypothesis are already in the ppc64le docker images.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13388

Differential Revision: D12870164

Pulled By: soumith

fbshipit-source-id: b66bafc06ad2c5116bb5ef5e4681cf9c776084aa
2018-10-31 13:09:26 -07:00
08b7c791ff Windows CI hotfix: Pin Python version to 3.6.7 (#13410)
Summary:
The newest version of `mkl` in conda only supports Python 3.6.7, and installing it as a dependency will automatically downgrade Python from 3.7 to 3.6.7, which creates environment divergence between Windows CI build and test jobs. This PR pins the Python version to 3.6.7, so that Windows CI build and test jobs have the same conda environment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13410

Differential Revision: D12870201

Pulled By: yf225

fbshipit-source-id: 2c5a41ad4bcc72e02d12ea6529550d5e1cdd45ef
2018-10-31 13:02:18 -07:00
404f8660e7 Add string.format() (#13157)
Summary:
This PR adds `aten::format` as a builtin op for strings with the basic formatting semantics of Python.

It also adds varargs to the schema parser (with the limitation that the varargs item is the last argument, i.e. `(*args, **kwargs)` is not supported) and to the compiler
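A small sketch of the resulting usage (the function name and arguments here are illustrative, not from the PR):

```
import torch

@torch.jit.script
def progress(step: int, loss: float) -> str:
    # "{}" placeholders are filled positionally, as in Python.
    return "step {}: loss {}".format(step, loss)

print(progress(10, 0.5))
```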
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13157

Differential Revision: D12832537

Pulled By: driazati

fbshipit-source-id: 17c1a5615bb286c648fc9e38f2ebe501b064c732
2018-10-31 12:50:56 -07:00
b3ef98450b Use non-th versions of some functions when defining backwards. (#13394)
Summary:
In these cases, the native function doesn't do anything different besides checking, so there is no semantic change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13394

Differential Revision: D12861272

Pulled By: gchanan

fbshipit-source-id: ef7403ef3ce0326ccb12178434ce0cf14b28426e
2018-10-31 12:42:03 -07:00
f30c74558c Revert D10861211: Convert Arguments to dictionary
Differential Revision: D10861211

Original commit changeset: da2fcc3e3b4d

fbshipit-source-id: 7243cb340920cf0acb57420bb5de908acd02a064
2018-10-31 12:38:43 -07:00
93b16b6422 Revert D10519758: [nomnigraph] Add transpose network pass
Differential Revision: D10519758

Original commit changeset: a268374fb0b1

fbshipit-source-id: 4de4c99a185c4083665226af94312b38dd0f6820
2018-10-31 12:34:14 -07:00
b1fe541de3 Revert D12858091: [pytorch][PR] restore USE_C10D_NCCL
Differential Revision: D12858091

Original commit changeset: 1cc91bb3b82e

fbshipit-source-id: a9b55ea8c138f939af71caefdfe7d4bccf0cd331
2018-10-31 11:32:46 -07:00
a43c6385f1 When looking for pybind11, do not attempt to get properties from pybind11:pybind11. (#12188)
Summary:
There is no property named "INTERFACE_INCLUDE_DIRECTORIES" on pybind11::pybind11, which causes a cmake error if a system installation of pybind11 exists. In addition, pybind11_INCLUDE_DIRS is already set once "find_package(pybind11 CONFIG)" finds pybind11.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12188

Differential Revision: D10362655

Pulled By: soumith

fbshipit-source-id: 9c5d13295c4a2cf9aacd03e195994287d06ed15c
2018-10-31 11:23:01 -07:00
f5b34e3446 Handle exceptions in at::parallel_for() (#13393)
Summary:
Currently, exceptions thrown in at::parallel_for() will cause a hard crash
if the code is executed by a background thread. This catches the exception
and re-throws it in the main thread.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13393

Differential Revision: D12861142

Pulled By: colesbury

fbshipit-source-id: d53f5ff830ef8c11f90477eb63e5016f7ef1a698
2018-10-31 11:22:59 -07:00
a4f00c3d1e Fix error message in tensorlist()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13392

Differential Revision: D12860921

Pulled By: colesbury

fbshipit-source-id: 86da3ef15d70b0343dc922a3842449001c1afffa
2018-10-31 11:19:56 -07:00
cda44ffa81 Add transpose network pass (#13396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13396

stub for bootcamp task

Reviewed By: bwasti

Differential Revision: D10519758

fbshipit-source-id: a268374fb0b119c5d1960a4382e51c5e1ca240ba
2018-10-31 11:16:41 -07:00
04e8a6d9ef Convert Arguments to dictionary (#13332)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13332

Add a utility function to convert a list of caffe2_pb2.Argument to a dictionary.

Reviewed By: bwasti

Differential Revision: D10861211

fbshipit-source-id: da2fcc3e3b4dbf8decbe14a8e2d5621b3fcc377f
2018-10-31 11:16:39 -07:00
2cebcbae8c createUniqueDataNode
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13395

Reviewed By: bwasti

Differential Revision: D12831584

fbshipit-source-id: a349dfe7a1da0d90e62b47e1b917f358275007be
2018-10-31 11:16:38 -07:00
a25d3b4d8c Use byte tensor for mnist labels. (#13363)
Summary:
The C++ mnist example https://github.com/goldsborough/examples/blob/cpp/cpp/mnist/mnist.cpp does not work because the labels are not correctly loaded; it currently reports 100% accuracy. Specifying the byte dtype fixes the issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13363

Differential Revision: D12860258

Pulled By: goldsborough

fbshipit-source-id: ad7b9256e4fc627240e25c79de9d47b31da18d38
2018-10-31 11:05:40 -07:00
488d393ea6 Fix pointwise loss broadcast (#12996)
Summary: Fixes #12129, #12327

Differential Revision: D10513781

Pulled By: ailzhang

fbshipit-source-id: a210008a39ff6c3f056c9fbe3f0576cfcce638ec
2018-10-31 10:17:25 -07:00
27ccc8787f Implement data_ptr as a native function.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13367

Reviewed By: ezyang

Differential Revision: D12855339

Pulled By: gchanan

fbshipit-source-id: da5d75ab38e01365717eed9a676dcbb22ac89fe7
2018-10-31 09:51:04 -07:00
cb87319eb0 restore USE_C10D_NCCL
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13371

Differential Revision: D12858091

Pulled By: anderspapitto

fbshipit-source-id: 1cc91bb3b82ec075481353e6f58dfe4e802fee5d
2018-10-31 09:46:45 -07:00
4c06f1f2bb CircleCI: enable all flaky tests (#13356)
Summary:
A few Caffe2 tests are currently disabled in the `py2-gcc4.8-ubuntu14.04` test job because they are known to be flaky. https://github.com/pytorch/pytorch/pull/13055 likely fixed the flakiness, and this PR tests that.

Fixes https://github.com/pytorch/pytorch/issues/12395.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13356

Differential Revision: D12858206

Pulled By: yf225

fbshipit-source-id: 491c9c4a5c48ac1b791fdc9d78acf66091e80457
2018-10-31 09:34:49 -07:00
bc74ec80d0 Add support for torch.backends.cudnn.enabled (#13057)
Summary:
This is used commonly in `nn` functions. This PR adds it as a weak
module (and also alters the conversion of weak modules to strong modules
to accept ordinary `object`s)
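A minimal sketch of what this is meant to allow (assuming the flag resolves inside scripted code as described):

```
import torch

@torch.jit.script
def cudnn_on() -> bool:
    # torch.backends.cudnn.enabled resolves like a normal attribute lookup.
    return torch.backends.cudnn.enabled

print(cudnn_on())
```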
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13057

Differential Revision: D10846618

Pulled By: driazati

fbshipit-source-id: 028b9f852d40e2e53ee85b93282c98cef8cd336b
2018-10-31 09:31:09 -07:00
b200b51602 Give _dirichlet_grad a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13368

Reviewed By: ezyang

Differential Revision: D12855461

Pulled By: gchanan

fbshipit-source-id: a220ff464ef09e4efcd9da296fa8b6839b94c337
2018-10-31 07:57:32 -07:00
0aaff5eaf9 Replace CUDA-specific set_index(_from) method from DeviceGuard with set_device. (#13275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13275

This resulted in a bunch of knock-on changes, which I will now
describe:

- s/original_index/original_device/
- s/last_index/last_device/
- A bunch of places that used set_index, now use CUDAGuard (which does have
  set_index) because they were CUDA-specific code.

Major caveat: DeviceGuard doesn't *actually* work for non-CUDA/CPU devices. To make
that happen, I plan on totally replacing the implementation of DeviceGuard; what
I mostly care about here is wrangling the API into an acceptable state.

Reviewed By: gchanan

Differential Revision: D12832080

fbshipit-source-id: 7de068c7cec35663dc8a533026a626331336e61d
2018-10-31 07:55:13 -07:00
e5d56659ec Delete DeviceGuard(int64_t) constructor. (#13232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13232

DeviceGuard should be device agnostic, which means that it shouldn't
assume that int64_t means select the CUDA device.

Reviewed By: gchanan

Differential Revision: D10858024

fbshipit-source-id: b40e8337e4046906fd8f83a95e6206367fb29dbe
2018-10-31 07:55:11 -07:00
e93c721da1 Add c10::Stream, make at::cuda::CUDAStream use it. (#13133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13133

c10::Stream is a device agnostic object which represents a stream
on some device (defined as c10::Device).  The primary benefit of
introducing this object is that we can easily refer to it from code
in the non-CUDA library (since it doesn't actually refer to any
CUDA specific bits.)

Streams are identified by an ID into an appropriate pool.  There's
some work to translate to and from pointers to the pool; see inline
comments.

Reviewed By: gchanan

Differential Revision: D10855883

fbshipit-source-id: cc447f11a528432e41c2edc789f40e7a6f17bdd3
2018-10-31 07:55:10 -07:00
a3410f7994 Give addbmm a native wrapper.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13316

Reviewed By: ezyang

Differential Revision: D12840406

Pulled By: gchanan

fbshipit-source-id: ebcc495f2437da71778001971c32ad6074cf98b7
2018-10-31 07:28:46 -07:00
e6ace54840 Move underscore prefixed th functions _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13308

Differential Revision: D12839464

Pulled By: gchanan

fbshipit-source-id: ceb5913cd154de301d0d476d70b3a4fc62eb319c
2018-10-31 07:03:34 -07:00
e475d3ede3 DDP multi-GPU segfault fix (#13291)
Summary:
Fix https://github.com/pytorch/pytorch/issues/13200

Tested on 8 GPU machines since CI doesn't have this many GPUs, so multi-GPU test won't be triggered

```
tengli@learnfair096:~/pytorch/test$ python run_test.py -i distributed --verbose
Selected tests: distributed
Running test_distributed ... [2018-10-29 20:32:46.355858]
/public/apps/openmpi/2.1.1/gcc.5.4.0/bin/mpiexec
Running distributed tests for the gloo backend
test_DistBackend (__main__.TestDistBackend) ... ok
test_DistributedDataParallel (__main__.TestDistBackend) ... ok
test_DistributedDataParallelCPU (__main__.TestDistBackend) ... ok
```

Also I would like to bump up the bucket size of broadcast to higher for performance reasons
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13291

Differential Revision: D12842840

Pulled By: teng-li

fbshipit-source-id: e8c50f15ebf2ab3e2cd1b51d365e41a6106b98fe
2018-10-31 00:43:42 -07:00
dc854c0ee6 Add --user to pip install in pytorch test scripts (#13366)
Summary:
caffe2 docker images use the native system python, which requires sudo to pip install.
In the pytorch ROCm CI we use the caffe2 docker image.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13366

Differential Revision: D12855748

Pulled By: bddppq

fbshipit-source-id: 3e53fa203fa6bb3c43d4065c38c2b61e47f45f1e
2018-10-30 23:09:00 -07:00
44d2ca660a Disable CCACHE while building NCCL (#13340)
Summary:
I don't have a full analysis, but ccache appears to often fail while building
nccl. To work around this, run the NCCL build with CCACHE_DISABLE.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13340

Differential Revision: D12855467

Pulled By: anderspapitto

fbshipit-source-id: 63eb12183ab9d03dd22090f084688ae6390fe8bd
2018-10-30 22:19:21 -07:00
bfe7df2211 Optimize rowwise_moments by MKL (#13329)
Summary:
i-am-not-moving-c2-to-c10

Pull Request resolved: https://github.com/pytorch/pytorch/pull/13329

Optimize rowwise_moments by MKL

Reviewed By: houseroad

Differential Revision: D12845220

fbshipit-source-id: b047e52ba82ed184bd322680fbf96306dfbb9867
2018-10-30 21:43:36 -07:00
865a10feba Update NCCL to 2.3.7-1 (#13353)
Summary:
Including some hang fixes. Tested locally and distributed works fine
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13353

Differential Revision: D12853714

Pulled By: teng-li

fbshipit-source-id: be72b9ffb48cffdb590e5452b0a4ec597f052685
2018-10-30 21:34:59 -07:00
265c97decf nomnigraph - More operator definitions (#13358)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13358

More operator definitions changes

Reviewed By: itomatik

Differential Revision: D12852403

fbshipit-source-id: 0a69d9c6b55ab48344521ab9dba1de003dfc0714
2018-10-30 20:59:42 -07:00
59f8e8ada7 First step at adding exceptions (#12789)
Summary:
This is a first step towards adding exceptions. We need minimal support in order to begin converting the torch library to weak script mode (which is the main goal here).

Some limitations (that are documented in the tests & compiler):
1. Cannot assign exceptions to variables
2. Any name after raise is being treated as a valid Exception
3. No control flow analysis yet. Below a will be undefined:

if True:
     a = 1
else:
     raise Exception("Hi")
return a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12789

Differential Revision: D12848936

Pulled By: eellison

fbshipit-source-id: 1f60ceef2381040486123ec797e97d65b074862d
2018-10-30 20:25:50 -07:00
c7027a511f In pytorch CI install ninja via pip instead of building it from source
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13042

Differential Revision: D12854708

Pulled By: bddppq

fbshipit-source-id: 2693d8c9818782cb9f0c958dee8f77a1c131e32d
2018-10-30 20:05:40 -07:00
3c66520dd8 Remove aten/src/ATen/CUDAStream.cpp from hipify script (#13357)
Summary:
Deleted in https://github.com/pytorch/pytorch/pull/13251
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13357

Differential Revision: D12852983

Pulled By: bddppq

fbshipit-source-id: 0816a14188590e1971fabefcd575489c7339e122
2018-10-30 19:48:07 -07:00
13b9fd3e05 Renaming meta() to dtype() - 2/2 (#13334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13334

Codemod generated with clangr shard mode, 50 files per diff,
clangr code(meta->dtype): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12845197

fbshipit-source-id: f87eb575d3c31593ca76b70780cc4fca888e706b
2018-10-30 18:24:30 -07:00
cb5f374f6c More functions moved to native, use _th_ prefix more consistently.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13262

Reviewed By: ezyang

Differential Revision: D12827704

Pulled By: gchanan

fbshipit-source-id: c910c069200c0766dd6d5f998d341124d560e80d
2018-10-30 17:41:55 -07:00
7d9ab140bf Fix aten::to symbolic + add expand_as (#13325)
Summary:
https://github.com/pytorch/pytorch/pull/13146 broke some cases of ONNX export, this fixes them
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13325

Differential Revision: D12844294

Pulled By: jamesr66a

fbshipit-source-id: f98dd0685820b2a1e5fcd49733cfa5c19c48a4e7
2018-10-30 17:28:15 -07:00
4d141bee98 Skip test_sum_noncontig in ROCm (#13341)
Summary:
Since it fails due to insufficient precision of DoubleTensor.sum() on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13341

Differential Revision: D12851335

Pulled By: bddppq

fbshipit-source-id: e211c3868b685aa705160ce98a2a18a915ad493f
2018-10-30 16:54:44 -07:00
f1d02f6d1c Move underscore prefixed linear algebra TH functions to _th prefix.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13309

Reviewed By: ezyang

Differential Revision: D12839533

Pulled By: gchanan

fbshipit-source-id: 27bdc5254d2529269b705c2c057826a44297a34b
2018-10-30 16:31:53 -07:00
11a16961a5 Fix "CUDA Tensor __rsub__ breaks when device is not 0" (#12956)
Summary:
Currently, `a = 1 - torch.tensor([1]).to('cuda:1')` puts `a` on `cuda:1` but reports `a.device` as `cuda:0`, which is incorrect and causes an illegal memory access error when trying to access `a`'s memory (e.g. when printing). This PR fixes the error.

Fixes https://github.com/pytorch/pytorch/issues/10850.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12956

Differential Revision: D12835992

Pulled By: yf225

fbshipit-source-id: 5737703d2012b14fd00a71dafeedebd8230a0b04
2018-10-30 16:29:19 -07:00
d2659f6689 fix lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13346

Differential Revision: D12850686

Pulled By: michaelsuo

fbshipit-source-id: b7474d0a3f3347034592bef45125610c040cff6a
2018-10-30 16:22:58 -07:00
f58e4fbc45 Remove redundant array-gen loop in gather_ops_test.py (#13338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13338

Remove unnecessary [r for r in []] statements.

Reviewed By: ezyang

Differential Revision: D12848907

fbshipit-source-id: 256551b286ac6801585acf9bb0b2644ef0b7ed58
2018-10-30 16:20:22 -07:00
77b8aade58 Revert D12809293: Kill more weird constructors on Tensor
Differential Revision: D12809293

Original commit changeset: 5eb663fe8182

fbshipit-source-id: 709a5378fdbbb3fcfaacef8fc48b6530afbbc28f
2018-10-30 16:01:51 -07:00
ed60f94dba hipify caffe2 script in fbcode (#13265)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13265

Make changes to make hipify_python script to work with fbcode.

1. Add TARGETS file
2. Make hipify_python a module as well as a standalone script.

Reviewed By: bddppq

Differential Revision: D10851216

fbshipit-source-id: cacd04df6fe2084832256d1916d62dccea86baa9
2018-10-30 15:51:28 -07:00
9ca8a76645 Rename Type.tensor to Type._th_tensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13313

Reviewed By: ezyang

Differential Revision: D12840136

Pulled By: gchanan

fbshipit-source-id: 896d705eb5091f7677d6d91dbd50629343dfa24d
2018-10-30 15:34:06 -07:00
c68b82ebc8 don't expand cmake variable in IF
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13331

Differential Revision: D12849306

Pulled By: anderspapitto

fbshipit-source-id: 2f1f72a44ed3a176be8c7490652e49771c3fadbf
2018-10-30 15:20:43 -07:00
cc3618ce36 Move _cumsum and _cumprod to _th_ prefixes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13311

Reviewed By: ezyang

Differential Revision: D12839706

Pulled By: gchanan

fbshipit-source-id: 79e20b31c6ca2f22229ad3903aacf70dc674c25c
2018-10-30 15:01:14 -07:00
ce469e6c71 dims() to sizes() remaining part
Summary: Made the clangr rule more robust and it discovered more callsites.

Reviewed By: smessmer

Differential Revision: D12825017

fbshipit-source-id: 3be1eeb7ea697b36ef89e78ba64c0ee1259439c4
2018-10-30 14:56:21 -07:00
9af18d847a Fix accesses to uninitialized memory when running sum() within an OMP… (#13274)
Summary:
```
… parallel region.

The two_pass_reduction code allocates a buffer of size at::max_threads().
When called within a parallel region, at::parallel_for only uses 1 thread
so some of this buffer is not written.

This makes two changes:

1) two_pass_reduction is not called when already in a parallel region
2) two_pass_reduction fills unwritten buffer elements with the identity
   (the value in dst)
```

cc SsnL: I think this should fix the NaNs in BatchNorm when calling sum() within a parallel region.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13274

Differential Revision: D12840034

Pulled By: colesbury

fbshipit-source-id: d32e80909a98a0f1bb1c80689fe5089b7019ef59
2018-10-30 14:17:35 -07:00
f04a705cb2 Remove assertions in conv modules (#13283)
Summary:
These assertions aren't necessary because these conditions are checked inside the ATen ops, and right now they're not very user-friendly because they don't have an error message or reference the dimension of the tensor being checked. Let's just remove them (the error then comes from ATen with a friendlier message).

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13283

Differential Revision: D12840730

Pulled By: goldsborough

fbshipit-source-id: 1902056c7d673f819c85f9164558e8d01507401c
2018-10-30 13:51:12 -07:00
c0411719fc Rename th_addmm to _th_addbmm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13317

Reviewed By: ezyang

Differential Revision: D12840603

Pulled By: gchanan

fbshipit-source-id: 10ead96cd181535cbd4dfe84be813375024dbd2c
2018-10-30 13:48:49 -07:00
3a81984bde Make Stat put ops accept empty tensors safely (#13178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13178

Add default value option to stats put ops

Reviewed By: mlappelbaum

Differential Revision: D10858564

fbshipit-source-id: cc9b3e621abf3fc21821b73f354bebdcd35e477e
2018-10-30 13:28:58 -07:00
ce51e3fe55 Move the Test conversion script to main repo (#13287)
Summary:
Better to keep it in the main repo, so we will have the correct dependencies.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13287

Reviewed By: zrphercule

Differential Revision: D12834665

Pulled By: houseroad

fbshipit-source-id: 3a0afaa705a9b8f4168fcd482123bcabcf083579
2018-10-30 13:25:22 -07:00
3cb2470bb3 add __deepcopy__ back to Parameter (#12886)
Summary:
- fix https://github.com/pytorch/pytorch/issues/315
- add `__deepcopy__` back to Parameter class
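A small sketch of the restored behavior (an illustrative check, not the PR's test):

```
import copy
import torch
from torch.nn import Parameter

p = Parameter(torch.randn(3))
q = copy.deepcopy(p)

# The copy stays a Parameter rather than decaying to a plain Tensor,
# and it gets its own storage.
assert isinstance(q, Parameter)
assert q.data_ptr() != p.data_ptr()
```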
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12886

Differential Revision: D12838771

Pulled By: weiyangfb

fbshipit-source-id: b2ce12244e36f981d89f6c7cdead63237dd820ea
2018-10-30 12:56:26 -07:00
a35162f1bc Remove net_simple_async (#13320)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13320

simple_async has been deprecated via the network override rule for a while,
and we should be able to safely remove it.

This also clears up 2 tech debts:
(1) in rnn executor, rely on the executor override to get the right net.
(2) clearly mark checkExecutorOverride as a potential change to net_type by making it c++ style guide compliant.

Reviewed By: dzhulgakov

Differential Revision: D12840709

fbshipit-source-id: 667702045fa024f5bdc87a9c28ea1786c78432b3
2018-10-30 12:36:38 -07:00
0db505bf27 Made docstrings for Embedding more accurate. (#13310)
Summary:
Made the previous description for max_norm more precise, avoiding 'this' and describing what actually happens in the code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13310

Differential Revision: D12840813

Pulled By: SsnL

fbshipit-source-id: 98090c884267a62ce93cd85da84252d46926dfa5
2018-10-30 12:25:38 -07:00
264deae5da Improve visual representation of NQL subgraphs (#13143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13143

For primary review (will likely delete to keep commit chain cleaner)

Retain operator input order in DOT string conversion on NQL tool.

Assumptions
* No API to discern input graph node type
* Graph is bipartite
* No generative operators; i.e. operator w/o input but creates output
* Not supporting subgraph

Mocks (from input P60154484)
Old: https://pxl.cl/j4mV (DOT string P60154515)
New: https://pxl.cl/j0wd (DOT string P60154461)

Reviewed By: bwasti

Differential Revision: D10224942

fbshipit-source-id: 8b0ce2f1f9248dfaa89aa01a3fd77e327de16ea4
2018-10-30 12:22:37 -07:00
017b91f861 Optimize channel_shuffle_op on GPU (#13066)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13066

Optimize channel_shuffle_op on GPU

Reviewed By: houseroad

Differential Revision: D10639281

fbshipit-source-id: 394b937403e5d4e9df93548bbf87285bffaa55a9
2018-10-30 12:18:27 -07:00
518b0d0600 Fix add out=None to digamma docstring (Fixes #13225) (#13307)
Summary:
Fixes #13225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13307

Differential Revision: D12840231

Pulled By: SsnL

fbshipit-source-id: 2732a2466ac1d2f3fdabfd1eaccddec96e89ba1b
2018-10-30 11:52:35 -07:00
5ba952afcc use topological move in graph fuser (#13271)
Summary:
Turns out that getting rid of the multiple passes in fusion is a little more involved, so leaving it off for another day.

Expect test changes are just things moving around with new orders, but I would appreciate if someone glanced at them for something crazy.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13271

Differential Revision: D12832752

Pulled By: michaelsuo

fbshipit-source-id: 55f16c80a97601744a06df2ead45cef7b3a19c08
2018-10-30 11:10:28 -07:00
5b15a501da Refactor & unit test feed predictor
Summary:
1. Refactor DDPG predictor.  Merge the critic predictor with ParametricDQNPredictor since they are the same
2. Fix bug where loss was multiplied by the batch size
3. Create DDPGFeedPredictor which uses the feed predictor output format
4. Add support for gridworld simulation memoization to DDPG.  Also memoize normalization tables.

Reviewed By: kittipatv

Differential Revision: D10161240

fbshipit-source-id: 2813890043de1241c1fb9b9c2b6a897403f9fc12
2018-10-30 10:27:47 -07:00
ec754adb14 Kill more weird constructors on Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13190

Reviewed By: ezyang

Differential Revision: D12809293

fbshipit-source-id: 5eb663fe818276d97cf31d1ed1e7f025d2b69851
2018-10-30 10:25:40 -07:00
10de2c1187 CircleCI: fix test timeout by running CPU build and test on different machines (#13284)
Summary:
It seems that we can fix the test timeout issue by running the CPU build and test on different machines (I manually ran this patch through the CI 50 times to confirm). The actual reason for the timeout is still unknown, but I suspect it has to do with memory / disk space.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13284

Differential Revision: D12840371

Pulled By: yf225

fbshipit-source-id: af326f0358355602ee458696c3ffb325922e5289
2018-10-30 10:22:57 -07:00
ac64724ed9 Add support for tuple constants (#13086)
Summary:
Depends on #13072

Adds support for tuples as variables instead of just as literals. Before, tuples would give the error `python value of type 'tuple' cannot be used as a value`. This PR adds a flag on `SugaredValue` to determine whether a value is a tuple or not.
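For illustration, a tiny sketch (assuming current `torch.jit.script` syntax):

```
import torch

@torch.jit.script
def swap(x, y):
    # A tuple can now be bound to a name, not just used as a literal.
    pair = (y, x)
    return pair

print(swap(torch.ones(1), torch.zeros(1)))
```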
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13086

Differential Revision: D10846632

Pulled By: driazati

fbshipit-source-id: 7b5d6ae9426ca3dd476fee3f929357d7b180faa7
2018-10-30 09:01:17 -07:00
f06b70a6e9 Fix memory leak during packing in tuples (#13305)
Summary:
Verified on python 3.6 that it fixes #13243
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13305

Differential Revision: D12838764

Pulled By: soumith

fbshipit-source-id: 206a8b22d1d05e5f156f1db1baaa82358f3eaa83
2018-10-30 08:32:26 -07:00
8a888c48da Reimplement as_strided in ATen. (#13185)
Summary:
This moves away from using tensor.set_(...) for as_strided, which went
through TH and was weirdly slow/complicated. The new as_strided has a
new invariant that it will never resize the storage to a larger size
(the previous as_strided allowed that behavior but it seemed weird and
none of our code relied on it.)

This offers a small speedup on as_strided: it went from 1300ns to
1100ns although the benchmarks get a little noisy here.

Also on the changelog is a quick fix to the resize_ code to avoid unsigned
underflow. I'll rewrite the resize_ zero-dim logic in a future diff; it
doesn't make sense the way it is written right now.
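As a reminder of the user-facing semantics (unchanged by this PR), a short sketch:

```
import torch

base = torch.arange(9.)
# Take a 2x2 window: rows advance by 3 elements, columns by 1.
window = torch.as_strided(base, (2, 2), (3, 1))
print(window)       # tensor([[0., 1.], [3., 4.]])
window[0, 0] = 42.  # as_strided returns a view into the same storage
assert base[0] == 42.
```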
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13185

Reviewed By: ezyang

Differential Revision: D12809160

Pulled By: zou3519

fbshipit-source-id: 3885df9d863baab2b2f8d8e2f8e2bfe660a49d85
2018-10-30 07:52:50 -07:00
8c2d0c831f Speed up tensor.storage_offset (#13267)
Summary:
This PR special cases tensor.storage_offset to avoid dispatches in the
common case. tensor.storage_offset is important for torch.as_strided
performance, because as_strided(sizes, strides) shares an implementation
with as_strided(sizes, strides, storage_offset) and it might not be the
best if there were two separate implementations (including backward
implementations).

This PR reduces times on a tensor.storage_offset
microbenchmark from 22ns to 2ns (these numbers are pretty stable). For
a torch.as_strided benchmark, this PR reduces numbers from 1042 to
928ns, a 100ns improvement, but this number is noisy and goes up and
down.
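For context, a short illustration of the API being sped up:

```
import torch

x = torch.arange(10.)
y = x[3:]
# A view records where it starts inside the shared storage.
assert y.storage_offset() == 3
# The same view, spelled out with an explicit storage offset:
z = torch.as_strided(x, (7,), (1,), 3)
assert torch.equal(y, z)
```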
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13267

Reviewed By: ezyang

Differential Revision: D12829828

Pulled By: zou3519

fbshipit-source-id: df907731e2398ce2baf1c8b1860a561ccc456f78
2018-10-30 07:36:21 -07:00
ee010a2bee Operators that never (re)allocate memory do not need DeviceGuard (#13269)
Summary:
This PR removes DeviceGuard for the following native function tensor reshaping operations:
- broadcast_tensors
- chunk
- expand
- expand_as
- narrow
- reshape
- reshape_as
- select
- slice
- split
- split_with_sizes
- squeeze
- squeeze_
- transpose
- transpose_
- unsqueeze
- unsqueeze_

There are probably more but I'm putting this out for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13269

Reviewed By: ezyang

Differential Revision: D12830317

Pulled By: zou3519

fbshipit-source-id: 466a1bbd835aa708fe72c3c620e07fed3f85661f
2018-10-30 07:13:15 -07:00
47c0d88739 Bring back warning for dtype uninitialized in serialization (#13239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13239

Previous diff missed the if (dtype_initialized) check, duh.

Also, to guard against log spam, use LOG_EVERY_MS if it's available.

Reviewed By: kennyhorror

Differential Revision: D12818938

fbshipit-source-id: 76590bd1b28010fb13f5d33423c8eac1395e9f76
2018-10-29 22:09:54 -07:00
bb703b1ff5 Remove defunct ATen/CUDAStream.h,cpp (#13251)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13251

Differential Revision: D12823807

Pulled By: ezyang

fbshipit-source-id: 7fa1ecc8058f3b0dacf5d3a4054f10422832599d
2018-10-29 21:08:10 -07:00
91e87c0395 Renaming size() to numel() - 2/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

i-am-not-moving-c2-to-c10

Reviewed By: ezyang

Differential Revision: D12833748

fbshipit-source-id: 98dc2d3abc23c177c2c9e457b81499952d4b690c
2018-10-29 18:59:29 -07:00
c82e8bf988 bump gloo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13286

Differential Revision: D12835150

Pulled By: anderspapitto

fbshipit-source-id: 4e3bbca077447ef0c007568a359f2260229c2a51
2018-10-29 18:56:21 -07:00
4a3baec961 Hub Implementation (#12228)
Summary:
[Edit: after applying colesbury's suggestions]
* Hub module enables users to share code + pretrained weights through github repos.
Example usage:
```
hub_model = hub.load(
     'ailzhang/vision:hub', # repo_owner/repo_name:branch
     'wrapper1', # entrypoint
      1234, # args for callable [not applicable to resnet18]
      pretrained=True) # kwargs for callable
```
* Protocol on repo owner side: example https://github.com/ailzhang/vision/tree/hub
     * The "published" models should be at least in a branch/tag. It can't be a random commit.
     * Repo owner should have the following field defined in `hubconf.py`
        * function/entrypoint with function signature `def wrapper1(pretrained=False, *args, **kwargs):`
        * `pretrained` allows users to load pretrained weights from repo owner.
        * `args` and `kwargs` are passed to the callable `resnet18`, repo owner should clearly specify their help message in the docstring

```
def wrapper1(pretrained=False, *args, **kwargs):
    """
    pretrained (bool): a recommended kwargs for all entrypoints
    args & kwargs are arguments for the function
    """
    from torchvision.models.resnet import resnet18
    model = resnet18(*args, **kwargs)
    checkpoint = 'https://download.pytorch.org/models/resnet18-5c106cde.pth'
    if pretrained:
        model.load_state_dict(model_zoo.load_url(checkpoint, progress=False))
    return model
```
* Hub_dir
    * `hub_dir` specifies where the intermediate files/folders will be saved. By default this is `~/.torch/hub`.
    * Users can change it by either setting the environment variable `TORCH_HUB_DIR` or calling `hub.set_dir(PATH_TO_HUB_DIR)`.
    * By default, we don't cleanup files after loading so that users can use cache next time.

* Cache logic :
    * We used the cache by default if it exists in `hub_dir`.
    * Users can force a fresh reload by calling `hub.load(..., force_reload=True)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12228

Differential Revision: D10511470

Pulled By: ailzhang

fbshipit-source-id: 12ac27f01d33653f06b2483655546492f82cce38
2018-10-29 18:43:14 -07:00
955a01562d Removes debug spew in test_jit.py (#13280)
Summary:
Looks like a print() snuck in by accident with a recent PR and it's printing a lot of spew when the tests are run.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13280

Differential Revision: D12833449

Pulled By: michaelsuo

fbshipit-source-id: 5b50fd4b03bb73e5ca44cabdc99609c10017ff55
2018-10-29 18:25:30 -07:00
6071389a90 Enable cppcoreguidelines checks in clang-tidy (#12959)
Summary:
Enables most of `cppcoreguidelines-*` checks for clang-tidy. Major fixes included:

- Uninitialized members,
- Use of `const_cast`,
- Use of raw `new`

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12959

Differential Revision: D11349285

Pulled By: goldsborough

fbshipit-source-id: 9e24d643787dfe7ede69f96223c8c0179bd1b2d6
2018-10-29 18:23:35 -07:00
8260441b45 Renaming size() to numel() - 1/2
Summary:
Codemod generated with clangr shard mode, 50 files per diff,
clangr code(size->numel): diffusion/FBS/browse/master/fbcode/caffe2/caffe2/fb/codemods/TensorMethodRename.cpp

Reviewed By: ezyang

Differential Revision: D12833710

fbshipit-source-id: aef469b7b6d7715dada593f0f55e5813fbd963ac
2018-10-29 18:01:01 -07:00
fbd497f169 Fix initialization order in MIOpen file (#13264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13264

Simply change the initialization order to make hcc happy. Otherwise will have to add -Wno-error=reorder.

Reviewed By: bddppq

Differential Revision: D12827635

fbshipit-source-id: 6f4cd67209f2aa8ae85cfbdc53df0efb3b3cc473
2018-10-29 16:48:54 -07:00
d8dab6ffa8 Add tensor.to(options) (#13146)
Summary:
ezyang on the template hack
smessmer on SFINAE of the `TensorOptions(Device)`
goldsborough on the C++ API test changes
zdevito on the `jit` codegen changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13146

Reviewed By: ezyang

Differential Revision: D12823809

Pulled By: SsnL

fbshipit-source-id: 98d65c401c98fda1c6fa358e4538f86c6495abdc
2018-10-29 16:26:06 -07:00
3365d74df9 Fix refcounting in anomaly metadata
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13249

Differential Revision: D12823875

Pulled By: soumith

fbshipit-source-id: a0857a7cc8a4888aff99991fbae6bdd7a49d1ac4
2018-10-29 15:55:08 -07:00
50a8f8531b Updated for for arbitrary command line arg ordering
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13253

Differential Revision: D12829884

Pulled By: soumith

fbshipit-source-id: 9d8abcdf635e2daffce80ddf1e0e418a1e4c337d
2018-10-29 15:52:03 -07:00
9d9e5f8d1e Fix DistributedDataParallel bug (#13248)
Summary:
Fixed bug [https://github.com/facebookresearch/maskrcnn-benchmark/issues/52](https://github.com/facebookresearch/maskrcnn-benchmark/issues/52)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13248

Reviewed By: pietern

Differential Revision: D12830451

Pulled By: teng-li

fbshipit-source-id: ab33faf3f6f4545f8fe07da7ecbeb2f0a2ea23f0
2018-10-29 15:19:55 -07:00
33b00bdbb8 cwd arg in shell function of run_test set to optional (#13247)
Summary:
Tiny fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13247

Differential Revision: D12830311

Pulled By: soumith

fbshipit-source-id: 405620e3a1de5bfc7e039f9aaf2f7cb7a3bca1b1
2018-10-29 15:17:00 -07:00
7956e9718b Add name for required optimizer parameter. (#13202)
Summary:
Small change -- the benefit is that the docs will show
``<required parameter>`` instead of ``<object object>``
for these required parameters.
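For illustration, a sketch of the sentinel pattern being named here (the class and function names below are hypothetical, not necessarily the PR's code):

```
class _RequiredParameter(object):
    """Singleton marker for arguments with no sensible default."""
    def __repr__(self):
        # This is what generated docs display for the default value.
        return "<required parameter>"

required = _RequiredParameter()

def make_optimizer(params, lr=required):  # hypothetical signature
    if lr is required:
        raise ValueError("lr is a required argument")
    return lr

print(repr(required))  # <required parameter>
```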
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13202

Reviewed By: SsnL

Differential Revision: D12826252

Pulled By: jma127

fbshipit-source-id: 5f2c8495e5c56920377e4e012b8711e8f2a6e30e
2018-10-29 15:02:21 -07:00
2e19529bd1 Add HasDeviceOption [nomnigraph] (#13206)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13206

Add has device option for checking if a node has a device option set

Reviewed By: bwasti

Differential Revision: D12815365

fbshipit-source-id: 58477df93777f470cfb30cd75f02a659a7017b7c
2018-10-29 14:25:40 -07:00
2cfe439cc7 Turn off tests for Travis-derived Python jobs. (#13252)
Summary:
They appear to timeout 30% of the time when run on CircleCI.

Long term plan is to switch to using some binaries which
are not provided by Travis.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13252

Differential Revision: D12828812

Pulled By: ezyang

fbshipit-source-id: 7189e2a3200ae08c4ece16a27357ff0fd06f3adb
2018-10-29 14:04:57 -07:00
3c78cc6c2b Remove Tensor(const Tensor&, BaseContext*, type)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13204

Reviewed By: ezyang

Differential Revision: D11915764

fbshipit-source-id: baf883b3095bc9d5adf0b942eb874eaa7c1f45e5
2018-10-29 13:57:43 -07:00
5a2b2aa6af Remove calls to CopyFrom that can be sync (#13205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13205

CopyFrom without a context argument does a sync copy on the current GPU - exactly what most call sites need.

This diff kills about 60% of CopyFrom usages. The most common pattern is a gpu->cpu copy followed by FinishDeviceComputation - the latter can simply be removed.

Reviewed By: Yangqing

Differential Revision: D11236076

fbshipit-source-id: eb790ca494dfc5d5e3a7d850b45d6f73221bb204
2018-10-29 13:57:42 -07:00
8ad69a80e3 Test scripts only run cases defined in the running script (#13250)
Summary:
1. Refactors `TestTorch` into `TestTorchMixin` (subclass of `object`) and `TestTorch` (subclass of `TestCase`, MRO `(TestCase, TestTorchMixin)`, only defined if `__name__ == '__main__'`). So other scripts won't accidentally run it.
2. Adds an assertion in `load_tests` that each script only runs cases defined in itself.
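A minimal sketch of the pattern (simplified; the real test classes are far larger):

```
import unittest

class TestTorchMixin(object):
    # Cases live on a plain-object mixin, so importing this module
    # does not hand unittest a runnable TestCase.
    def test_something(self):
        self.assertTrue(True)

if __name__ == '__main__':
    class TestTorch(unittest.TestCase, TestTorchMixin):
        pass
    unittest.main()
```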

cc yf225 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13250

Differential Revision: D12823734

Pulled By: SsnL

fbshipit-source-id: 7a169f35fe0794ce76e310d8a137d9a3265c012b
2018-10-29 13:57:40 -07:00
db0b5c7ab7 ArgumentStash for int64_t arguments (#12939)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12906. https://github.com/pytorch/pytorch/issues/12580 is still open because the schema is marked as `traceable=false` in the arg parser constructor, I think.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12939

Differential Revision: D10492031

Pulled By: jamesr66a

fbshipit-source-id: ca5376de3997b5fb62b493e2e6a9bb0d6c3b9687
2018-10-29 13:55:24 -07:00
aabdcaa8fa No tmp install (#13215)
Summary:
This is a small patch on top of https://github.com/pytorch/pytorch/pull/13150 - please review only the top commit here
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13215

Differential Revision: D12827675

Pulled By: anderspapitto

fbshipit-source-id: adb01d72a827b6dbffc25f7f99fdc3129906b1ca
2018-10-29 12:59:44 -07:00
a69af69ffc remove vestigial logic related to onnxbot tracking PRs (#13260)
Summary:
onnx always has a million branches so this is noisy
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13260

Differential Revision: D12827640

Pulled By: anderspapitto

fbshipit-source-id: 55eced08970cc0a888bd8f7bc8670eea48deb288
2018-10-29 12:49:11 -07:00
380d2dfb27 absorb nccl (#13150)
Summary:
Always build nccl from within the main cmake build, rather than via a separate invocation in build_pytorch_libs.sh, using the existing caffe2 codepaths.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13150

Differential Revision: D12815674

Pulled By: anderspapitto

fbshipit-source-id: a710b6f242d159b9816911a25ee2c4b8c3f855aa
2018-10-29 12:04:32 -07:00
1c8a823b3b More robust ABI compatibility check for C++ extensions (#13092)
Summary:
This PR makes the ABI compatibility check for C++ extensions more robust by resolving the real path of the compiler binary, such that e.g. `"c++"` is resolved to the path of g++. This is more robust than assuming that `c++ --version` will contain the word "gcc".
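A sketch of the idea in plain Python (not the actual torch.utils.cpp_extension code):

```
import os
import shutil

def resolved_compiler(compiler="c++"):
    # Follow the symlink chain so "c++" can be identified as, e.g., g++.
    path = shutil.which(compiler)
    return os.path.realpath(path) if path else None

print(resolved_compiler())
```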

CC jcjohnson

Closes #10114

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13092

Differential Revision: D12810448

Pulled By: goldsborough

fbshipit-source-id: 6ac460e24496c0d8933b410401702363870b7568
2018-10-29 11:56:02 -07:00
48b98d2f7f Expose nn:: namespace to python (#13132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13132

Expose more of the C++ API to python

Reviewed By: duc0

Differential Revision: D10855086

fbshipit-source-id: 98cc89bc72ef91ed1c59c1a19688e047765cf90b
2018-10-29 11:36:51 -07:00
62b27d27b7 Re-enable experimental ops build (#12821)
Summary:
The experimental ops for the c10 dispatcher were accidentally disabled in the oss build when the directory changed from `c10` to `experimental/c10`. This PR re-enables them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12821

Differential Revision: D10446779

Pulled By: smessmer

fbshipit-source-id: ac58cd1ba1281370e62169ec26052d0962225375
2018-10-29 11:28:54 -07:00
b818d31a3e use TypeMeta instead of ScalarType in TensorOptions (#13172)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13172

reland D10419671

Reviewed By: ezyang

Differential Revision: D12143282

fbshipit-source-id: 43504d06a901af30130ebe97fb0b33def45cdc9a
2018-10-29 11:15:37 -07:00
dcbca53e58 Renaming size() to numel() - 1/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866373

fbshipit-source-id: 589194164d4fea93b74d83fa7fc4c59558c41f4a
2018-10-29 11:11:19 -07:00
b1cf3ad1c2 More Declarations.cwrap functions moved to native, mainly LAPACK, sim… (#13194)
Summary:
…ple math.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13194

Reviewed By: ezyang

Differential Revision: D12811972

Pulled By: gchanan

fbshipit-source-id: 461beb5efa2b6aba0808d2419eb7eb3153d18d15
2018-10-29 11:03:04 -07:00
dbab9b73b6 Separate mkl, mklml, and mkldnn (#12170)
Summary:
1. Remove avx2 support in mkldnn
2. Separate mkl, mklml, and mkldnn
3. Fix convfusion test case
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12170

Reviewed By: yinghai

Differential Revision: D10207126

Pulled By: orionr

fbshipit-source-id: 1e62eb47943f426a89d57e2d2606439f2b04fd51
2018-10-29 10:52:55 -07:00
bb96b6635c Support upsample (#13152)
Summary:
This will enable the updated attribute and input format of operator upsample.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13152

Reviewed By: houseroad

Differential Revision: D12812491

Pulled By: zrphercule

fbshipit-source-id: d5db200365f1ab2bd1f052667795841d7ee6beb3
2018-10-29 10:40:35 -07:00
5be20f92ca Towards a quieter CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13210

Differential Revision: D12824924

Pulled By: anderspapitto

fbshipit-source-id: 76dc9d43a1b5c57eca1051ce6c92200b5fbda7ae
2018-10-29 10:35:40 -07:00
1032cf9fe4 Support for zero-length sequences in RNN executor (#13244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13244

Adding support for zero-length sequences into RNN executor

Reviewed By: dzhulgakov

Differential Revision: D10848803

fbshipit-source-id: f2994ee28c09fb30146243bb300ae7205024dd17
2018-10-29 10:32:42 -07:00
52b6460d3a Fix bug in some reductions that use global memory (#13211)
Summary:
Reductions that used global memory but didn't reduce
across threads in a warp did not have enough global memory
allocated for their intermediate results. These were reductions
that are non-contiguous in their reduced dimension and
large enough to benefit from reducing across blocks in a
grid.

Fixes #13209
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13211

Differential Revision: D12815772

Pulled By: colesbury

fbshipit-source-id: f78be2cb302e7567a76097ca3ba1e7b801c0cdad
2018-10-29 10:23:30 -07:00
9e6a695116 Add string equality test, string concat (#12992)
Summary:
Adding string equality comparison and concatenation; both are used in the standard library.
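A small sketch of both operations in scripted code (illustrative only):

```
import torch

@torch.jit.script
def greet(name: str) -> bool:
    msg = "hello, " + name        # concatenation
    return msg == "hello, world"  # equality

print(greet("world"))  # True
```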
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12992

Differential Revision: D10513681

Pulled By: eellison

fbshipit-source-id: 1f845ef50be7850fdd3366951b20dc2a805c21fd
2018-10-29 10:13:21 -07:00
74ac86d2fe Show demangled names on nvtx ranges (#13154)
Summary:
As we discussed, this changes the backward pass profiler annotations such that 1. they're demangled and 2. if they came from a custom Python-side autograd function, they show a unique name based on the name of that Python-side function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13154

Differential Revision: D12808952

Pulled By: colesbury

fbshipit-source-id: 4119dbaed7714b87c440a81d3a1835c5b24c7e68
2018-10-29 08:45:54 -07:00
277b637811 Delete default constructor from CUDAStream. (#13021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13021

Let's make nullptr CUDAStream an illegal state.

Reviewed By: gchanan

Differential Revision: D10520421

fbshipit-source-id: 723c1f5130b2c92ec97411a958707fac4a90173f
2018-10-29 08:27:24 -07:00
1a4473bbd7 Rewrite THPUtils_PySequence_to_CUDAStreamList to return vector<optional<CUDAStream>> (#13125)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13125

Previously, it returned a vector of THCStream*, which we eventually turned
into CUDAStream.  No need to spatter the conversion code everywhere: just
do it correctly to begin with.  An important side effect of doing it this
way is that we no longer pass nullptr to CUDAStream; instead, we create
the default stream.  I will rely on this in a later patch.

Reviewed By: gchanan

Differential Revision: D10853224

fbshipit-source-id: f6bd6594eba4626eb41a4a5e67fc64c9bbb46a1a
2018-10-29 08:27:23 -07:00
175f248310 Reduce sizes in TestUncoalescedSparse.test_to_sparse (#13236)
Summary:
The old test took 2min to run.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

See #13233
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13236

Differential Revision: D12823474

Pulled By: ezyang

fbshipit-source-id: c800492a96e41a4cd18d41901f411d9d4e978613
2018-10-29 08:01:58 -07:00
71113c6b9e Respect kwarg-only of native functions moved from Declarations.cwrap.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13237

Reviewed By: ezyang

Differential Revision: D12818917

Pulled By: gchanan

fbshipit-source-id: 0ff55ccac3459edd3b28068a0378e9dae085eda0
2018-10-29 07:48:48 -07:00
4276fe7867 Support for saving exceptions in async CPU ops (#12904)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12904

Enabling support for saving exceptions in async parts of CPU ops via
event().SaveException(). The error contract for CPU ops becomes:
 - return false in sync part -> net->Run() returns false
 - throw in sync part -> net->Run() rethrows the same exception
 - SetFinished("error msg") in async part -> net->Run() returns false
 - event().SetFinishedWithException() in async part -> net->Run() rethrows the same
   exception

Reviewed By: andrewwdye

Differential Revision: D10479130

fbshipit-source-id: 850ee9cbf83b04dd24b25eba359439b0cf7853c0
2018-10-29 04:57:40 -07:00
4fe8ca74af Test if GCC 7 fixes timeout problem. (#13230)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13230

Differential Revision: D12818863

Pulled By: ezyang

fbshipit-source-id: 371337ca4b9d8f8e71eb78d6a53085e1c3619631
2018-10-28 20:53:07 -07:00
34799faccd Fix move constructor on c10d::CUDAEvent (#13183)
Summary:
Previously, the move constructor performed a swap
between the item being moved in, and the uninitialized
garbage from the object itself.

I didn't bother adding a test because I shortly intend
to kill this class entirely.  But the fix is so easy that
I wanted to put it in in case I don't get around to doing
this.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13183

Reviewed By: pietern

Differential Revision: D12809062

Pulled By: ezyang

fbshipit-source-id: 0d94bb9796fb7d30621256bfb401a4f89ba8ddc8
2018-10-28 17:47:12 -07:00
1fe8278559 Batched Inverse (#9949)
Summary:
Complete list of changes (a usage sketch follows the lists):

Related to Batch Inverse:
- [x] Add batched inverse (CPU)
- [x] Add batched inverse (CUDA)
- [x] Modify autograd entry
- [x] Add tests
  - [x] test_autograd
  - [x] test_cuda
  - [x] test_torch
- [x] Modify docs
- [x] Remove `_batch_inverse` in `MultivariateNormal`.
- [x] Allow batch matrices as inputs for negative powers in `matrix_power`

Miscellaneous modifications:
- [x] Move all batch operations to BatchLinearAlgebra.cpp/.cu and provide a general framework for adding more batch ops.
- [x] Add a RAII structure for MAGMA queue management.
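A minimal usage sketch of the batched semantics (Python, eager mode; shapes are illustrative):

```python
import torch

a = torch.randn(4, 3, 5, 5)    # a batch of 4*3 matrices, each 5x5
a_inv = torch.inverse(a)       # inverts each trailing 5x5 matrix; same shape

# Every batch element times its inverse should be close to the identity.
eye = torch.eye(5).expand_as(a)
print(torch.allclose(torch.matmul(a, a_inv), eye, atol=1e-4))
```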
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9949

Differential Revision: D10559089

Pulled By: zou3519

fbshipit-source-id: 7da24977f8a79d97dd42883302e13e708c1726e4
2018-10-27 23:42:46 -07:00
4d62eef505 Add Future to IValue (#12976)
Summary:
Future is now an IValue; prim::Wait is now replaced by aten::wait.

This PR is built on top of #12925
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12976

Differential Revision: D10861483

Pulled By: highker

fbshipit-source-id: 9e17926a625bc502fb12335ef9ce819f25776be7
2018-10-27 10:00:35 -07:00
0f261ee359 Fix performance regresion introduced in D10524381 (#13199)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13199

D10524381 removed inclusion of int8_simd.h in Caffe2 Int8 operators, and although the resulting code still compiles and works, it is up to 50% slower end-to-end (no SIMD!) on some models

Reviewed By: bertmaher

Differential Revision: D12813095

fbshipit-source-id: 03a713a4c070c0ad1e79e71e91d09eaddc0751eb
2018-10-27 08:16:49 -07:00
df8c5a3572 Refactoring MIOpen activation ops (#13187)
Summary:
This pull request contains changes for:
1. Adding a generalized MIOpen activation class to be used by activation operators
2. Refactoring MIOpen ReLU op to use the new class
3. Adding ELU, Tanh and Sigmoid MIOpen ops

Differential Revision: D12810112

Pulled By: bddppq

fbshipit-source-id: 9519b3a0cd733b906bcba5d8948be089029c43ac
2018-10-27 00:22:54 -07:00
f8864f0505 Revert "Move batch_norm to ATen/native, speed up (#12368)" (#13191)
Summary:
Revert #12368 since it's causing ONNX-related test cases to fail.
https://github.com/pytorch/pytorch/pull/12368

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13191

Reviewed By: BIT-silence

Differential Revision: D12810778

Pulled By: houseroad

fbshipit-source-id: 1c373b92628580097cffcd237dccc5b3d8697577
2018-10-26 23:05:50 -07:00
bc352ace7c dense.to_sparse() re: #8853 (#12171)
Summary:
Here is my stab at `dense.to_sparse()`:
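A quick round-trip sketch (Python; the values are illustrative):

```python
import torch

d = torch.tensor([[0., 1.], [2., 0.]])
s = d.to_sparse()                     # sparse COO tensor holding the nonzeros
print(s.is_sparse)                    # True
print(torch.equal(s.to_dense(), d))   # round-trips back to dense: True
```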
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12171

Differential Revision: D10859078

Pulled By: weiyangfb

fbshipit-source-id: 5df72f72ba4f8f10e283402ff7731fd535682664
2018-10-26 21:48:52 -07:00
5182fdad0b Compute the offset to ensure correct ordering in the InlineContainer test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13198

Reviewed By: bddppq

Differential Revision: D12812909

Pulled By: houseroad

fbshipit-source-id: f448e0d7957c316099a6b565d129eabb7ef81e59
2018-10-26 21:32:25 -07:00
7a6e0bd77e Skip ROCm tests that fail as per #12824 (#13181)
Summary:
For attention: bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13181

Differential Revision: D12811207

Pulled By: bddppq

fbshipit-source-id: de1c92e5a8cf4fc634c4644376d07374441c24e3
2018-10-26 21:06:20 -07:00
723f40d94e video model test workflow on CPU (#13203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13203

Minor changes in the test workflow to run the model on CPUs

Reviewed By: stephenyan1231

Differential Revision: D9925797

fbshipit-source-id: b7b1fb2658ab68b1ffc2b1f7b314958ea4732b32
2018-10-26 20:48:18 -07:00
dae7616078 Shard all of tests based on how many tests exist. (#13160)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13160

Reduces pytorch_core build from 2 hours to 30 minutes

Reviewed By: soumith, dzhulgakov

Differential Revision: D10524261

fbshipit-source-id: 97270ac73404b5ea4c264cd0e9d8d4b1be79b0e9
2018-10-26 18:20:34 -07:00
7637b7c966 Optimize LayerNormOp (#13173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13173

Optimize LayerNormOp

Reviewed By: houseroad

Differential Revision: D12398163

fbshipit-source-id: 6b76bc4bd9f34e623f8e385dd07d4ce99490badf
2018-10-26 17:00:18 -07:00
537d671829 Renaming size() to numel() - 4/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866391

fbshipit-source-id: 3badc4e86edaac376918fca8d09dbfa396ac3a2c
2018-10-26 16:47:36 -07:00
3ca272cf5a Topologically-safe node moves (#13026)
Summary:
Add new methods to move a node before/after another node while preserving data dependencies.

Any suggestions for a pithier name for the methods would be appreciated 😃
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13026

Differential Revision: D10854574

Pulled By: QueryConnectionException

fbshipit-source-id: b42751cac18d1e23940e35903c8e6a54a395292e
2018-10-26 16:29:03 -07:00
620ece2668 Simplify thread pool creation logic (#13114)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13114

Using one thread pool creator for all device types

Reviewed By: manojkris, wesolwsk

Differential Revision: D10851533

fbshipit-source-id: 32ca51d7932ba7faa8137df26315f52ecb4c6157
2018-10-26 16:02:08 -07:00
63ce3fbde8 Created a transformer to convert caffe2 NetDef into ONNX models.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13167

Reviewed By: abadams

Differential Revision: D11296189

fbshipit-source-id: 7e49c7a78d26f4af39d50b40f70372272debb34a
2018-10-26 15:57:53 -07:00
9e6bb605f6 Native wrappers for many Declarations.cwrap entries
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13003

Differential Revision: D10515654

Pulled By: gchanan

fbshipit-source-id: c3f2809fdb7daeea2209ef1bcdea60266dc4854d
2018-10-26 15:55:15 -07:00
80f766e5cd Create FAQ (#13129)
Summary:
Creates a FAQ. https://github.com/pytorch/tutorials/pull/345 now just links to this page.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13129

Differential Revision: D10854264

Pulled By: goldsborough

fbshipit-source-id: 6e57574ffa61409d4d9d1750aa618893b897ad41
2018-10-26 15:44:51 -07:00
eea2ee6d29 Renaming size() to numel() - 1/17
Summary: Codemod generated with clangr shard mode, 25 files per diff

Reviewed By: li-roy

Differential Revision: D10866237

fbshipit-source-id: 020fcfdf52083430c5b674eda8e07ad3adfcc838
2018-10-26 15:36:59 -07:00
06392bd6a3 Renaming size() to numel() - 3/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866389

fbshipit-source-id: 65489f7b3439ff9a62a5a09b77112f0f4931c609
2018-10-26 15:30:11 -07:00
883da952be Hipify caffe2/core (#13148)
Summary:
petrex ashishfarmer iotamudelta
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13148

Reviewed By: xw285cornell

Differential Revision: D10862276

Pulled By: bddppq

fbshipit-source-id: 1754834ec50f7dd2f752780e20b2a9cf19d03fc4
2018-10-26 15:27:32 -07:00
1bec8f773b Move ConstantPadNd into ATen (#10885)
Summary:
Addresses #9499. Completed work on the forward function; tests should be passing for that. Working on the backward function now.
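For orientation, a small sketch of the constant padding this op backs (assuming `F.pad`'s constant mode routes to constant_pad_nd, as it does in current PyTorch; values are illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.arange(4.).reshape(1, 1, 4)
# Pad the last dimension: one zero on the left, two on the right.
print(F.pad(x, (1, 2), mode="constant", value=0.0))
# tensor([[[0., 0., 1., 2., 3., 0., 0.]]])
```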
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10885

Differential Revision: D9643786

Pulled By: SsnL

fbshipit-source-id: 2930d6f3d2975c45b2ba7042c55773cbdc8fa3ac
2018-10-26 15:25:27 -07:00
e13e86724e Renaming size() to numel() - 2/6
Summary: Codemod generated with clangr shard mode, 50 files per diff

Reviewed By: li-roy

Differential Revision: D10866381

fbshipit-source-id: 2fabf78dfea262e0c789cf24cd3ca6191852983b
2018-10-26 15:21:50 -07:00
b090a54a38 Enable MKLDNN in PyTorch in fbcode (#13165)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13165

Also mark conflicting functions `static` to avoid duplicate symbol errors

Reviewed By: orionr

Differential Revision: D10998641

fbshipit-source-id: b93aab99b91daa1e082cc778abb28bf9d33c21d5
2018-10-26 14:52:19 -07:00
e6ce9f303f Check that QNNPACK directory exists in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13174

Differential Revision: D12808599

Pulled By: colesbury

fbshipit-source-id: 2548a024043f32ee570378dfead8880b00608478
2018-10-26 14:37:11 -07:00
f282fa1afe Comment out LOG(ERROR) for legacy no-dtype serialization behavior
Reviewed By: wylqc

Differential Revision: D12569279

fbshipit-source-id: 46def8ca163bcf9070a1179166fd8970e07ee229
2018-10-26 13:18:27 -07:00
0687f58441 Fix broken master (#13171)
Summary:
Fixes colliding changes in #12766 and #12368
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13171

Differential Revision: D12109430

Pulled By: li-roy

fbshipit-source-id: f068c7df227d920aa3840762e892ce6e9c109237
2018-10-26 12:30:55 -07:00
c21471c77f Sampler serialization and deserialization (#12999)
Summary:
Implements serialization and deserialization for samplers in the C++ frontend dataloader.

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12999

Differential Revision: D10859676

Pulled By: goldsborough

fbshipit-source-id: cd132100fd35323e5a3df33e314511750806f48d
2018-10-26 12:20:51 -07:00
9f9f06c937 Improve inline container and add some test (#12993)
Summary:
Added getNextRecord/hasNextRecord methods. Even though the model data is stored at the end, we can still read the file from the beginning.

Added gtests to cover the reader's and writer's code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12993

Reviewed By: yinghai

Differential Revision: D10860086

Pulled By: houseroad

fbshipit-source-id: 01b1380f8f50f5e853fe48a8136e3176eb3b0c29
2018-10-26 12:06:47 -07:00
7ca995c815 Add optional default type annotation to support JIT None default value (#13161)
Summary:
As titled, this PR is part of the tasks to unblock exporting the standard library.
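A hedged sketch of what this enables, written in present-day torch.jit.script syntax (the frontend at the time used type comments); the function itself is hypothetical:

```python
import torch
from typing import Optional

@torch.jit.script
def scale(x: torch.Tensor, factor: Optional[float] = None) -> torch.Tensor:
    # factor defaults to None, which the JIT now accepts for Optional types
    if factor is None:
        return x
    return x * factor

print(scale(torch.ones(2)))        # default None -> identity
print(scale(torch.ones(2), 3.0))   # explicit factor
```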
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13161

Differential Revision: D10866927

Pulled By: wanchaol

fbshipit-source-id: 50038dbe6840b097b98cbed9d46a189a64e82302
2018-10-26 11:38:50 -07:00
8797bb1d30 Revert D10419671: use TypeMeta instead of ScalarType in TensorOptions
Differential Revision:
D10419671

Original commit changeset: 9cc8c5982fde

fbshipit-source-id: c870ecdd3730cf695007ebb110d362996da05e5d
2018-10-26 11:09:58 -07:00
ce0d3e9b35 Bind inplace and _out variants into JIT (#13093)
Summary:
This commit is a minimal initial pass at adding inplace and _out variants to the JIT.
It changes gen_jit_dispatch.py to add bindings for these operators, and it also
supplements the FunctionSchema with alias information for these operators and for
viewing operators.

Tests are very minimal and will need to be improved in future commits.

Notes:

* Custom operator tests needed to be changed since _out variants add overloads, which
  the custom operator pipeline does not handle when called from python. This commit
  registers special test ops in the _test namespace for this purpose.
* Extends the schema parser to parse alias annotations more robustly.
* Extends FunctionSchema with `writes()` a set of alias set names that the op will write to,
  and `annotatedType()` which will return AnnotatedType objects which contain the alias_set
  information that was parsed from the schema.
* Disables all optimizations in graph executor when a mutable operator is found. This
  is something that will be improved in the future but is necessary for correctness now.
* Adds annotate_ops to gen_jit_dispatch which adds aliasing information to all of the
  aten ops.
* Adds AnnotatedType to the type hierarchy which is used to mark List and Tensor types
  with their alias_set. These types only appear in schema when you call annotatedType
  and are erased from types in normal use.
* Extends jit::Type with .containedTypes() and .withContained(new_types). The first returns all types contained
  within the type (e.g. T for T[], or {T,L} for a tuple (T, L)). The second constructs a new
  version of the same type, replacing the contained types with new_types. This simplifies
  a lot of logic for recursively cleaning up types.
* Refactor List[T] into a common part that is shared with Annotated[T] and can be shared
  with Optional[T] and Future[T] when they are merged.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13093

Differential Revision: D10848176

Pulled By: zdevito

fbshipit-source-id: d057f23eeb99cde8881129b42d3f151ed5e7655d
2018-10-26 10:37:20 -07:00
a70573b589 use TypeMeta instead of ScalarType in TensorOptions (#12768)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12768

Note: DefaultTensorOptions no longer fits in 64 bits.

I kept functions that take ScalarType as input to minimize changes for now.

Reviewed By: ezyang

Differential Revision: D10419671

fbshipit-source-id: 9cc8c5982fde9ff243e03d55c0c52c2aa2c7efd8
2018-10-26 09:27:12 -07:00
2f1542839f reduce Device to 32bits (#12767)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12767

In preparation for using TypeMeta in TensorOptions. We need TensorOptions to fit in 128 bits, which isn't possible if both TypeMeta and Device are 64-bit.

Reviewed By: ezyang

Differential Revision: D10416051

fbshipit-source-id: 23c75db14650f7f3045b1298977f61a0690a8534
2018-10-26 09:27:11 -07:00
a7ba4cb383 Change return type of Tensor::dtype() from ScalarType to TypeMeta (#12766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12766

In preparation for using TypeMeta in TensorOptions.

Reviewed By: ezyang

Differential Revision: D10232118

fbshipit-source-id: 5c69a524fa38e50aa555fb9feb87540bc3575a63
2018-10-26 09:27:09 -07:00
46ef2b2898 Ignore flake8 warnings in test_c10d.py (#13159)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13159

These lint violations are intentional.

Reviewed By: ezyang

Differential Revision: D10862131

fbshipit-source-id: 70ad4b0a360cb12d050805fd7b1080dfe4566e86
2018-10-26 09:17:57 -07:00
435228508e Remove test_distributed_trap.py (#13151)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13151

No longer needed.

Reviewed By: ezyang

Differential Revision: D10862319

fbshipit-source-id: 01405d7cf2553f59ff7d3dce33755a5fdd8a8f05
2018-10-26 09:15:27 -07:00
929bffe020 Turn some th_ prefixes into _th_ prefixes for conformity. (#13128)
Summary:
This is the same as https://github.com/pytorch/pytorch/pull/12889 with the addmm changes stripped out, since that appears to cause onnx broadcasting issues I don't understand.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13128

Reviewed By: ezyang

Differential Revision: D10853911

Pulled By: gchanan

fbshipit-source-id: 08ec8629331972f0c332ccd036980fd9c87562b0
2018-10-26 08:08:09 -07:00
c95fa4b904 fix dtype uninitialized tensor serialization
Summary:
See D10380678 for the discussion.

Caffe2 serialization code was able to handle dtype-uninitialized tensors as long as their numel was 0 O_O.

For safety, to unblock the push, I'm preserving this behavior with critical. Once we fix all occurrences of the old API, we can delete this test.

Reviewed By: kennyhorror

Differential Revision: D10866562

fbshipit-source-id: e172bd045fdfca660ff05b426e001f5f2f03f408
2018-10-26 01:30:47 -07:00
8e1e3ba7b8 Hide c10::optional and nullopt in torch namespace (#12927)
Summary:
Does

```cpp
namespace torch {
using c10::optional;
using c10::nullopt;
}
```

So that users can be oblivious of our changes with ATen/c10 happening in the background, and also don't have to deal with multiple namespaces (which is very confusing).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12927

Differential Revision: D10510630

Pulled By: goldsborough

fbshipit-source-id: e456264f2fbca3eda277712de11cdd8acc77fbd4
2018-10-26 00:08:04 -07:00
f72f91610f Move stream to thread local (#13080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13080

This is the first step to untangle this logic:
- moves stream id to thread local mechanically
- relies on the fact that the value of thread local is valid in conjunction with CUDAContext only until the next SwitchToDevice is called - we should move to proper RAII in the following diffs

Follow up diffs are going to move more stuff outside of CUDAContext (by making gpu_id thread local too) and simplify the CopyFrom.

The only expected change in behavior is that before CopyFrom would do copy on stream logical id 0 if the context was created on the fly and now it'd do so on the current stream. Since it'd block explicitly, I don't think it matters much.

Also, observers were semi-broken by waiting on the potentially wrong stream. It can be fixed later - I renamed the method to avoid abuse.

Reviewed By: ezyang

Differential Revision: D10525134

fbshipit-source-id: 5d495a21490bebe060a76389f1b47bdf12cbc59e
2018-10-26 00:04:32 -07:00
dc211c7de4 Move batch_norm to ATen/native, speed up (#12368)
Summary:
- Speed up the case of #12006 in the forward
- The backward still isn't as fast as one might hope (factor 2-3 in the #12006 case).
- More extensive benchmarking shows not so great performance compared
  to CuDNN for cases with many channels, e.g. bs=8-128 / c=1024 / f=1024.
- We change the meaning of save_mean and save_invstd (aka save_var) to accscalar to
  maintain reasonable precision.

Needless to say, I would happily split the TensorAccessor fixes into a separate PR, as they're fixes and unrelated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12368

Differential Revision: D10559696

Pulled By: SsnL

fbshipit-source-id: f0d0d1e0912e17b15b8fb7a2c03d0fe757598419
2018-10-25 23:41:10 -07:00
5e73b828bd CMake integration for Int8 ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13145

Differential Revision: D10860849

Pulled By: Maratyszcza

fbshipit-source-id: fdbcc23ff9beaeaedfd561176df6cfe87685c1f5
2018-10-25 22:25:10 -07:00
4870b1b68f Speed up tensor.resize_(sizes) when tensor has correct size (#12824)
Summary:
While using gbenchmark, I found `tensor.resize_({0})` would take 300ns
if tensor already has the correct size. This is important for
`at::empty({0})` perf because `at::empty` always calls `resize_`, which
in turn is a important for JIT perf: the fusion compiler creates empty
tensors and then `resize_`s them to computed sizes. Most of the 300ns is
due to DeviceGuard (200ns)

Summary of findings:
- `at::empty({0}, cuda)`: 851ns
- `empty_tensor.resize({0})`: 308ns
- `DeviceGuard(tensor)`: ctor + dtor: 200ns (Going to look into this
  next because it impacts `resize_` perf).
- vdispatch overhead (`tensor.resize_()` vs
  `at::native::resize__cuda(tensor)`): ~10ns

This PR rips out the TH `resize_` implementation and adds it to ATen
with the following modifications:
- DeviceGuard used only after the same-size check.
- Same-size check rewritten for simplicity. The new check doesn't
affect perf.
- empty_cpu / empty_cuda avoid the dispatch overhead to
tensor.resize_.

Timing with this PR:
- `at::empty({0}, cuda)`: 363ns
- `empty_tensor.resize_({0})`: 17ns

Future:
- Investigate `resize_(sizes)` slowness when `tensor.sizes() != sizes`
- Should tell resize_as_ to use the new resize_ implementation...
(because resize_as_ is in TH, it is calling the old TH resize_)
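A small behavioral sketch of the fast path this PR optimizes (the timings above are the actual measurements; this only shows the API):

```python
import torch

t = torch.empty(0)
t.resize_(0)      # sizes already match: hits the cheap early-out
t.resize_(2, 3)   # sizes differ: takes the real resize path
print(t.shape)    # torch.Size([2, 3])
```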
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12824

Differential Revision: D10449209

Pulled By: zou3519

fbshipit-source-id: cecae5e6caf390017c07cd44a8eaf2fa6e3fdeb6
2018-10-25 21:09:41 -07:00
60c0508d96 Use CAFFE_ENFORCE instead of CHECK in caffe2 rnn executor (#13144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13144

The intention of this diff is to prevent the predictor service from crashing with the "Check failed: timestep >= 0 && timestep < _T" error, as a bandage, before D10848803 can be landed (assuming D10848803 replaces the CHECKs with CAFFE_ENFORCEs, too).

Reviewed By: ilia-cher

Differential Revision: D10857963

fbshipit-source-id: bb56ad83aa867a2d25953aa7ffd84b078f8bf84a
2018-10-25 20:58:13 -07:00
5cbb33f939 Disable upsample optest (#13135)
Summary:
Temporarily disable upsample tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13135

Reviewed By: bddppq

Differential Revision: D10859926

Pulled By: houseroad

fbshipit-source-id: 9eb068198d43ba0939d81a9e41eb6f24ff19cb6d
2018-10-25 20:37:09 -07:00
efab8e8fdf Speed up tensor.get_device(), is_cuda(), is_sparse() by avoiding dispatches (#12841)
Summary:
`tensor.get_device()` went through two dispatches: once to the native
function
`get_device()`, and another when `get_device` calls `_th_get_device()`.
This PR avoids the dispatch by directly implementing the `get_device`
function
as a method on Tensor.

Future Work:
- Investigate caching Device on TensorImpl. This will probably bring the
  tensor.get_device down to 2ns, but I'm not sure it's worth it.

before:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                             8 ns          8 ns   89407911
BM_TensorIsCuda                          24 ns         24 ns   29313017
BM_TensorIsSparse                        27 ns         27 ns   26083160
BM_TensorTypeIsCuda                      11 ns         11 ns   65128120
BM_TensorNumel                           11 ns         11 ns   68314492
BM_TensorGetDevice                       71 ns         71 ns    9633125
BM_DeviceGuardCtor                      173 ns        173 ns    4067173
BM_DeviceGuard                          232 ns        232 ns    3009690
```

after:
```
------------------------------------------------------------------------
Benchmark                                 Time           CPU Iterations
------------------------------------------------------------------------
BM_TensorTypeId                           0 ns          0 ns 1000000000
BM_TensorType                            10 ns         10 ns   69803872
BM_TensorIsCuda                           2 ns          2 ns  321626683
BM_TensorIsSparse                         6 ns          6 ns  177045382
BM_TensorNumel                           12 ns         12 ns   58770533
BM_TensorGetDevice                        4 ns          4 ns  128113396
BM_DeviceGuardCtor                       52 ns         52 ns   14997278
BM_DeviceGuard                          158 ns        158 ns    5767248

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12841

Differential Revision: D10489353

Pulled By: zou3519

fbshipit-source-id: a596bc77352f21d5d35433c6de02c2f65aab5f9e
2018-10-25 19:57:52 -07:00
b827a40880 Implement bucket-based attention pooling for IdScoreList features (#13004)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13004

Implement BucketWeighted model layer, which learns a weight for each possible score in an IdScoreList. Here, we assume that the scores in the IdScoreList have already been converted into the appropriate 'buckets'. If this is not done, then essentially each score represents its own bucket.

We assume that the scores/buckets are integers, and if max_score is not set, we assume that the maximum cardinality of the score is less than or equal to the cardinality of the ids.

Reviewed By: chonglinsun

Differential Revision: D10413186

fbshipit-source-id: 743e643a1b36adf124502a8b6b29976158cdb130
2018-10-25 18:04:08 -07:00
3ac9a9577c Remove optional from caffe2 utils (#12965)
Summary:
Now that we have everything from c10::optional, we can delete this and keep a single version in c10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12965

Differential Revision: D10504042

Pulled By: wanchaol

fbshipit-source-id: c0ec3892e92968cca264ae8924c19111674631ba
2018-10-25 17:29:04 -07:00
99d24aefc3 Move a number of ATen checks out of Dependencies.cmake (#12990)
Summary:
cc Yangqing mingzhe09088 anderspapitto mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12990

Differential Revision: D10862301

Pulled By: orionr

fbshipit-source-id: 62ba09cf0725f29692fac71bc30173469283390b
2018-10-25 17:26:25 -07:00
852d6e8b65 Fix python2 and python 3 compatibility found by lint. (#13140)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13140

This is an example of the benefit of a proper Facebook linter. The old code
was not Python 2.x (actually, pre-Python 3.3) compatible. Note that FileExistsError
was added in Python 3.3:

https://stackoverflow.com/questions/20790580/python-specifically-handle-file-exists-exception

Reviewed By: mingzhe09088

Differential Revision: D10858804

fbshipit-source-id: a4c995aef9f720cb8b0ce463f0a51db667fc42f2
2018-10-25 17:20:11 -07:00
defe96eb6c add topology index check in Graph::lint() (#13037)
Summary:
just a sanity check to make sure everything is in order
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13037

Differential Revision: D10854563

Pulled By: michaelsuo

fbshipit-source-id: 409303c4cbf058b75e24bf2213b49e9d79cb862e
2018-10-25 17:02:38 -07:00
526460fc8b Use default timeout of 30 minutes for gloo backend (#13056)
Summary:
The existing default timeout was set at 10 seconds, which is too low
for asynchronous tasks that depend on a barrier to resynchronize.
Having a single timeout for all operations is not ideal and this will
be addressed in future commits.
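For reference, a hedged sketch of overriding the timeout with today's torch.distributed API (the `timeout` keyword and the rendezvous address here are illustrative and may differ across versions):

```python
import datetime
import torch.distributed as dist

dist.init_process_group(
    backend="gloo",
    init_method="tcp://127.0.0.1:23456",      # hypothetical rendezvous address
    rank=0,
    world_size=1,
    timeout=datetime.timedelta(minutes=30),   # the new default; override here
)
```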
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13056

Reviewed By: teng-li

Differential Revision: D10558746

Pulled By: pietern

fbshipit-source-id: d857ea55b1776fc7d0baf2efd77951b5d98beabb
2018-10-25 16:35:53 -07:00
4e1c64caee Add c10::optional to type syntax (#12582)
Summary:
This PR adds an optional type to ATen native, autograd, the JIT schema, and the Python arg parser; closes #9513. It allows us to use optional default values (including None) in function signatures and implementations like clamp, and also lets us remove the python_default_init hack.
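For example, clamp can now take None for either bound (a minimal eager-mode sketch):

```python
import torch

x = torch.tensor([-2.0, 0.5, 3.0])
print(torch.clamp(x, min=0.0))           # only a lower bound; max is None
print(torch.clamp(x, max=1.0))           # only an upper bound; min is None
print(torch.clamp(x, min=0.0, max=1.0))  # both bounds
```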

Follow up:

remove python_default_init completely.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12582

Differential Revision: D10417423

Pulled By: wanchaol

fbshipit-source-id: 1c80f0727bb528188b47c595629e2996be269b89
2018-10-25 16:08:29 -07:00
569a29b81a Make chunk size configurable in SaveOp (#12949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12949

Currently the default chunk size in the save operation is 1MB, and there is no way to configure it at runtime. Add a parameter to configure the chunk size in SaveOp.

Reviewed By: mraway, xsh6528

Differential Revision: D10454037

fbshipit-source-id: a5cd8f9846aea4b1e3612a3fcfa431b68bda8104
2018-10-25 15:47:34 -07:00
f6ccb6a0f9 bring caffe2::Tensor API closer to aten/pytorch (#13134)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13134

For tensor, we plan to do the following renaming:
```
* t.ndim() → t.dim()
* t.size() → t.numel()
* t.dims() → t.sizes()
* t.meta() → t.dtype()
* t.dim(d) → t.size(d)
```
This diff adds new APIs in caffe2::Tensor so we can start codemod,
we'll remove old API after the codemod

Reviewed By: ezyang

Differential Revision: D10856028

fbshipit-source-id: 1638997e234d7b3113ef8be65a16246f902273c7
2018-10-25 15:45:09 -07:00
49046239f2 Change explicit usages of at::optional to c10::optional (#13082)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13082

Follow up of D10511254. For these cases we can move to preferred `optional` without namespace right away.

Reviewed By: ezyang, Yangqing

Differential Revision: D10844117

fbshipit-source-id: 99a59e692fb4b236b299579f937f1536d443d899
2018-10-25 15:17:53 -07:00
be99eff75a Back out "Revert D10494123: [c10] Remove at::Optional" (#12991)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12991

Remove the file proxying. Until we can land `using namespace c10` everywhere, we just keep the one-off namespace proxy. The follow-up diff is going to replace explicit at::optional and keep just `optional` usage.

Reviewed By: ezyang, Yangqing

Differential Revision: D10511254

fbshipit-source-id: 8297c61d7e9810ae215a18869a6ec9b63f55d202
2018-10-25 15:17:51 -07:00
c47f680086 arc lint torch/utils (#13141)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13141

This is an example diff to show what lint rules are being applied.

Reviewed By: mingzhe09088

Differential Revision: D10858478

fbshipit-source-id: cbeb013f10f755b0095478adf79366e7cf7836ff
2018-10-25 14:59:03 -07:00
4f94d82c7f clang-format on c10d and THD (#13138)
Summary:
clang-format-6 run on all cpp,cc,c,cu,cxx,hpp,hxx,h files under /c10d and /thd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13138

Differential Revision: D10857742

Pulled By: teng-li

fbshipit-source-id: f99bc62f56019c05acdfa8e8c4f0db34d23b4c52
2018-10-25 14:16:47 -07:00
c6defa0847 Add randn in onnx symbolic (#12880)
Summary:
In this PR we add the randn operator to the ONNX symbolic. Related tests are added as well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12880

Reviewed By: houseroad

Differential Revision: D10501788

Pulled By: zrphercule

fbshipit-source-id: ba8bb00ca848c4b95decabf638a1bc13fe11d03e
2018-10-25 14:11:23 -07:00
979560c9fc Include c10 namespace into caffe2 and at namespaces. (#12950)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12950

For backwards compatibility, we want the c10 symbols to be reachable from caffe2 and aten.
When we move classes from at/caffe2 to c10, this
 1. allows keeping backwards compatibility with third-party code we can't control
 2. allows splitting diffs that move such classes into two diffs, where one only fixes the includes and the second one fixes the namespaces.

Reviewed By: ezyang

Differential Revision: D10496244

fbshipit-source-id: 914818688fad8c079889dfdc6242bc228b539f0e
2018-10-25 14:08:47 -07:00
d6fe812187 Fix TensorList ambiguity (#13024)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13024

There's a TensorList type in ivalue.h and one in ScalarType.h, and they are different.
This diff moves IValue types into an ivalue namespace so we can merge the namespaces without conflicts.

Reviewed By: ezyang

Differential Revision: D10518929

fbshipit-source-id: cb760b6804a399880d2bff3acf9a3422d99fc0b8
2018-10-25 14:08:45 -07:00
14ea4bf0d1 Make 7 nn modules into weak modules (#12966)
Summary:
Depends on #12682 ([stacked diff](https://github.com/driazati/pytorch/compare/weak_mod...driazati:mod_conv1))

* Adds tests for weak module conversion that creates a `ScriptModule` that uses the weak module and checks its graph
* Adds `torch._jit_internal.weak_module` tags to modules that already work
  * `Sigmoid`
  * `Tanh`
  * `Hardshrink`
  * `PReLU`
  * `Softsign`
  * `Tanhshrink`
  * `PairwiseDistance`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12966

Differential Revision: D10559557

Pulled By: driazati

fbshipit-source-id: dc4bea3aa744b3c44d4fa7dceefd97e951f824d0
2018-10-25 13:59:34 -07:00
e07e63f0b3 Absorb shm
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13088

Differential Revision: D10856067

Pulled By: anderspapitto

fbshipit-source-id: cfbf0f6cad3953e1ee1c55482c00a3db9f140594
2018-10-25 13:55:23 -07:00
175e553974 Do a better job of checking registered names (#13016)
Summary:
We currently don't check names in `register_module` and `register_parameter` as thoroughly as we do in Python. This PR fixes this.

Python checks are e.g. in https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py#L108

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13016

Differential Revision: D10853800

Pulled By: goldsborough

fbshipit-source-id: 765357875e90a5046e72351a7a47a86511633ab6
2018-10-25 13:52:08 -07:00
c91d982691 Improve expand error message by including complete sizes rather than … (#13124)
Summary:
…size at dimension.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13124

Reviewed By: ezyang

Differential Revision: D10853167

Pulled By: gchanan

fbshipit-source-id: 76eeb922304bf19243d9bc52da87f2be8d1700ae
2018-10-25 13:37:25 -07:00
9cb4bce847 Open-source Caffe2 Int8 ops (#13065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13065

- Open-source Caffe2 Int8 (quantized) operators

Reviewed By: Yangqing

Differential Revision: D10524381

fbshipit-source-id: 6daa153dc247572900c91e37262d033c368b382d
2018-10-25 12:43:00 -07:00
faa354e102 Commentary about size constraints on TensorImpl. (#13126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13126

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Differential Revision: D10454455

Pulled By: ezyang

fbshipit-source-id: 7018a41b94e316305751f2f8ad2c2d049799f5d4
2018-10-25 12:24:49 -07:00
cb15c7615a Documentation on TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12713

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10404407

fbshipit-source-id: cbc6be2172af068c3fc96e1f6da0b04b6f29ad4b
2018-10-25 12:24:48 -07:00
ae44627661 Rm test_jit.cpp (#12988)
Summary:
Removes test_jit.cpp, which was supposed to have been deleted in https://github.com/pytorch/pytorch/pull/12030

I had to move zou3519's dynamic DAG tests into `test/cpp/jit/tests.h` too. No other changes to `test_jit.cpp` seem to have happened in the meantime.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12988

Differential Revision: D10854320

Pulled By: goldsborough

fbshipit-source-id: 7ab533e6e494e34a16ce39bbe62b1150e48fcb58
2018-10-25 12:18:15 -07:00
314d95a5f2 Renaming dims() to sizes() (caffe2/caffe2) - 3/4 (#13096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13096

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842875

fbshipit-source-id: 1784859735ed4d1bd5ccd7ca56e289498374a68f
2018-10-25 12:14:21 -07:00
557db18c85 Enable MIOpen properly (#13048)
Summary:
* Disable MIOpen convolution on double tensors
* MIOpen: set group count in convolution descriptor
* MIOpen: Honor Max Dim (ROCm 222)
* MIOpen: Batchnorm - Allow half/half and half/float, disallow double
* Limit MIOpen batchnorm to same-precision
* Fix maxdim check. (ROCm 246)
* Fix reversed logic in DISABLE_MIOPEN (ROCm 253)
* Export LANG/LC_ALL also for the test step.
* Make tensors contiguous before calling MIOpen batch norm
* Actually pass dilation to MIOpen.
* Do not use MIOpen if there is dilation and the group size is > 1. - This is officially not supported currently.
* Fixes for miopenforward bias call
* Modified init conv descriptor param values and used same value for dilation
* MIOpen: disable transposed convolutions

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13048

Differential Revision: D10785250

Pulled By: bddppq

fbshipit-source-id: f9d9797de644652280d59308e5ea5cc07d177fd4
2018-10-25 11:32:49 -07:00
ab40eff5dd caffe2: UpsampleBilinear CUDA implementation (#12843)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12843

This adds a cuda implementation for the UpsampleBilinearOp and UpsampleBilinearGradientOp.

The CUDA code is based off of the corresponding ResizeNearest operators but with bilinear interpolation logic taken from the CPU implementation.

Reviewed By: houseroad

Differential Revision: D10453776

fbshipit-source-id: b29ac330b72465974ddb27c0587bca590773fdec
2018-10-25 11:10:04 -07:00
796181d762 Fix UB in CPU_tensor_apply (#13121)
Summary:
std::memcpy has UB when either src or dest is NULL, even if the length
is 0. This can and does happen when the input tensors are scalar tensors.

This triggered UBSAN on #12824 but it is strange that it has not
been triggered before.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13121

Differential Revision: D10853113

Pulled By: zou3519

fbshipit-source-id: c4b4ad5e41de6f73dc755e0c25bc9947576a742d
2018-10-25 10:58:06 -07:00
eac3e7ab7c improve constants error message (#13072)
Summary:
Adds the attribute name to the error message and fixes the corresponding
test to actually run
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13072

Differential Revision: D10846622

Pulled By: driazati

fbshipit-source-id: a7eee6320c28140c4937ede3d4e4685cfce08d84
2018-10-25 10:45:42 -07:00
9fefab5ac6 Add support for reductions to TensorIterator (#11908)
Summary:
This adds support for reductions like sum() and mul() to TensorIterator.
Performance is similar to existing optimized code for CPU, and generally
better than existing code for CUDA kernels.

The templatized CUDA kernel requires fewer instantiations than the
existing THCReduce/THCReduceAll code. For example, sum() previously
generated 43 CUDA kernels, while it now requires only one (larger)
CUDA kernel. I suspect this should reduce code-size and
compilation time, but I haven't measured it.

Below are timings for sum() on [CPU](https://ark.intel.com/products/81908/Intel-Xeon-Processor-E5-2680-v3-30M-Cache-2_50-GHz) (12 threads and 1 thread) and CUDA with various tensor sizes.

CPU

| Reduction (dim)      | Master  | PR      | Master (1 thread) | PR (1 thread) |
|----------------------|---------|---------|-------------------|---------------|
| 1024x1024 (all)      | 22 us   | 34 us   | 136 us            | 147 us        |
| 1024x1024 (0)        | 30 us   | 28 us   | 160 us            | 160 us        |
| 1024x1024 (1)        | 25 us   | 25 us   | 171 us            | 146 us        |
| 1024x10x1024 (all)   | 542 us  | 550 us  | 4.14 ms           | 3.11 ms       |
| 1024x10x1024 (0)     | 658 us  | 690 us  | 6.80 ms           | 5.93 ms       |
| 1024x10x1024 (1)     | 761 us  | 757 us  | 3.34 ms           | 3.52 ms       |
| 1024x10x1024 (2)     | 538 us  | 545 us  | 3.73 ms           | 3.04 ms       |
| 1024x1024x1024 (all) | 72 ms   | 71 ms   | 364 ms            | 357 ms        |
| 1024x1024x1024 (0)   | 94 ms   | 90 ms   | 935 ms            | 927 ms        |
| 1024x1024x1024 (1)   | 80 ms   | 86 ms   | 881 ms            | 688 ms        |
| 1024x1024x1024 (2)   | 71 ms   | 71 ms   | 456 ms            | 354 ms        |

CUDA

| Reduction (dim)      | M40 base | M40 PR  | P100 base | P100 PR   |
|----------------------|----------|---------|-----------|-----------|
| 1024x10x1024 (all)   | 238 us   | 182 us  | 136 us    | 97 us     |
| 1024x10x1024 (0)     | 166 us   | 179 us  | 105 us    | 84 us     |
| 1024x10x1024 (1)     | 181 us   | 182 us  | 89 us     | 91 us     |
| 1024x10x1024 (2)     | 180 us   | 168 us  | 88 us     | 79 us     |
| 1024x1024x1024 (all) | 17.5 ms  | 16.4 ms | 8.23 ms   | 7.48 ms   |
| 1024x1024x1024 (0)   | 27.2 ms  | 28.6 ms | 7.63 ms   | 7.38 ms   |
| 1024x1024x1024 (1)   | 16.5 ms  | 16.3 ms | 7.66 ms   | 7.40 ms   |
| 1024x1024x1024 (2)   | 17.8 ms  | 16.4 ms | 8.37 ms   | 7.31 ms   |

Timings were generated with this script:
https://gist.github.com/colesbury/d3238b266d8a9872fe6f68f77619b379
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11908

Differential Revision: D10071760

Pulled By: colesbury

fbshipit-source-id: 40e37a0e6803f1628b94cc5a52a10dfbb601f3d6
2018-10-25 09:42:55 -07:00
e5752f2cb4 Renaming dims() to sizes() (fbcode)
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10848643

fbshipit-source-id: ac75833be8be9162e35b00dcd352f616bc7bbafe
2018-10-25 09:32:18 -07:00
1720757220 added submodules for int8 ops (#13106) 2018-10-25 09:11:11 -07:00
2a6431ba2d Use fixed MASTER_PORT in test_distributed (#13109)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13109

The "right" strategy of creating a socket, binding to an undefined port, closing the socket, and reusing the port it was bound to, was subject to a race condition. Another process could bind to that same port sooner than the tests would, causing an "Address already in use" failure when rank 0 would try and bind to that same port. The THD tests have been using a fixed port since forever. Time will tell if this fixes #12876.

Differential Revision: D10850614

fbshipit-source-id: c19f12bb4916141187ee8ddb52880f5f418310dc
2018-10-25 08:51:34 -07:00
956e620c64 Eliminate numel == -1 state, delete Storage-only constructor (#12656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12656

I originally wanted to do this in two steps, but deleting the Storage-only
constructor also changes the default numel state (which breaks tests),
so it was easiest to do it all in one go.

- I still need a way to compute the correct TensorTypeId for all of the
  Caffe2 constructors; rather than hard-code it, I wrote a function
  in at::detail::computeTensorTypeId() to do this calculation.  Maybe
  this function could be used more widely, but for now, it's used
  by Caffe2 only.
- Added a pile more TensorTypeId for all of Caffe2's supported DeviceTypes
- Because I still can't put arbitrary TypeMeta in TensorOptions, the
  TensorTypeId() calculation doesn't respect dtype.  For now, this is
  not a problem, but this might block work to split non-POD dtypes
  into their own TensorTypeId.

Reviewed By: li-roy

Differential Revision: D10380678

fbshipit-source-id: 10c5d12020596fc9f27d5579adffad00513af363
2018-10-25 08:44:05 -07:00
c368f26f88 Disable CircleCI merging to master. (#13074)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13074

Differential Revision: D10852728

Pulled By: ezyang

fbshipit-source-id: 6b96c941f4655ba240adaa0678844efa2af81d06
2018-10-25 08:07:45 -07:00
e8613d99b5 Delete ATen/CUDAGuard.h (#13078)
Summary:
It's empty.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13078

Differential Revision: D10843892

Pulled By: ezyang

fbshipit-source-id: 39e6f73b3a8be3e7573c1af727b65da246d4515b
2018-10-25 07:52:38 -07:00
6995b84d45 Make SparseToDense handle empty outputs properly. (#13043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13043

memset on nullptr is undefined behavior, and as a result filament_test is failing in the dev build. This diff makes the operator handle empty output properly, so we can bring that test back.

I'm not sure whether it is even valid to call this op with input that would require an empty memset (empty batch?). Will leave this to ninghz and sunnieshang to decide.

Reviewed By: xianjiec

Differential Revision: D10525605

fbshipit-source-id: a911cdbd62fc3d948328981fd01cd205ec2ad99f
2018-10-25 00:27:52 -07:00
f1e4304d19 Add operator_def property to annotation (#13094)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13094

Expose operator_def property

Reviewed By: duc0

Differential Revision: D10847125

fbshipit-source-id: 67a066555b690715e1f5f04125fd446ab197f45a
2018-10-24 23:42:35 -07:00
b883afc928 Absorb c10d into the main cmake build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12953

Differential Revision: D10850274

Pulled By: anderspapitto

fbshipit-source-id: 42296e6e49ad8c1845040e031eab95ddbaf58ae4
2018-10-24 22:34:00 -07:00
c250f6f3d5 DDP perf improvement: move sync_reduction to C++, dedicated CUDA streams for memcpy (#12954)
Summary:
- Moved sync_reduction to C++
- Use a dedicated CUDA stream for memcpy
- Also use a dedicated CUDA stream for memcpy in queue_reduction

Added test as well.

CI should cover both DDP and unittest
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12954

Differential Revision: D10520069

Pulled By: teng-li

fbshipit-source-id: 64348e4e43c15f9695a4c28b036c232587ecfb65
2018-10-24 21:37:13 -07:00
69906afaee absorb THD into main cmake build (#12775)
Summary:
We want to move _C into the same cmake invocation that builds
libcaffe2 and libtorch. However, _C depends on THD and c10d, which in
turn depend on libcaffe2. That means that we can't move _C into that
cmake file unless we move these two first. This change does so.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12775

Differential Revision: D10457374

Pulled By: anderspapitto

fbshipit-source-id: 2c1aa3b8a418a73d2112e93c7da53a2e70cf7bba
2018-10-24 21:28:37 -07:00
2d9b1fcd09 Make c10d support MPICH and further (#13083)
Summary:
Fixed issue:
https://github.com/pytorch/pytorch/issues/12921

Builds and works with MPICH; all tests passed.

We should add MPICH to CI at some point later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13083

Reviewed By: soumith

Differential Revision: D10844833

Pulled By: teng-li

fbshipit-source-id: e8cdc866ee1ee7a33e469017ea562a08da119d53
2018-10-24 20:11:56 -07:00
b4d0dc77be Eliminate CUDAStream nullptr in NCCL (#13089)
Summary:
As the title says, we should always use the current stream on the device in NCCL.

This unblocks ezyang's further work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13089

Reviewed By: ezyang

Differential Revision: D10847172

Pulled By: teng-li

fbshipit-source-id: 7fc7c4248b5efa1971d2af4d43f62d3379debfe4
2018-10-24 20:04:41 -07:00
fc1c8f8b5b Enable test_nn embedding tests and use correct warp size in Embedding.cu (#13046)
Summary:
* Enable test_nn embedding tests and use correct warp size in Embedding.cu
* Fix embedding_backward_feature_kernel kernel for HIP

For attention: bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13046

Differential Revision: D10560721

Pulled By: bddppq

fbshipit-source-id: e6c3cbeb980a34ff52a92dba8bde745a2e03f2fd
2018-10-24 19:43:37 -07:00
444cc0ee0a Back out "[pytorch][PR] added gemmlowp module" (#13090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13090

Original commit changeset: 7f8a649c739c

Reviewed By: Maratyszcza

Differential Revision: D10846367

fbshipit-source-id: a5a5aad29b51287dc1cb80c707eb5a0008ec78f5
2018-10-24 19:41:15 -07:00
478886be30 Fix print precision and match numpy behavior (#12746)
Summary:
Fixes #12578 #9395.

* Fix and simplify print logic

* Follow numpy print rule eb2bd11870/numpy/core/arrayprint.py (L859)
> scientific notation is used when absolute value of the smallest number is < 1e-4 or maximum > 1e8 or the ratio of the maximum absolute value to the minimum is > 1e3

I hope I didn't break anything since there seem to be a lot of edge cases here... Here are some easy sanity checks.
```
In [5]: torch.tensor(1)
Out[5]: tensor(1)
Out[2]: array(1) # numpy

In [6]: torch.tensor(10)
Out[6]: tensor(10)
Out[3]: array(10) # numpy

In [8]: torch.tensor(99000000)
Out[8]: tensor(99000000)
Out[5]: array(99000000) # numpy

In [9]: torch.tensor(100000000)
Out[9]: tensor(100000000)
Out[6]: array(100000000) # numpy

In [10]: torch.tensor(100000001)
Out[10]: tensor(100000001)
Out[7]: array(100000001) # numpy

In [11]: torch.tensor(1000000000)
Out[11]: tensor(1000000000)
Out[8]: array(1000000000) # numpy

In [12]: torch.tensor([1, 1000])
Out[12]: tensor([   1, 1000])
Out[9]: array([   1, 1000]) # numpy

In [13]: torch.tensor([1, 1010])
Out[13]: tensor([   1, 1010])
Out[10]: array([   1, 1010]) # numpy
```
For floating-point values, we use scientific notation when `max/min > 1000 || max > 1e8 || min < 1e-4`.
Lines marked "old" are old behaviors that either have precision issues or are not aligned with numpy:
```
In [14]: torch.tensor(0.01)
Out[14]: tensor(0.0100)
Out[11]: array(0.01) # numpy

In [15]: torch.tensor(0.1)
Out[15]: tensor(0.1000)
Out[12]: array(0.1) # numpy

In [16]: torch.tensor(0.0001)
Out[16]: tensor(0.0001)
Out[14]: array(0.0001) # numpy

In [17]: torch.tensor(0.00002)
Out[17]: tensor(2.0000e-05)
Out[15]: array(2e-05) # numpy
Out[5]: tensor(0.0000) # old

In [18]: torch.tensor(1e8)
Out[18]: tensor(100000000.)
Out[16]: array(100000000.0) # numpy

In [19]: torch.tensor(1.1e8)
Out[19]: tensor(1.1000e+08)
Out[17]: array(1.1e8) # numpy 1.14.5, In <= 1.13 this was not using scientific print
Out[10]: tensor(110000000.) # old

In [20]: torch.tensor([0.01, 10.])
Out[20]: tensor([ 0.0100, 10.0000])
Out[18]: array([  0.01,  10.  ]) # numpy

In [21]: torch.tensor([0.01, 11.])
Out[21]: tensor([1.0000e-02, 1.1000e+01])
Out[19]: array([  1.00000000e-02,   1.10000000e+01]) # numpy
Out[7]: tensor([ 0.0100, 11.0000]) # old
```
When printing floating-point numbers in integer mode, we still need to respect the rules for using scientific mode first:
```
In [22]: torch.tensor([1., 1000.])
Out[22]: tensor([   1., 1000.])
Out[20]: array([    1.,  1000.]) # numpy

In [23]: torch.tensor([1., 1010.])
Out[23]: tensor([1.0000e+00, 1.0100e+03])
Out[21]: array([  1.00000000e+00,   1.01000000e+03]) # numpy
Out[9]: tensor([   1., 1010.]) # old
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12746

Differential Revision: D10443800

Pulled By: ailzhang

fbshipit-source-id: f5e4e3fe9bf0b44af2c64c93a9ed42b73fa613f5
2018-10-24 18:12:51 -07:00
3761adc889 C++ API Cleanup Extension (#13087)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13087

API changes that simplify subgraph replacement drastically

Reviewed By: duc0

Differential Revision: D10444011

fbshipit-source-id: 22c699bb5bc0f21538c70fe9401899d4f7e1b055
2018-10-24 18:06:50 -07:00
3fa9ccf1ba Add new NeuralNetOps for fusion (#13068)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13068

Basic ops.def update and converter.cc updates
This is the standard way to ingest networks into nomnigraph

redo of D10412639

Reviewed By: ZolotukhinM

Differential Revision: D10560324

fbshipit-source-id: c8ccb0aabde6ee8f823657ee5cd3ed9ed6c45549
2018-10-24 18:06:49 -07:00
e0a8665d03 Converter fix to allow unimplemented convertToOperatorDef (#13069)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13069

simply a new fallback

Reviewed By: ZolotukhinM

Differential Revision: D10591414

fbshipit-source-id: 1ad8f16135a6c68b2df889101f06b736a3e4f7da
2018-10-24 18:06:48 -07:00
ef019a2d18 Improve the C++ API (#13067)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13067

Cleaning up the interface for nomnigraph in the C++ world

redo of D10438090

Reviewed By: ZolotukhinM

Differential Revision: D10560323

fbshipit-source-id: e4e084284615e813836a7d031b5a71e8d80b0e62
2018-10-24 18:06:46 -07:00
3b919a6f82 Renaming dims() to sizes() (caffe2/caffe2) - 1/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842786

fbshipit-source-id: 551421a2cb4d2f2fc7f43775d4554643de0f0694
2018-10-24 17:36:08 -07:00
9573ecefe3 Back out "[pytorch][PR] Add sse2neon tp" (#13091)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13091

Original commit changeset: 8b4f9f361cc1

Reviewed By: Maratyszcza

Differential Revision: D10846301

fbshipit-source-id: 2798f1fca5c1a2362979977ef5eb724dd37c4e6d
2018-10-24 17:17:34 -07:00
e290a9d2fd Back out "Migrate DeviceOption.numa_node_id to DeviceOption.device_id"
Summary: Original commit changeset: 82583d0ad4b8

Reviewed By: enosair, ilia-cher

Differential Revision: D10560741

fbshipit-source-id: e289a37d441bd2243b369810abf451292891d9ee
2018-10-24 17:11:25 -07:00
ccfaf46431 Make CUDNN an alias of MIOPEN for HIP ops (#12278)
Summary:
This is mostly for reusing all the cudnn test cases in our python operator_tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12278

Differential Revision: D10842592

Pulled By: bddppq

fbshipit-source-id: 4b3ed91fca64ff02060837b3270393bc2f9a9898
2018-10-24 17:07:31 -07:00
e1243cef88 fixed docs for Student-T distribution (#13044)
Summary:
added loc and scale args.
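A short usage sketch of the documented arguments (values are illustrative):

```python
import torch
from torch.distributions import StudentT

d = StudentT(df=3.0, loc=1.0, scale=2.0)
print(d.sample((5,)))                  # five draws from the distribution
print(d.log_prob(torch.tensor(1.0)))   # log-density at the location
```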
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13044

Differential Revision: D10560762

Pulled By: ezyang

fbshipit-source-id: 6c98ecc04975df8993364b06c480d015a25e2061
2018-10-24 16:59:23 -07:00
86881cdb39 MNIST images should have an extra dim (#13060)
Summary:
Our convolution ops and such expect three-dimensional images, but the images in the MNIST dataset of the C++ frontend currently only have two dimensions.

apaszke ebetica soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13060

Differential Revision: D10560754

Pulled By: goldsborough

fbshipit-source-id: a2cc877b4f43434482bec902c941fafb7a157d5d
2018-10-24 16:53:37 -07:00
6727133f3d Support warnings.warn (#12964)
Summary:
`warnings.warn` is used commonly throughout `nn.functional`, so this adds
support for it by forwarding its arguments to `print`
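A hedged sketch in present-day torch.jit.script syntax (the function itself is hypothetical):

```python
import warnings
import torch

@torch.jit.script
def scaled(x: torch.Tensor, factor: float) -> torch.Tensor:
    if factor == 0.0:
        # inside TorchScript this warning is forwarded rather than dropped
        warnings.warn("factor is zero; the result will be all zeros")
    return x * factor

print(scaled(torch.ones(3), 0.0))
```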
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12964

Differential Revision: D10559427

Pulled By: driazati

fbshipit-source-id: 5b591f6f446c906418f9fc7730c17e301f263d9b
2018-10-24 16:48:02 -07:00
b790fcaf39 Renaming dims() to sizes() (caffe2/caffe2) - 4/4
Summary: Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10842900

fbshipit-source-id: 8d58ed4d403fb0308a8fa286659f8e830b040bec
2018-10-24 16:32:51 -07:00
a4475d529d Use GetFetchStackTrace for the AT_* error macros too. (#13007)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13007

No reason not to use the hook if it's set; this helps fbcode traces.

This slightly pessimizes the stack trace for ATen functions,
because we are no longer skipping all of the frames we should.
This is probably OK.

Reviewed By: Yangqing

Differential Revision: D10518499

fbshipit-source-id: be54e490df3c3fde7ff894b5b1473442ffc7ded3
2018-10-24 16:18:25 -07:00
917b203b01 Assert spawned processes terminating in distributed tests (#13071)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13071

In the case where a process got stuck and timed out on joining, we would see a None != 1 assertion error in the code path where the exit statuses are compared. This implies that the first process exited with exit code 1 and another one didn't exit at all. With this commit the error message is more descriptive.

Differential Revision: D10785266

fbshipit-source-id: c8cc02d07ea4fdc6f5374afd9a0aac72218fe61d
2018-10-24 16:03:36 -07:00
2ac7b6b683 Tensor dims() -> sizes() (caffe2/operators) - 5/5 (#13032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13032

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476235

fbshipit-source-id: 263ad75689d864b414dae63cb9a30cb3285dae31
2018-10-24 15:07:43 -07:00
cccd457a1e Tensor dims() -> sizes() (caffe2/operators) - 4/5 (#13031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13031

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476232

fbshipit-source-id: cb4ad76be068065eb2c5e7d87f33d04423cf93c4
2018-10-24 15:07:42 -07:00
ab253c2bf1 Tensor dims() -> sizes() (caffe2/operators) - 3/5 (#13030)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13030

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476226

fbshipit-source-id: 757583e3bde8d5246565433883bd328ab34f3e09
2018-10-24 15:02:40 -07:00
b55dc8d971 Add sse2neon tp (#12948)
Summary:
Adding sse2neon in third-party as a dependency
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12948

Differential Revision: D10801574

Pulled By: harouwu

fbshipit-source-id: 8b4f9f361cc1722f631830f7675b9d209a9f22ef
2018-10-24 14:56:24 -07:00
be43a0faa9 Tensor dims() -> sizes() (caffe2/operators) - 2/5 (#13029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13029

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476225

fbshipit-source-id: 5e63ca80b3843967ea1661ada447bbc18661378d
2018-10-24 14:34:45 -07:00
07c0f4a097 Tensor dims() -> sizes() (caffe2/operators) - 1/5 (#13028)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13028

Codemod generated with clangr shard mode, 25 files per diff, for renaming dims() to sizes()

Reviewed By: ezyang

Differential Revision: D10476220

fbshipit-source-id: 3c3b3d5e2082cd6a1f0ff4a3c8641b30e6f16896
2018-10-24 14:18:18 -07:00
4b5d13abab Use cmake3 if it exists and cmake isn't sufficient (#12972)
Summary:
A tweak to https://github.com/pytorch/pytorch/pull/12916 that only uses cmake3 when cmake isn't good enough. Hopefully fixes the issue zdevito saw.

cc zdevito SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12972

Differential Revision: D10560674

Pulled By: orionr

fbshipit-source-id: 90c71929630bb8167a3ee2cc6f306eefe5b85445
2018-10-24 14:14:39 -07:00
10046c2b2b nomnigraph - (easy) Expose operators (#13063)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13063

Expose the following operators
GatherRanges
Slice
MergeIdLists

Reviewed By: itomatik

Differential Revision: D10560138

fbshipit-source-id: 90f74d7d4c2bfca40788a5fcec4c73d71b156d3b
2018-10-24 14:09:27 -07:00
c64a65c977 added gemmlowp module (#12947)
Summary:
Adding the gemmlowp dependency in the third-party folder
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12947

Differential Revision: D10794559

Pulled By: harouwu

fbshipit-source-id: 7f8a649c739ccb6c307327080711379b1db8c3e0
2018-10-24 13:53:58 -07:00
0f5cee2f6b Convert some docstrings from char* to char[] (#13062)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13062

Gold (the linker) isn't able to gc unreferenced string constants, but
converting these to arrays puts them in their own data sections and reduces
(Android) binary size as a result.

I'm told even in server builds, this reduces binary size by a few dozen bytes
and speeds up startup by a few hundred ns. :-P

Reviewed By: Yangqing

Differential Revision: D10510808

fbshipit-source-id: 247ba9574e7a9b6a8204d33052994b08c401c197
2018-10-24 13:48:18 -07:00
97b6a25329 Use REGISTER_CPU_GRADIENT_OPERATOR for many operators (#12616)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12616

Focusing on operators in common use on mobile.

Also use GRADIENT_OPERATOR_SCHEMA.

Reviewed By: Yangqing

Differential Revision: D10245216

fbshipit-source-id: 5cc023da170149b637fe3c729d3756af948aa265
2018-10-24 13:48:17 -07:00
df47bbe9c1 Fix test_glu_old HealthCheck with smarter generation strategy. (#12975)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12975

Differential Revision: D10513493

Pulled By: ezyang

fbshipit-source-id: ac183aeb4ae7f0a5f91f1a369b595ae92c3e844d
2018-10-24 13:45:19 -07:00
2dacf28b66 link libgloo_cuda.a explicitly from setup.py (#12951)
Summary:
rather than pass a list through a text file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12951

Differential Revision: D10528309

Pulled By: anderspapitto

fbshipit-source-id: d94befcd61b6304815859694b623046f256462df
2018-10-24 13:19:46 -07:00
dd7c2d4284 Change the function signature for caffe2::empty (#13015)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13015

att

Reviewed By: ezyang

Differential Revision: D10469310

fbshipit-source-id: f4621fe5d17bb4663192860f81effe6bdfe21bea
2018-10-24 13:14:24 -07:00
1bea5fc3ad Fix UpsampleNearest op CPU impl batch handling (#13002)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13002

The batch dim wasn't handled in the CPU impl (it will fail for inputs with N > 1).
Fixing that here.

Differential Revision: D10515159

fbshipit-source-id: ee7e4f489d2d4de793f550b31db7c0e2ba3651e8
2018-10-24 13:10:53 -07:00
353fdefdd6 dims() -> sizes() (caffe2/core) (#13014)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13014

Tensor method renaming using clangr

Reviewed By: ezyang

Differential Revision: D10467556

fbshipit-source-id: 7d7eaf5fc59bbb493c057d5b8bfdda03b140c97e
2018-10-24 12:49:28 -07:00
0a190c8869 Move the location of annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12969

Differential Revision: D10560824

Pulled By: ezyang

fbshipit-source-id: 86c21149682db5ebfd9610df9e9845688a3db3b0
2018-10-24 12:35:08 -07:00
fcf801f061 Support building binary on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13059

Reviewed By: llyfacebook

Differential Revision: D10560147

Pulled By: sf-wind

fbshipit-source-id: c8f38b30c9acdf6ae494e56a5876fd4493696e5d
2018-10-24 12:24:42 -07:00
8355219e68 CircleCI: turn off OSX jobs temporarily
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/13064

Differential Revision: D10561008

Pulled By: yf225

fbshipit-source-id: c48364662efa82865a1bc1a7e2db3a9fb8af10d5
2018-10-24 12:22:05 -07:00
85273acca8 fix pinning of hypothesis (#13055)
Summary:
tested manually that this works

fixes https://github.com/pytorch/pytorch/issues/12395
obviates https://github.com/pytorch/pytorch/pull/12774
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13055

Differential Revision: D10559788

Pulled By: anderspapitto

fbshipit-source-id: 5cd8bac6eff548280c8742f36a5e7f2748a24623
2018-10-24 11:46:28 -07:00
448a32e0ee Adding timestamps to the beginning of every test file in run_test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12994

Reviewed By: anderspapitto

Differential Revision: D10515291

Pulled By: pjh5

fbshipit-source-id: 191054cdacff308b63e9063d22d62314398e4f88
2018-10-24 11:42:31 -07:00
6c8d47f2af Add methods to FunctionSchema (#12967)
Summary:
We are beginning to use this class in a wider-reaching set of use cases. This PR refactors it so that we always access schema properties through methods. This will make it easier to add extra information like alias information (i.e. we can have a version of `type()` that returns the type with alias information and another version that returns a type without that information).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12967

Differential Revision: D10502674

Pulled By: zdevito

fbshipit-source-id: a88783ed8f20ab3be6460c12da95f9f940891c44
2018-10-24 10:32:27 -07:00
52beb338ab Add Modules_CUDA_Fix folder to installed folder (#13013)
Summary:
This is used to patch our cmake cuda scripts - should be in the installation script.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13013

Reviewed By: ir413

Differential Revision: D10519104

Pulled By: Yangqing

fbshipit-source-id: 542049224ea41068f32d4c0f6399c7e8b684f764
2018-10-24 10:16:18 -07:00
46162ccdb9 Autograd indices/values and sparse_coo ctor (#13001)
Summary:
Reopen of #11253 after fixing bug in index_select
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13001

Differential Revision: D10514987

Pulled By: SsnL

fbshipit-source-id: 399a83a1d3246877a3523baf99aaf1ce8066f33f
2018-10-24 10:00:22 -07:00
e0f21a4977 restore caffe2 strides (#12883)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12883

Attempting to do this again. The last try broke OSS CI: D10421896

Reallocation of strides_ if there's no change in dim seems to cause the error that broke internal flow last time. This fixes that. Found a potential race condition in caffe2 counter ops that might be the cause, we will investigate that.

Reviewed By: ezyang

Differential Revision: D10469960

fbshipit-source-id: 478186ff0d2f3dba1fbff6231db715322418d79c
2018-10-24 09:45:46 -07:00
88f70fcef9 remove progress from git operations in CI builds (#13017)
Summary:
these are pretty spammy - unless we have a reason to keep them, let's not
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13017

Differential Revision: D10528295

Pulled By: anderspapitto

fbshipit-source-id: 5514371a6e61e13ec070cc5517488523d42f2935
2018-10-24 09:26:05 -07:00
7863c17b26 Fix convtranspose3d output_size calculation (#12952)
Summary:
Closes #2119.

There was a small bug where the output_size got sliced with `[-2:]`
where we really meant to slice it as `[2:]` (to remove the batch and
channel dimensions).

Added a new test for this.
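A minimal sketch of the slicing bug, with hypothetical shapes:

```python
# output_size includes the batch and channel dims, which must be
# stripped from the front, not the back.
output_size = [1, 33, 8, 16, 16]        # (N, C, D, H, W)
print(output_size[-2:])  # [16, 16]     -- the buggy slice also drops D
print(output_size[2:])   # [8, 16, 16]  -- the fix drops only N and C
```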
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12952

Differential Revision: D10510678

Pulled By: zou3519

fbshipit-source-id: 4c04a5007fc6d002e1806d6fe981b43d33d6a4f2
2018-10-24 09:23:05 -07:00
046672eed5 Set proper scope on nodes added by JIT (#12400)
Summary:
In order to support tensorboardX and other visualization tools, we need to make sure a non-empty scope is set on all nodes added by the JIT. This attempts to do this, but is still a WIP.

This is a new version of https://github.com/pytorch/pytorch/pull/10749
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12400

Reviewed By: ezyang

Differential Revision: D10224380

Pulled By: orionr

fbshipit-source-id: d1bccd0eee9ef7c4354112c6a39a5987bfac2994
2018-10-24 09:05:46 -07:00
cf235e0894 fix lint after new flake8 release added new style constraints (#13047)
Summary:
fix lint after new flake8 release added new style constraints
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13047

Differential Revision: D10527804

Pulled By: soumith

fbshipit-source-id: 6f4d02662570b6339f69117b61037c8394b0bbd8
2018-10-24 09:03:38 -07:00
d72de9fb1e Replace direct use of int32_t with an alias DeviceIndex (#13019)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13019

It just makes the semantic meaning of the int32_t a little
bit clearer.

Reviewed By: zou3519

Differential Revision: D10520295

fbshipit-source-id: 45b0bd1b6afddee17072b628d8e9b87d7c86e501
2018-10-24 08:27:45 -07:00
34cca9f05b Move Device and DeviceType to c10
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12995

Reviewed By: Yangqing

Differential Revision: D10513246

fbshipit-source-id: 0c6d52e09166d7e8a786c1a0e21685ec9c35b12a
2018-10-24 08:27:44 -07:00
ca03c10cef Rename createCUDAStream() to getStreamFromPool() (#12940)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12940

Dmytro was reading this code and requested that we rename the interface
to something that made it more obvious that pooling was going on.
Seems reasonable to me! Final name is a suggestion from Pieter.

Reviewed By: dzhulgakov

Differential Revision: D10492071

fbshipit-source-id: b1c2cac760f666968d58166be649dabfe1127c5e
2018-10-24 07:23:31 -07:00
924326e171 Revert D10438090: [nomnigraph] Improve the C++ API
Differential Revision:
D10438090

Original commit changeset: 6b4309b8a4b3

fbshipit-source-id: 5f6a28cf032e0be2544f0b33508148f4f49e10c5
2018-10-24 07:04:33 -07:00
97d4c05566 Revert D10412639: [nomnigraph] Add new NeuralNetOps for fusion
Differential Revision:
D10412639

Original commit changeset: a4c523fda96b

fbshipit-source-id: 973b6dd30b63b9a08069275278b0780b65067635
2018-10-24 07:04:31 -07:00
17c6d168de Attach Shape node if Concat node has 2 outputs (#13006)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13006

In Caffe2, Concat can have 2 outputs, the second being the shape of the first output. In ONNX, Concat only has 1 output. So when exporting, we need to add a `Shape` node to the first output and generate the second output from it.

Differential Revision: D10517698

fbshipit-source-id: 38e974423e2506b16d37b49d51c27ad87b73e63a
2018-10-23 22:56:48 -07:00
53ac4de79d Expose basic transformation API to Python (#13033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13033

Basic graph manipulation exposed to python

Reviewed By: ZolotukhinM

Differential Revision: D10519720

fbshipit-source-id: 0f9a494d122289a3a9e23d4cff99ac0a21382ec6
2018-10-23 20:54:54 -07:00
4e0b6c8500 Speed up resolution callback creation (#12859)
Summary:
`inspect.stack()` calls are slow since they gather a bunch of extra info (including source context) for every frame. This PR instead uses `inspect.currentframe()` and walks up the stack until it reaches the correct frame. [Context](stackoverflow.com/questions/17407119/python-inspect-stack-is-slow)
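A minimal sketch of the technique (the helper name is hypothetical, not the PR's actual code):

```python
import inspect

def frame_n_levels_up(n):
    # Walk up the call stack via f_back instead of inspect.stack(),
    # which eagerly gathers source context for every frame.
    frame = inspect.currentframe()
    for _ in range(n + 1):  # +1 skips this helper's own frame
        frame = frame.f_back
    return frame
```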
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12859

Differential Revision: D10509912

Pulled By: driazati

fbshipit-source-id: b85325adf1b3c85a1a3a82e96e567b8be498531b
2018-10-23 20:40:04 -07:00
08d99c4486 Add new NeuralNetOps for fusion
Summary:
Basic ops.def update and converter.cc updates

This is the standard way to ingest networks into nomnigraph

Reviewed By: duc0

Differential Revision: D10412639

fbshipit-source-id: a4c523fda96bbe0e31de0d9fcf795ae9c7377c90
2018-10-23 19:27:10 -07:00
9c1195fe61 Improve the C++ API
Summary: Cleaning up the interface for nomnigraph in C++ world

Reviewed By: duc0

Differential Revision: D10438090

fbshipit-source-id: 6b4309b8a4b3730f3309edf0047d4006a001895b
2018-10-23 19:27:09 -07:00
f9b7ce9c99 Add tuple indexing support for constant integers (#11492)
Summary:
Add support for indexing tuples with constant integers by creating a new prim::TupleIndex operator.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11492

Differential Revision: D9811996

Pulled By: eellison

fbshipit-source-id: a458c2522b3c81476252d920e27a8d6c7b9a036b
2018-10-23 17:52:03 -07:00
ff508c91a1 Remove numba dependency
Summary:
TSIA - we want to deprecate numba in fbcode when moving to new compiler tiers.

Converted the old test to a non-numba regular python op test.

Reviewed By: xw285cornell

Differential Revision: D10519910

fbshipit-source-id: 0e9188a6d0fc159100f0db704b106fbfde3c5833
2018-10-23 17:03:47 -07:00
a6949abb15 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (fixed reverted bug) (#12848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12848

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Original commit changeset: c0760e73ecc7

Reviewed By: dzhulgakov

Differential Revision: D10453456

fbshipit-source-id: d2f2b7b4578e721924354149f08f627c7e3bf070
2018-10-23 16:21:26 -07:00
dd00c2997f fix expect tests (#13005)
Summary:
the topological index shuffled arguments around, updating expect files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13005

Differential Revision: D10517246

Pulled By: michaelsuo

fbshipit-source-id: 8f95e4e4ca8ff51da0507f9b0eb838c23ddaa821
2018-10-23 15:53:16 -07:00
821b04e819 Nomnigraph: Remove Copy constructor and copy assign operator from BasicBlock, add move constructor.
Summary:
We cannot allow copying, as it loses recorded callbacks; after a copy, tracked values are no longer tracked.

Reviewed By: bwasti, duc0

Differential Revision: D10510057

fbshipit-source-id: b64fdef3fb28fc26fe55eba41f4b5007ba6894de
2018-10-23 15:41:48 -07:00
83f788d088 Fix MSVC build for Python 3.6 (#12878)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12878

Python 3.6 headers define their own ssize_t, which clashes with our definition.
Luckily, they also define a `HAVE_SSIZE_T` macro we can use to check for this case.

Reviewed By: ezyang

Differential Revision: D10467239

fbshipit-source-id: 661675ad1e30a6ca26d6790eaa75657ef6bf37c2
2018-10-23 15:30:01 -07:00
b8a11cffdb Minor improvements cherry-pick (#12973)
Summary:
* Enable disabled functions for ROCm (ROCm 252)
* fixes for topk fp16 (ROCm 270)
* HIP needs kernel invocation to be explicitly templated to be able to take non-const arg as const kernel arg (ROCm 281)

For attention: bddppq ezyang

Full set of PyTorch/Caffe2 tests on ROCm here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/283
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12973

Differential Revision: D10516072

Pulled By: bddppq

fbshipit-source-id: 833b3de1544dfa4886a34e2b5ea53d77b6f0ba9e
2018-10-23 15:03:47 -07:00
223a96a9a0 Add missing NCHW2NHWC symbols for HIP (#13000)
Summary:
petrex ashishfarmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/13000

Differential Revision: D10516020

Pulled By: bddppq

fbshipit-source-id: 017bd393da3d97fbae3f0227ad01977c5c0744c6
2018-10-23 14:20:33 -07:00
470e766062 Fix illegal code in rocblas_handle rocblas_handle() that causes failure w/ gcc as base compiler (#12957)
Summary:
The legal function cublasHandle_t cublas_handle() was hipified to the clearly illegal rocblas_handle rocblas_handle(). That cannot work, and it correctly fails with gcc as the host compiler because it induces an ambiguity.

Function now hipifies to rocblas_handle rocblashandle()

Fixes a long-standing issue we've observed in PyTorch when the base compiler is gcc.

For attention: bddppq ezyang

Tests on ROCm PyTorch/Caffe2: https://github.com/ROCmSoftwarePlatform/pytorch/pull/284
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12957

Differential Revision: D10501227

Pulled By: bddppq

fbshipit-source-id: 568cb80801c0d14c9b1b61e3a7db387a5c21acf4
2018-10-23 13:46:15 -07:00
21285e73da Add Google pixel code
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12998

Differential Revision: D10515096

Pulled By: JoelMarcey

fbshipit-source-id: 7f97014451448a70ea7f91d7d8bd96fbf6e83f7f
2018-10-23 13:26:37 -07:00
8e4bea107a Fix clang-tidy 404 in Travis
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12963

Differential Revision: D10510026

Pulled By: goldsborough

fbshipit-source-id: b6b9634a7a2575ff4e2983321d2e4e5829626347
2018-10-23 09:34:43 -07:00
9ea19cb079 Windows CI integration for custom ops (#12928)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/11527

ezyang orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12928

Differential Revision: D10501342

Pulled By: goldsborough

fbshipit-source-id: 7ce74795aab2f13efeb38f56ce82f53055f5eade
2018-10-23 09:18:09 -07:00
af78d4cd49 Add weak script modules (#12682)
Summary:
Adds support for weak script modules created that get compiled to `ScriptModule`s once added as a submodule of a `ScriptModule`:

```python
@weak_module
class Test(torch.nn.Module):
    ...
    @weak_script_method
    def forward(self, x):
        ...
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12682

Differential Revision: D10458626

Pulled By: driazati

fbshipit-source-id: 10ae23cb83cdafc4646cee58f399e14b2e60acd4
2018-10-23 09:06:02 -07:00
3fb3a07f54 Added a default constructor for torch.finfo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12847

Differential Revision: D10457487

Pulled By: benoitsteiner

fbshipit-source-id: 7d164a71ba52631e5906098f643eecb0630879d1
2018-10-23 09:03:24 -07:00
1b07eb7148 torch.utils.cpp_extension.verify_ninja_availability() does not return True as documented
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12922

Differential Revision: D10502167

Pulled By: ezyang

fbshipit-source-id: 2e32be22a310e6e014eba0985e93282ef5764605
2018-10-23 07:38:08 -07:00
428300d318 Revert D10494123: [c10] Remove at::Optional
Differential Revision:
D10494123

Original commit changeset: 761bdf7359d6

fbshipit-source-id: 552fb4ab0dc253b95ce87ec6a1c65aba4b07e84a
2018-10-23 07:18:54 -07:00
d401dc4374 Remove at::Optional (#12958)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12958

TSIA - this is an ongoing diff to fully move to c10 namespace.

Reviewed By: dzhulgakov

Differential Revision: D10494123

fbshipit-source-id: 761bdf7359d62ef4503ecb1b8d0ae1c0762e073c
2018-10-23 00:03:20 -07:00
27af265a5e Index to track topological order within a block (#12748)
Summary:
Simple index to track topological order. Replaced `topological_index` in the graph fuser with this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12748

Differential Revision: D10502983

Pulled By: michaelsuo

fbshipit-source-id: 5855e5add3c9742fe07e86d854260baa34beab3b
2018-10-22 23:55:20 -07:00
dd823ccd28 small improvements to torch.nn.normalization docs (#12936)
Summary:
Based on a [discussion at the forums](https://discuss.pytorch.org/t/question-about-functional-normalize-and-torch-norm/27755), it might be worthwhile to clarify the documentation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12936

Differential Revision: D10502139

Pulled By: ezyang

fbshipit-source-id: 480c3c367f8c685dcde107b3018cb4129032322d
2018-10-22 23:14:47 -07:00
8d7607e346 Add attribute exhaustive_search in _blacklist_caffe2_args (#12805)
Summary:
- The exhaustive_search attribute will be blacklisted so it is discarded from the converted onnx model. At present it throws an error while verifying the onnx model.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12805

Differential Revision: D10502374

Pulled By: ezyang

fbshipit-source-id: 0926dfa3237a8a431184e7f7250146e5b0cbfb85
2018-10-22 22:48:31 -07:00
bc1d96ca98 Add support for inline expect tests. (#12825)
Summary:
expecttest and test_expecttest are the implementation and tests
for this functionality.  I wired it up to the --accept flag,
but there's also a new environment variable EXPECTTEST_ACCEPT
which may be more convenient to trigger.  Haven't tested if this
works in fbcode.

There may be a few expect tests which will benefit from inline
treatment, but I just did one to show it works.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12825

Reviewed By: teng-li

Differential Revision: D10448630

Pulled By: ezyang

fbshipit-source-id: 3d339f82e2d00891309620a60e13039fa1ed8b46
2018-10-22 19:29:04 -07:00
952df2ba8f Install torchvision before all tests, tickles #7851 (#8311)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8311

Differential Revision: D10239923

Pulled By: ezyang

fbshipit-source-id: 3f8cdc6229bfbe701c7583cede65435aa952ed85
2018-10-22 18:16:47 -07:00
3894ed22a8 Remove nullopt from native_parse.py (#12961)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12961

According to zdevito - this is not used at all, so we are removing it for safety.

It is also possible that this native_parser.py will completely go away in the
near future.

Reviewed By: zdevito

Differential Revision: D10501616

fbshipit-source-id: 3218708e6150d3c94d730fbd25ae1f7abb5718b5
2018-10-22 18:13:37 -07:00
da2da55170 Make sure to update success_ at the end of the run (#12806)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12806

Make sure to update success_ status at the end of the run when going through
task statuses

Reviewed By: aazzolini

Differential Revision: D10443704

fbshipit-source-id: 79f8f7fe1eccb78f6e2859f3b1e66dc44347bcc8
2018-10-22 16:58:20 -07:00
8c514627a4 Add C10_LIKELY/C10_UNLIKELY macros (#12932)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12932

I was looking at some assembly for some code I was working on,
and felt a desire to have likely()/unlikely() macros.  I checked
if we already had them, and we didn't.  This commit adds them,
and fixes up all known use sites to make use of it.

Reviewed By: Maratyszcza

Differential Revision: D10488399

fbshipit-source-id: 7476da208907480d49f02b37c7345c17d85c3db7
2018-10-22 16:26:19 -07:00
8d3e7e2fcb Move DDP queue_reduction to C++ (#12852)
Summary:
A fully working version, continuing on goldsborough's initial version.

Waiting on the stream guard to be merged before adding more stream perf logic into the C++ version.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12852

Differential Revision: D10468696

Pulled By: teng-li

fbshipit-source-id: 8e46d408796973817abfd9dbd6566e0ca5b7a13f
2018-10-22 16:07:46 -07:00
8682999767 Remove trailing whitespace from files in aten/ (#12942)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12942

I hate trailing whitespace.

Reviewed By: Yangqing

Differential Revision: D10492507

fbshipit-source-id: 94ed80988670361e9e7e508c3b07c5e5c6e500e7
2018-10-22 16:04:21 -07:00
f575e138d8 Credits to Exhale in cppdocs (#12926)
Summary:
Some creds to svenevs

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12926

Differential Revision: D10498288

Pulled By: goldsborough

fbshipit-source-id: 878d23ebf260dac17871677635a3283eb3a8a423
2018-10-22 15:39:36 -07:00
e64f75a1d8 fix ZeroDivisionError in utils.bottleneck (#11987)
Summary:
**ZeroDivisionError** occurs when `cuda_prof_exec_time` is small enough.
This situation is normal for a project that has little CUDA work.

Or someone fails to transfer their work to CUDA successfully; when they then profile the code, this error occurs.
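A sketch of the kind of guard that avoids the error (the function name is hypothetical):

```python
def cuda_time_ratio(cpu_time, cuda_prof_exec_time):
    # Avoid dividing by a zero CUDA time when little or no
    # work actually ran on the GPU.
    if cuda_prof_exec_time == 0:
        return float('nan')
    return cpu_time / cuda_prof_exec_time
```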
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11987

Differential Revision: D10488568

Pulled By: soumith

fbshipit-source-id: db8c1e9e88a00943c100958ebef41a1cb56e7e65
2018-10-22 14:00:15 -07:00
95caa37565 Remove CAFFE2_USE_MINIMAL_GOOGLE_GLOG (#12938)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12938

We will be using C10_USE_MINIMAL_GLOG. Also, this will be in exported flags,
so dependent libraries won't need to define it.

Reviewed By: smessmer, BIT-silence

Differential Revision: D10468993

fbshipit-source-id: 04ae3ae17122d46b1b512d4202ab014365b87f4a
2018-10-22 13:37:38 -07:00
283d41885d Accept external input hint when doing ONNXIFI transform (#12900)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12900

The workspace will sometimes be populated with input tensors for shape inference, but net.external_input() is not a reliable way to tell weights from inputs in the workspace. We saw some use cases where net.external_input() is empty. In this case, we need to give the user an option to provide an input hint.

Reviewed By: bddppq

Differential Revision: D10476822

fbshipit-source-id: 1a3fa2df69b959d5b952a7824eba9e6c713f4f07
2018-10-22 13:32:33 -07:00
5f37c0afda Fix doxygen check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12920

Differential Revision: D10494081

Pulled By: goldsborough

fbshipit-source-id: c96b9b61cbae39006b48b23b901248e762cbd232
2018-10-22 12:28:17 -07:00
56bf4850cb Clean up of the multithreaded benchmark (#12905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12905

This diff does some clean up of the multithread benchmark code:
1. Split the implementation into a `.cc` file to separate interface from implementation and improve build times
2. Make `MutatingNetSupplier` more generic by providing the mutating function as an argument instead of a virtual method.
3. Fix the AI benchmark by sticking to the original option names

Reviewed By: highker

Differential Revision: D10479238

fbshipit-source-id: afa201fc287e3fdbb232db24513ecf8024501f66
2018-10-22 12:09:16 -07:00
1b530fdae0 remove the find-package codepath for gloo in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12893

Differential Revision: D10493310

Pulled By: anderspapitto

fbshipit-source-id: ba5bd375c118b0f0ab7fb7b9fda010fe17a6ac8d
2018-10-22 11:54:53 -07:00
6cc15c1a22 Simplify typeid SFINAE (#12706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12706

If both branches are valid C++ code independent of the type passed in, then we can just use if/else inside a constexpr function to decide between the cases. Only if one branch would be invalid code (say, because type T doesn't have a default constructor) would we need "constexpr if" or SFINAE.

Reviewed By: ezyang

Differential Revision: D10400927

fbshipit-source-id: 16d9855913af960b68ee406388d6b9021bfeb34a
2018-10-22 11:27:10 -07:00
3092a69546 Optimize NCHW2NHWC on GPU (#12910)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12910

Optimize NCHW2NHWC on GPU

Reviewed By: houseroad

Differential Revision: D10481163

fbshipit-source-id: 6ddbd0ec9c96965b96aa1b8a006232d6f2b94249
2018-10-22 11:24:29 -07:00
cfb7f0a8f2 remove onnx CODEOWNERS entries (#12941)
Summary:
we don't need these anymore; let's reduce notification spam
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12941

Reviewed By: bddppq

Differential Revision: D10492266

Pulled By: anderspapitto

fbshipit-source-id: 3251b6d0160f773d17b64afc504216323d61276a
2018-10-22 11:09:08 -07:00
8f51c513a6 gloo: build once, share between pytorch/caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12885

Differential Revision: D10492244

Pulled By: anderspapitto

fbshipit-source-id: 79af1ceb9bb0dab4585a728e64554ff4f38d6c32
2018-10-22 11:06:14 -07:00
df06fba1f1 Use the newer one of cmake and cmake3. (#12916)
Summary:
On my devgpu, `cmake` is newer than `cmake3`. Using `cmake3` causes compilation to fail. Instead of blindly using `cmake3`, we pick the newer of the two.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12916

Differential Revision: D10481922

Pulled By: SsnL

fbshipit-source-id: 8340136c459e25da9f5fc4f420c7e67cadc28aff
2018-10-22 10:29:55 -07:00
5e8e199f8d Add note on traced module train/eval behavior
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12903

Differential Revision: D10489090

Pulled By: SsnL

fbshipit-source-id: 13ff5587f53706b360dd0905d0ae97fb16ae2bf0
2018-10-22 10:26:15 -07:00
a022fd2d6b Implement DataLoader (#11918)
Summary:
This PR implements a DataLoader API for the C++ frontend.

The components present in this API largely match the Python API. It consists of:
- `Dataset`s: Conceptually a function from a set of indices to a batch of examples;
- `Transform`s: A functional transformation of a dataset. A `Map<D, T>` for Dataset `D` and transform `T` is itself a dataset;
- `Sampler`s: Specify a strategy for generating indices for a new batch;
- A `DataLoader`, with the ability to automatically parallelize fetching of samples across multiple worker threads;

Note that collation functions fall naturally out of the `Map<Dataset, Transform>` abstraction.
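A rough sketch of that composition, in Python for brevity (the PR's API is C++ and heavily templatized; all names here are illustrative):

```python
class Dataset:
    def get_batch(self, indices):
        raise NotImplementedError  # indices -> batch of examples

class Map(Dataset):
    # A transform applied to a dataset is itself a dataset, so
    # collation is just another transform in the chain.
    def __init__(self, dataset, transform):
        self.dataset = dataset
        self.transform = transform

    def get_batch(self, indices):
        return self.transform(self.dataset.get_batch(indices))
```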

Things that are missing right now that maybe should be added:
- Memory pinning for CUDA tensors

The API was designed to be generalizable to almost any kind of dataset, transform or sampling strategy, while providing a convenient API out of the box. To achieve this, it is quite heavily templatized on various possible input types.

There are many parts to this PR! Right now, I would like feedback on:
- Your impression of the general usability of the API;
- Your impression of which parts seem too complex or overthought;
- The implementation of the parallelization aspects of the DataLoader. I've followed the Python implementation in some matters, but also differ in others. I think my implementation is a little cleaner and decouples components slightly better than the Python dataloader.

I haven't added too many comments yet, as this is fresh out of the oven. Let me know if anything is unclear from the code itself.

There also aren't any tests yet. I will write a comprehensive test suite once we agree on the API and implementation.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11918

Reviewed By: ezyang

Differential Revision: D9998881

Pulled By: goldsborough

fbshipit-source-id: 22cf357b63692bea42ddb1cc2abc71dae5030aea
2018-10-22 10:22:41 -07:00
96d826f635 Define REGISTER_CPU_GRADIENT_OPERATOR (#12588)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12588

By default, this is an alias for REGISTER_CPU_OPERATOR.  If gradients are not
required (e.g., on mobile) it can be converted to a no-op by defining
CAFFE2_NO_GRADIENT_OPS, resulting in a smaller build.

GRADIENT_OPERATOR_SCHEMA works similarly.

CAFFE2_NO_GRADIENT_OPS also converts REGISTER_GRADIENT to a no-op.

Use these macros in fully_connected_op.cc as an example.
Follow-up diffs will convert more operators.

I had to introduce MACRO_EXPAND to handle the way Visual Studio expands
VA_ARGS.

Reviewed By: Yangqing

Differential Revision: D10209468

fbshipit-source-id: 4116d9098b97646bb30a00f2a7d46aa5d7ebcae0
2018-10-22 10:01:02 -07:00
da73d709a8 Remove unsafecoalesce op (#12897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12897

The UnsafeCoalesce op dates from the memonger days, when we tried to coalesce operators into more efficient computation kernels. It creates a somewhat unsafe underlying memory storage pattern.

With the new tensor unification I am not sure it is still safe for us to do this, so I propose we delete it for the sake of safety.

Reviewed By: bddppq, ilia-cher

Differential Revision: D10475980

fbshipit-source-id: b1a838c9f47d681c309ee8e2f961b432236e157e
2018-10-22 09:42:26 -07:00
c774cb8913 Rephrase unclear error message for shape mismatch (#12870)
Summary:
I spent a couple of minutes trying to understand which shape corresponds to checkpoint and which one to the model
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12870

Differential Revision: D10466600

Pulled By: SsnL

fbshipit-source-id: 3b68530b1b756462a2acd59e3a033ff633567a6b
2018-10-22 08:57:16 -07:00
25f4b3efe3 Add simple scripts for checking if generated code changed. (#12835)
Summary:
This is designed to make it easier to see how your codegen changes affected actual generated code.

Limitations:
A) This is NOT robust; if new directories are added that include generated files, they need to be added to tools/generated_dirs.txt.  Note that subdirectories of the list are not included.

B) This is particular to my workflow which I don't claim is generally applicable.  Ideally we would have a script that pumped out a diff that could be attached to PRs.

C) Only works on OSS and definitely won't work on windows.

How to use:
1) python setup.py ...
2) tools/git_add_generated_dirs
3) Edit codegen
4) python setup.py ...
5) git diff to see changes
6) If satisfied: tools/git_reset_generated_dirs, commit, etc.
   If not satisfied: Go to 3)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12835

Reviewed By: ezyang

Differential Revision: D10452255

Pulled By: gchanan

fbshipit-source-id: 294fc74d41d1b840c7a26d20e05efd0aff154635
2018-10-22 07:33:32 -07:00
01227f3ba7 Env variable to not check compiler abi (#12708)
Summary:
For https://github.com/pytorch/pytorch/issues/10114

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12708

Differential Revision: D10444102

Pulled By: goldsborough

fbshipit-source-id: 529e737e795bd8801beab2247be3dad296af5a3e
2018-10-21 20:07:50 -07:00
1e8064dec0 Convert 2 nn.functional functions to weak script (#12723)
Summary:
* Moves the `weak_script` annotation to `torch/_jit_internal.py` to resolve a dependency issue between `torch.jit` and `torch.nn`
* Add `torch._jit.weak_script` to `tanhshrink` and `softsign`, their tests now pass instead of giving an `unknown builtin op` error
* Blacklist converted `torch.nn.functional` functions from appearing in the builtin op list if they don't actually have corresponding `aten` ops
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12723

Differential Revision: D10452986

Pulled By: driazati

fbshipit-source-id: c7842bc2d3ba0aaf7ca6e1e228523dbed3d63c36
2018-10-21 14:09:55 -07:00
b357470421 Add DistributedDataParallelCPU to doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12864

Differential Revision: D10481669

Pulled By: SsnL

fbshipit-source-id: 20831af41aaba75546e6ed6a99f011f0447b1acf
2018-10-21 11:20:11 -07:00
ed02619ba0 Add topological sort to nomnigraph (#12790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12790

Add a DFS-based topological sort to nomnigraph.
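For reference, a DFS-based topological sort in illustrative Python (nomnigraph's implementation is C++):

```python
def topo_sort(nodes, successors):
    # Reverse post-order of a DFS over a DAG is a topological order.
    # No cycle detection here; assumes the input is acyclic.
    order, visited = [], set()

    def visit(node):
        if node in visited:
            return
        visited.add(node)
        for succ in successors.get(node, []):
            visit(succ)
        order.append(node)

    for node in nodes:
        visit(node)
    return order[::-1]
```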

Reviewed By: duc0

Differential Revision: D10434645

fbshipit-source-id: aaf106b0cc37806b8ae61f065c1592a29993eb40
2018-10-20 01:07:30 -07:00
a839a67aad Add IDEEP unit test with zero-dim tensors (#8459)
Summary:
This test flushes out the issue that IDEEP cannot handle a tensor with dims like (0, 2), which is a valid tensor shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8459

Differential Revision: D10419328

Pulled By: yinghai

fbshipit-source-id: c5efcd152364a544180a8305c47a2a2d126ab070
2018-10-19 23:57:33 -07:00
7dbb38e856 Moving logging from caffe2 to c10. (#12881)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12881

TSIA. This should not change any functionality.

Remaining work:
- change the build script to deprecate use of CAFFE2_USE_MINIMAL_GOOGLE_GLOG and use a C10 macro instead.
- Unify the exception name (EnforceNotMet -> Error)
- Unify the logging and warning APIs (like AT_WARNING)

Reviewed By: dzhulgakov

Differential Revision: D10441597

fbshipit-source-id: 4784dc0cd5af83dacb10c4952a2d1d7236b3f14d
2018-10-19 20:22:08 -07:00
d120b9af5a Make c10d pickling/unpickling work (#12694)
Summary:
This fixes the issue for https://github.com/pytorch/pytorch/issues/12168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12694

Differential Revision: D10468717

Pulled By: teng-li

fbshipit-source-id: 3df31d75eea19d6085af665f5350d3cb667a5048
2018-10-19 16:42:36 -07:00
8cb0848bdc expose delete_node (#12840)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12840

Add binding for delete_node

Reviewed By: duc0

Differential Revision: D10453555

fbshipit-source-id: cdcaca8420a9a0c61479961d907ef6bb5478a41d
2018-10-19 13:30:50 -07:00
202893fe1a Migrate DeviceOption.numa_node_id to DeviceOption.device_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12717

Reviewed By: ilia-cher

Differential Revision: D10408325

fbshipit-source-id: 82583d0ad4b8db094ee4c5c607b52500826328f7
2018-10-19 12:45:48 -07:00
7921e16ca2 Revert D10421896: restore caffe2 strides
Differential Revision:
D10421896

Original commit changeset: b961ea0bca79

fbshipit-source-id: 9d9d2ed0c2cb23a3fdf6bbfc9509539aeeb7e382
2018-10-19 12:15:44 -07:00
bf99ffc4d2 Remove OMP_NUM_THREADS and MKL_NUM_THREADS settings from docker images (#12836)
Summary:
`OMP_NUM_THREADS` and `MKL_NUM_THREADS` are set to 4 by default in the docker images, which causes `nproc` to only show 4 cores in the docker containers by default, and building PyTorch is slow in this default case. We likely don't need these two flags to be set, and this PR tests that hypothesis.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12836

Differential Revision: D10468218

Pulled By: yf225

fbshipit-source-id: 7a57962c962e162a8d97f730626825aa1e371c7f
2018-10-19 11:44:22 -07:00
14ff866505 Optimize GroupNormOp (#12844)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12844

Optimize GroupNormOp

Reviewed By: houseroad

Differential Revision: D10455567

fbshipit-source-id: aee211badd1e0c8ea6196843e3e77f7c612a74d5
2018-10-19 11:40:12 -07:00
f3e1fe5ca5 add string as supported input / output of script functions (#12731)
Summary:
Add strings to our set of built-in types for annotations. This is used in the functional library.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12731

Differential Revision: D10453153

Pulled By: eellison

fbshipit-source-id: f54177c0c529f2e09f7ff380ddb476c3545ba5b0
2018-10-19 11:17:19 -07:00
186219a643 restore caffe2 strides (#12845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12845

Attempting to do this again.

Reallocation of strides_ if there's no change in dim seems to cause the error that broke internal flow last time. This fixes that. Found a potential race condition in caffe2 counter ops that might be the cause, we will investigate that.

Reviewed By: ezyang

Differential Revision: D10421896

fbshipit-source-id: b961ea0bca79757991013a2d60cfe51565689ee9
2018-10-19 10:00:16 -07:00
68f4a4b3ba Delete THCStreamGuard in favor of CUDAGuard, also c10d code cleanup (#12849)
Summary:
I got annoyed at waiting for OSS to tell me my c10d builds were busted, so
I also added support for building the test scripts in fbcode and fixed the
warnings this uncovered.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12849

Reviewed By: pietern

Differential Revision: D10457671

fbshipit-source-id: 5b0e36c606e397323f313f09dfce64d2df88faed
2018-10-19 09:48:41 -07:00
6ec2f09188 CircleCI: enable OSX jobs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12667

Differential Revision: D10466661

Pulled By: yf225

fbshipit-source-id: a1a150d3b384eb88ba4c7e6d57e59d8ed834e53c
2018-10-19 09:42:06 -07:00
7837ec553c CircleCI: Add doc-push job
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12833

Differential Revision: D10464815

Pulled By: yf225

fbshipit-source-id: 06a6a673b6bb32f7c252a217f9ce59db35c75e9c
2018-10-19 08:58:04 -07:00
6190408e24 caffe2: UpsampleBilinear support for scales (#12736)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12736

This updates UpsampleBilinearOp and UpsampleBilinearGradientOp to support scales, bringing them in line with ResizeNearestOp https://github.com/pytorch/pytorch/pull/12720.

Reviewed By: houseroad

Differential Revision: D10416228

fbshipit-source-id: f339b7e06979c9c566afb4cee64a2d939b352957
2018-10-19 08:55:55 -07:00
d736f4f0a7 Kill 'python_name' in Declarations.cwrap. (#12832)
Summary:
I'm trying to do some transformations on Declarations.cwrap and this makes things overly difficult and doesn't do anything useful.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12832

Reviewed By: ezyang

Differential Revision: D10450771

Pulled By: gchanan

fbshipit-source-id: 1abb1bce27b323dd3e93b52240e7627cd8e56566
2018-10-19 08:47:27 -07:00
31232061aa Use C local in lexer (2) (#12838)
Summary:
trying again without xlocale.h
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12838

Differential Revision: D10453078

Pulled By: zdevito

fbshipit-source-id: 760852c82e16acee7d1abb8a918822bf5ff59bca
2018-10-19 00:25:35 -07:00
373b5080da Warn that tensor.resize_() resets strides (#12816)
Summary:
As discussed in #1570, this adds a warning to the docstring of `tensor.resize_()` to prevent people from naively using it as an in-place view or reshape.

For your convenience, the updated docstring renders as follows:
![torch_resize_docstring](https://user-images.githubusercontent.com/629706/47148782-f1b57900-d2d1-11e8-9749-e9c7387113ed.png)

Fixes #1570.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12816

Differential Revision: D10457755

Pulled By: ezyang

fbshipit-source-id: dd4b3a821e8c76dc534d81c53084abdb336e690a
2018-10-18 22:47:30 -07:00
d783249674 Revert D10457796: [pytorch][PR] fix typo
Differential Revision:
D10457796

Original commit changeset: 9d1582c11c2e

fbshipit-source-id: 9be38e999a2783dae4a387821806e6850b6a3671
2018-10-18 21:48:14 -07:00
ca5dc9f13a Add py2 compatibility for builtins import (#12784)
Summary:
Testing if this is a solution for the issue reported at https://github.com/pytorch/pytorch/pull/12504#issuecomment-430758448
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12784

Differential Revision: D10454398

Pulled By: jamesr66a

fbshipit-source-id: a0304acde5df438c08cceb2d5280933de24664c4
2018-10-18 20:54:23 -07:00
aa6f47e229 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12814

Differential Revision: D10457796

Pulled By: ezyang

fbshipit-source-id: 9d1582c11c2e6dec5ff1c87525fac127a7e77273
2018-10-18 20:42:08 -07:00
f47d12b0ef shape_as_tensor should return a CPU tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12846

Differential Revision: D10456885

Pulled By: jamesr66a

fbshipit-source-id: fa66d0736cfb0ed09e566ae7c2eaeac37f8bb0e4
2018-10-18 20:20:00 -07:00
40ff69b796 Add attribute exhaustive_search in caffe2 blacklist args (#12815)
Summary:
Currently, while converting from caffe2 to onnx, the exhaustive_search attribute is not blacklisted in support_onnx_export, so conversion fails when the onnx model is verified using C.check_model.

Signed-off-by: Parth Raichura <parth.raichura@softnautics.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12815

Differential Revision: D10457777

Pulled By: ezyang

fbshipit-source-id: dc2183d8abef8cd753b348f2eaa62c952a058920
2018-10-18 19:53:40 -07:00
8a35aafca6 Try to fix randomness.rst formatting again
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12853

Differential Revision: D10458439

Pulled By: SsnL

fbshipit-source-id: ebd259e598327b0c5d63de6b7c182781fe361fbd
2018-10-18 19:18:49 -07:00
0fa69c0276 Remove the protobuf library in pytorch linking list. (#12451)
Summary:
There is a link error when Caffe2 doesn't use its protobuf under third_party: PyTorch always links that protobuf even though it doesn't use protobuf directly, so we can remove it from the list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12451

Differential Revision: D10262676

Pulled By: ezyang

fbshipit-source-id: c2ff3fdf757fc21ed689e7f663c082064b1a0bca
2018-10-18 18:31:51 -07:00
a85174b46a Fix randomness.rst formatting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12850

Differential Revision: D10457694

Pulled By: SsnL

fbshipit-source-id: fa64964ff6d41625d9383ca96393017230e4ee0f
2018-10-18 18:26:26 -07:00
87d3d209a6 Enable JIT tests in fbcode (#12777)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12777

Enables JIT tests in FBCode. Changes pybind11 code to avoid mixing py::args with positionally matched arguments because old versions of PyBind11 leak memory in this case.

Reviewed By: jamesr66a

Differential Revision: D10419708

fbshipit-source-id: 74bc466001b5d363132d1af32e96841b38601827
2018-10-18 18:18:37 -07:00
99bc541b5b size_from_dim(0) is like numel() but worse. Don't do it. (#12729)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12729

This may have a dependency on D10380678 if size_from_dim(0)
was required because numel() used to return -1 in some cases.
This is no longer true.

Reviewed By: li-roy, dzhulgakov

Differential Revision: D10415069

fbshipit-source-id: 39f46f56249ecaf3533f62a0205b3a45d519d789
2018-10-18 18:06:37 -07:00
89bf98ac4c Update '__all__' in '__init__.py' (#12762)
Summary:
It is best practice to always include dynamically declared module-level methods in the `__all__` field. Otherwise, IDEs (such as PyCharm) with reference inspectors will complain "Cannot find reference ...".

This PR adds 'rand' and 'randn' to `__init__.py`.
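A sketch of the idea:

```python
# In a package's __init__.py, names created dynamically still need to be
# listed explicitly so reference inspectors can resolve them.
__all__ = [
    'rand', 'randn',  # the names this PR adds
    # ... plus the rest of the public API
]
```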
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12762

Differential Revision: D10427541

Pulled By: ezyang

fbshipit-source-id: ec0704dfd91e78d7ad098b42cfd4bd1ad0e119df
2018-10-18 17:52:10 -07:00
a223c5ed2c Extend ONNX while op by x2, rather than x1.02
Summary:
I think the original author wrote 2.0f in an attempt to double the size, but this argument takes a percentage increase, not a growth factor.
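In sketch form, assuming the argument is interpreted as a percent increase:

```python
old_size = 100
print(old_size * (1 + 2.0 / 100))    # 102.0 -- what passing 2.0f actually did
print(old_size * (1 + 100.0 / 100))  # 200.0 -- what doubling requires
```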

Created from Diffusion's 'Open in Editor' feature.

Reviewed By: jamesr66a

Differential Revision: D10412946

fbshipit-source-id: 95eb3d284255f232b7782bb1d2c9c2ef8aa6f8a7
2018-10-18 17:49:51 -07:00
f9d1b63d18 Automatic update of fbcode/onnx to f8828e532da4795e8ea15f5850a37c5179917b9b (#12823)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12823

Previous import was 1cbe2743cda739ff752d6ce79553b0ef8ad49783

Included changes:
- **[f8828e5](https://github.com/onnx/onnx/commit/f8828e5)**: Use vector instead of set to keep the order of the opt passes (#1524) <Lu Fang>
- **[b5a37c4](https://github.com/onnx/onnx/commit/b5a37c4)**: Pin awscli to last known good version (#1518) <bddppq>
- **[3e219f6](https://github.com/onnx/onnx/commit/3e219f6)**: ONNX Optimization Rewrite (#1452) <Armen>
- **[96758c9](https://github.com/onnx/onnx/commit/96758c9)**: Add MaxUnpool op to ONNX. (#1494) <Spandan Tiwari>
- **[c4f7043](https://github.com/onnx/onnx/commit/c4f7043)**: Update docker image version used in CircleCI (#1511) <bddppq>

Differential Revision: D10447573

fbshipit-source-id: 8748ba6e3be322a26a9a360ff7f2babd54fd581f
2018-10-18 16:17:25 -07:00
f380f0ba27 Move torch.onnx.operators functions into ATen (#12803)
Summary:
These were indiscriminately dumping `onnx::` instructions into traces, and making it so you couldn't run the traces in the JIT interpreter
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12803

Differential Revision: D10443526

Pulled By: jamesr66a

fbshipit-source-id: 07172004bf31be9f61e498b5772759fe9262e9b3
2018-10-18 16:04:34 -07:00
79709f02e9 fix overwriting of CMAKE_EXE_LINKER_FLAGS (#12834)
Summary:
bug lurking since 2016
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12834

Reviewed By: bddppq

Differential Revision: D10452484

Pulled By: anderspapitto

fbshipit-source-id: 352584af06e2fb35338fb66b3d8eb1050b716349
2018-10-18 15:34:28 -07:00
92890d4314 Delete ExtendTensor operator
Summary: Added 2 years ago in D3665603, never used, kill it.

Reviewed By: ezyang

Differential Revision: D10421336

fbshipit-source-id: 1b027a9ef2b71d0dd2c572cd4338bc8e046320d8
2018-10-18 15:18:40 -07:00
324a510f9c JIT Cleanups (#12804)
Summary:
1. Change scope ownership model so they can be shared across Graphs.
   Now scopes own their parent and are intrusive pointers. Graphs
   no longer require a scope_root and cloning a node automatically
   clones its scope. This causes some changes in expect files for
   trace+script things. As far as I can tell these are not bugs but
   a different way of interpreting how scopes should propagate.
   Big traces like that of alexnet keep their scopes unchanged.
2. Remove VariableType.cpp dependency on a symbol being in the pre-
   declared symbol list.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12804

Differential Revision: D10447922

Pulled By: zdevito

fbshipit-source-id: dcfcaf514bbe5687047df0f79c2be536ea539281
2018-10-18 14:41:55 -07:00
6058886b03 Speedup pnorm (#12811)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12811

The L1 version of this operator was super slow and was timing out one of our unit tests. This diff addresses the TODO and makes it fast.

Reviewed By: chocjy

Differential Revision: D10444267

fbshipit-source-id: 550b701b6a5cb3f2540997fd7d8b920400b983a6
2018-10-18 14:22:55 -07:00
68843c683d Open source multithreaded predictor bench utils (#11135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11135

This diff does not have any logic changes; it simply moves files/functions/classes around.
It open-sources (almost all of) the necessary dependencies for the multithreaded predictor benchmark.
The benchmark itself can be open sourced once the predictor is open sourced.

Reviewed By: salexspb

Differential Revision: D9602006

fbshipit-source-id: 386c9483e2c64c8b7d36e4600189c4e0b7e159ff
2018-10-18 14:16:36 -07:00
ee563c5899 Add license reference to README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12822

Differential Revision: D10451895

Pulled By: JoelMarcey

fbshipit-source-id: dee4cafd3120571e52cf242bb0674c7aa7dab217
2018-10-18 14:10:24 -07:00
9473e57eca Revert D10444104: [pytorch][PR] Windows CI integration for custom ops
Differential Revision:
D10444104

Original commit changeset: 4c447beeb967

fbshipit-source-id: ead52444aefa27692e3f36dadad986e2313261bd
2018-10-18 14:08:18 -07:00
ed317b6203 Remove useless MKL target (#12783)
Summary:
Context: https://github.com/pytorch/pytorch/pull/12625#issuecomment-430560919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12783

Differential Revision: D10451726

Pulled By: yinghai

fbshipit-source-id: 3cd1e61209628d7c52b440e5b232ae95dd09885e
2018-10-18 14:03:34 -07:00
805f4d5cb8 Revert D10416438: Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE
Differential Revision:
D10416438

Original commit changeset: cb842e3e26b0

fbshipit-source-id: c0760e73ecc76ca9b1b74f6844e243c2df5260a2
2018-10-18 13:46:33 -07:00
57ddc08a57 Enable multiple external output (#12778)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12778

att

Differential Revision: D10248027

fbshipit-source-id: fc3d17314e8c2d9704b8bfcc50ace176ec2c85d7
2018-10-18 13:36:23 -07:00
dec9bc5f0b Expose device_option directly
Summary: as title states

Reviewed By: duc0

Differential Revision: D10442424

fbshipit-source-id: bba2dd600e1979ff018ac0e403463f992a94a6e5
2018-10-18 13:22:17 -07:00
63cd051867 Guard all Caffe2 protobuf string serializations with CAFFE_ENFORCE (#12799)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12799

Updated all non-test uses of protobuf::MessageLite::SerializeAsString to call
SerializeAsString_EnforceCheck so that the return value is checked and can
throw an exception if failing.

Most of the affected code was called from classes derived from  BlobSerializeBase.
Didn't touch most tests and ENFORCE calls because they usually do checks
anyway.

Reviewed By: ezyang

Differential Revision: D10416438

fbshipit-source-id: cb842e3e26b0918829d71267a375d4dd40600d58
2018-10-18 12:49:01 -07:00
2c566a17c7 nomnigraph - simplify subgraph matching APIs (#12681)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12681

- Get rid of NodeMatchCriteria as a template parameter, which was too generic. So MatchNode<NodeMatchCriteria> becomes MatchNode<GraphType>, and MatchStore stores the predicate on GraphType::NodeRef.

- Similarly, get rid of NNNodeMatchCriteria

Now one can just pass in a function pointer NodeRef -> bool to the NNMatchNode constructor directly, like this:

mg.createNode(is<Relu>)

- Merge static utilities in SubgraphMatcher class into MatchGraph class

- Rename MatchNode to MatchPredicate

Change use cases and tests to make it work

Reviewed By: ZolotukhinM

Differential Revision: D10386907

fbshipit-source-id: 43874bd154e3d7c29ce07b4b74eca8a7a9f3078a
2018-10-18 12:32:40 -07:00
9c617140f7 Try to reduce c10d test flakiness (#12782)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12782

We have seen the "Address already in use" error pop up a few times when instantiating the TCPStore. The port that it uses is dynamically generated through common.find_free_port(), which binds a new socket to a random port, closes the socket, and returns the port that the OS had assigned. If some other process grabs that port in the time between closing the socket and the TCPStore binding to it, the bind error shows up. This commit changes most tests to use the FileStore instead and includes a retry when testing the TCPStore.
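The race-prone pattern, in sketch form (not the exact common.find_free_port source):

```python
import socket

def find_free_port():
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.bind(('', 0))    # the OS assigns an arbitrary free port
    port = sock.getsockname()[1]
    sock.close()
    # The port is free when the socket closes, but another process may
    # claim it before the caller binds to it.
    return port
```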

Differential Revision: D10433401

fbshipit-source-id: 8dd575ac91a3cddd1cc41ddb0ff4311ddc58c813
2018-10-18 12:12:33 -07:00
3fe35300ed Revert D10417038: [pytorch][PR] Use C locale in lexer
Differential Revision:
D10417038

Original commit changeset: 1d5f2f9a24ec

fbshipit-source-id: 5780fed8e29551ec5b0a56ad6966a560c02bc171
2018-10-18 11:45:18 -07:00
545f22c070 Link libshm against c10 (#12802)
Summary:
Fixes this build failure i got: https://gist.github.com/jamesr66a/1e0025d8d6d30b090f0e247457063093
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12802

Differential Revision: D10447916

Pulled By: jamesr66a

fbshipit-source-id: ab2cddff95429881db992c04e80453a46eb81f79
2018-10-18 11:38:42 -07:00
5b971445a6 Typo fix (#12826)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12826

Differential Revision: D10449047

Pulled By: ezyang

fbshipit-source-id: eb10aa5886339b43bb8c239dd8742e458f3d024d
2018-10-18 11:36:00 -07:00
2b63b7a0a5 Support GPU version of Spatial Batch Norm (#11711)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11711

Added GPU support for spatial batch normalization. It works by reducing values from all GPUs onto the CPU and broadcasting those results back to each GPU. We have run several experiments and found these results to be better than those without spatial BN: https://fb.quip.com/fr7HAeDliPB8

Reviewed By: enosair

Differential Revision: D9547420

fbshipit-source-id: ccbd2937efd6cfd61182fff2f098fb7c5ae8aeb1
2018-10-18 11:22:13 -07:00
e240e89984 move the torch/csrc/jit/serialization.h to caffe2 source folder and rename to inline_container.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12781

Reviewed By: dzhulgakov

Differential Revision: D10436151

Pulled By: houseroad

fbshipit-source-id: 7f59eec21df5acbab0ea693e1a1cd4fa152f05e5
2018-10-18 09:47:19 -07:00
963b012bd8 nomnigraph - HEFT scheduler (#12788)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12788

Static task scheduling algorithm

- Input/Output for static scheduler
- HEFT static scheduling algorithm (sketched below)
- Theoretical critical path analyzer
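
A rough, self-contained sketch of the HEFT idea (illustrative Python, not the nomnigraph implementation): each task gets an upward rank, and tasks are scheduled in decreasing rank order onto the processor with the earliest finish time.

```python
# cost: average compute time per task; comm: average transfer time per edge
cost = {"a": 3.0, "b": 2.0, "c": 4.0}
comm = {("a", "b"): 1.0, ("a", "c"): 2.0}
successors = {"a": ["b", "c"], "b": [], "c": []}

def upward_rank(task):
    # rank_u(t) = cost(t) + max over successors s of (comm(t, s) + rank_u(s))
    tail = max((comm[(task, s)] + upward_rank(s) for s in successors[task]),
               default=0.0)
    return cost[task] + tail

order = sorted(cost, key=upward_rank, reverse=True)
print(order)  # ['a', 'c', 'b']
```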

Reviewed By: bwasti

Differential Revision: D10436418

fbshipit-source-id: 074bc587b9a2c7cb2d9e64291981ff1c160f02b2
2018-10-18 08:40:46 -07:00
12be60cc04 Windows CI integration for custom ops (#11527)
Summary:
This is likely currently broken due to symbol visibility issues, but we will investigate it using this PR.

CC orionr yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11527

Differential Revision: D10444104

Pulled By: goldsborough

fbshipit-source-id: 4c447beeb9671598ecfc846cb5c507ef143459fe
2018-10-18 07:55:05 -07:00
eb6a1245a2 Fix torch::jit::load docs (#12709)
Summary:
`torch::jit::load` is currently incorrectly documented/rendered

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12709

Differential Revision: D10422064

Pulled By: goldsborough

fbshipit-source-id: 4b195a84847d731ae3fe2d40868ebe858d510a2e
2018-10-18 07:52:13 -07:00
b1a6fa90e1 Add script::Module::to (#12710)
Summary:
There is currently no obvious way for users to move their `script::Module` to GPU memory. This PR implements the `to()` functions that C++ frontend modules have.
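
A minimal usage sketch, assuming the new overloads mirror the C++ frontend's `to()` and that `torch::jit::load` returns a module handle here:

```cpp
#include <torch/script.h>

int main() {
  auto module = torch::jit::load("model.pt");
  module->to(at::kCUDA);  // move parameters and buffers to GPU memory
  module->to(at::kHalf);  // dtype conversion works the same way
}
```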

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12710

Differential Revision: D10444103

Pulled By: goldsborough

fbshipit-source-id: daa0ec7e7416c683397ee392c6e78b48273f72c7
2018-10-18 07:48:51 -07:00
710191e292 fix error message of large kernel size in conv2D (#12791)
Summary:
- fix #12565
- test plan:
with this fix, we have:
```
>>> m = nn.Conv2d(in_channels=3, out_channels=33, kernel_size=10, stride=1, bias=True)
>>> input = torch.randn(1, 3, 1, 1)
>>> output = m(input)
```
RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (10 x 10). Kernel size can't be greater than actual input size at ~/pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:50

not sure why these are `int` instead of `int64_t`:
5ccdd7a626/aten/src/THNN/generic/SpatialConvolutionMM.c (L10)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12791

Differential Revision: D10443045

Pulled By: weiyangfb

fbshipit-source-id: 2620acb40bdd49d29cec06337f6dfb4653d1987c
2018-10-18 00:51:16 -07:00
f1e7d384b6 Support scales as inputs in ResizeNearest (#12720)
Summary:
To address https://github.com/onnx/onnx/pull/1467
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12720

Reviewed By: BIT-silence

Differential Revision: D10414813

Pulled By: houseroad

fbshipit-source-id: 8831381b0115c363065c8d23bd1a95b4d641b857
2018-10-17 23:08:53 -07:00
f4944f0f8a Rename test/common.py to test/common_utils.py (#12794)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12794

common.py is used in base_module for almost all tests in test/. The
name of this file is so common that it can easily conflict with other dependencies
if they happen to have another common.py in the base module. Rename the file to
avoid conflict.

Reviewed By: orionr

Differential Revision: D10438204

fbshipit-source-id: 6a996c14980722330be0a9fd3a54c20af4b3d380
2018-10-17 23:04:29 -07:00
cffeb03a2d fix forward and backward for norm with negative infinity norm (#12722)
Summary:
I found a bug in norm() and fixed it (and added tests to make sure it's fixed).
Here is how to reproduce it:
```python
import torch
x = torch.FloatTensor([[10, 12, 13], [4, 0, 12]])
print(torch.norm(x, -40, dim=0, keepdim=True)) #output is tensor([[ 4.0000,  0.0000, 11.9853]])
print(torch.norm(x, float('-inf'), dim=0, keepdim=True)) #output is tensor([[1., 1., 1.]]) which is wrong!
from numpy.linalg import norm as np_norm
x = x.numpy()
print(np_norm(x, ord=-40, axis=0)) #output is array([ 4., 0., 11.985261])
print(np_norm(x, ord=float('-inf'), axis=0)) #output is array([[4., 0., 12.0]])
```
it's related to [#6817](https://github.com/pytorch/pytorch/issues/6817) and [#6969](https://github.com/pytorch/pytorch/pull/6969)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12722

Differential Revision: D10427687

Pulled By: soumith

fbshipit-source-id: 936a7491d1e2625410513ee9c39f8c910e8e6803
2018-10-17 21:07:43 -07:00
ed5eb7196b Add quantized GroupNormOp (#11852)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11852

Add quantized GroupNormOp

Reviewed By: houseroad

Differential Revision: D9931468

fbshipit-source-id: 02af82d98356a49736e44162042783c9e36a81b5
2018-10-17 18:32:44 -07:00
08aab4dfdd remove ATen/Error.h and ATen/core/Error.h (#12792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12792

This is a follow up diff after D10238910.

The only non-codemod change is the removal of ATen/Error.h and ATen/core/Error.h. The other files basically just change the inclusion path, plus clang-format for inclusion order.

Reviewed By: bddppq

Differential Revision: D10437824

fbshipit-source-id: 7f885f80ab5827468d1351cfb2765d0e3f555a69
2018-10-17 17:25:42 -07:00
cd88c5ccf4 CircleCI hot fix: pin awscli to 1.16.35 (#12787)
Summary:
awscli==1.16.36 is broken: https://circleci.com/gh/pytorch/pytorch/77338?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-build-link
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12787

Differential Revision: D10437424

Pulled By: yf225

fbshipit-source-id: c15bed7aa83ddca92ff32e2aaa69fbe97ac6ab1c
2018-10-17 15:57:52 -07:00
84ce3ab47e Add MAE and L2 loss to docs (#12754)
Summary:
Fixes #12751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12754

Differential Revision: D10427661

Pulled By: ezyang

fbshipit-source-id: 75bbef85976e253ab5a7140fc57f7a0ad34d96f5
2018-10-17 15:40:20 -07:00
5ccdd7a626 Support cmake3 for 14.04 and CentOS (#12771)
Summary:
Fix https://github.com/caffe2/caffe2.github.io/issues/24

cc pjh5 anderspapitto soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12771

Reviewed By: anderspapitto

Differential Revision: D10430865

Pulled By: orionr

fbshipit-source-id: 10c03cd25ab9faad49d53d0f18dd9566bfd28ae2
2018-10-17 15:02:19 -07:00
21ff6de4b3 Add missing HANDLE_TH_ERRORS (#12770)
Summary:
THPSize_pynew is called from the Python C API and may throw exceptions.
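
The usual guard pattern looks roughly like this (a sketch; the macros live in torch/csrc/Exceptions.h):

```cpp
#include <Python.h>

static PyObject* THPSize_pynew(PyTypeObject* type, PyObject* args, PyObject* kwargs) {
  HANDLE_TH_ERRORS             // opens a try block
  // ... body that may throw C++ exceptions ...
  Py_RETURN_NONE;
  END_HANDLE_TH_ERRORS         // catches and converts them to Python errors
}
```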
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12770

Differential Revision: D10431180

Pulled By: colesbury

fbshipit-source-id: 93dd1b604ac6bc05d4eb02b97e3f79a73aec73c5
2018-10-17 13:52:02 -07:00
ab1a25aa9b caffe2::empty for Resize+mutable_data refactor (#12407)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12407

We want to use a tensor factory to refactor caffe2's old way of initializing a Tensor via Resize and mutable_data, in order to eliminate uninitialized Tensors.

Previously when we want to create a Tensor in caffe2, we'll do the following
```
Tensor x(CPU); // device type provided
x.Resize({1, 2, 3}); // size provided
x.mutable_data<float>(); // data type provided and memory allocated
```
This leaves the Tensor in a not-fully-initialized state during the process. To eliminate this, we want to provide all the needed information at the beginning. ATen already has its TensorFactories: https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/native/TensorFactories.cpp, along with TensorOptions; we want to adopt the same interface to ease future refactoring.

In the callsite, we used to have `Output(i)` that returns a `Blob` that contains an uninitialized `Tensor` and we'll call Resize and mutable_data afterwards to provide dimension and data type,
```
// uninitialized tensor
auto* Y = Output(0);
// set dimensions
Y->Resize({1, 2, 3});
// actually allocate the data
auto* data = Y->mutable_data<float>();
// After this step, Tensor is fully initialized.
```
We want to change it to the following:
```
// provide dimensions and TensorOptions which include device type and data type.
// This will set all the information of Tensor properly and also allocate memory.
auto* Y = Output(0, {1, 2, 3}, at::device({context_.device_type()}).template dtype<T>());
// Tensor is fully initialized after this step

// following `mutable_data` call won't allocate memory.
auto* data = Y->mutable_data<float>();
```

microbenchmarks
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
OperatorNewOutputTensorAPI                                   3.27us  306.05K
OperatorOldOutputTensorAPI                                   3.55us  281.54K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D10207890

fbshipit-source-id: f54ddacaa057b7c6bc7d5a8290171f35e9e40e29
2018-10-17 13:03:06 -07:00
7d5f7ed270 Using c10 namespace across caffe2. (#12714)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12714

This is a short change to enable the c10 namespace in caffe2. We did not enable it before due to gflags global variable confusion, but that should mostly be cleaned up now. Right now, the plan on record is that namespace caffe2 and namespace aten will fully be supersets of namespace c10.

Most of the diff is codemod; the only two non-codemod changes are in caffe2/core/common.h, where

```
using namespace c10;
```

is added, and in Flags.h, where instead of creating aliasing variables in c10 namespace, we directly put it in the global namespace to match gflags (and same behavior if gflags is not being built with).

Reviewed By: dzhulgakov

Differential Revision: D10390486

fbshipit-source-id: 5e2df730e28e29a052f513bddc558d9f78a23b9b
2018-10-17 12:57:19 -07:00
348867c10b Remove cereal submodule (#12666)
Summary:
Cereal is dead!

soumith orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12666

Reviewed By: soumith

Differential Revision: D10422061

Pulled By: goldsborough

fbshipit-source-id: ca1ac66d05e699df9de00fc340a399571b7ecb9f
2018-10-17 11:52:47 -07:00
dd7501e3a8 Remove Blob::ShareExternal from serialization (#11926)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11926

With the preparation work of the diffs stacked below, we're now able to remove this call to Blob::ShareExternal(), preparing for the removal of that function from Blob.

Reviewed By: dzhulgakov

Differential Revision: D9884563

fbshipit-source-id: 7dd5c5fe02be0df7a44be45587c1dd7c474126ef
2018-10-17 11:50:35 -07:00
6cbf1992bd Serialization takes pointers instead of Blob (#11925)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11925

This is step 1 in the refactoring to remove Blob::ShareExternal(), i.e. Blob would then always own its contents.

ShareExternal() is for example used to pass non-owning blobs to serialization. This diff prepares removing that.

Reviewed By: ezyang

Differential Revision: D9884177

fbshipit-source-id: d01df9a613a4fc62e5679fe45bfc47e2c899b818
2018-10-17 11:50:34 -07:00
25db86cca5 Fix isfinite for int input (#12750)
Summary:
`torch.isfinite()` used to crash on int inputs.
```
>>> import torch
>>> a = torch.tensor([1, 2])
>>> torch.isfinite(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/scratch/pytorch/torch/functional.py", line 262, in isfinite
    return (tensor == tensor) & (tensor.abs() != inf)
RuntimeError: value cannot be converted to type int64_t without overflow: inf
```
But this is an easy special case, and numpy also supports it.
```
>>> import numpy as np
>>> a = np.array([1, 2])
>>> a.dtype
dtype('int64')
>>> np.isfinite(a)
array([ True,  True], dtype=bool)
```
So I added a hacky line to handle non-floating-point input. Since pytorch raises an exception on overflow, we can safely assume all valid int tensors contain only finite numbers.
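
A sketch of the special case (the real change is in torch/functional.py; this is an approximation, and assumes the byte-mask convention used at the time):

```python
import torch

def isfinite(tensor):
    if not tensor.is_floating_point():
        # Integral tensors can never hold inf/nan: any overflow would already
        # have raised, so every stored value is finite.
        return torch.ones_like(tensor, dtype=torch.uint8)
    return (tensor == tensor) & (tensor.abs() != float('inf'))

print(isfinite(torch.tensor([1, 2])))  # tensor([1, 1], dtype=torch.uint8)
```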
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12750

Differential Revision: D10428204

Pulled By: ailzhang

fbshipit-source-id: f39b2d0975762c91cdea23c766ff1e21d85d57a5
2018-10-17 11:48:25 -07:00
9a76e84a08 Use C locale in lexer (#12739)
Summary:
Possible fix for #11326. Testing in CI for windows code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12739

Differential Revision: D10417038

Pulled By: zdevito

fbshipit-source-id: 1d5f2f9a24eceef7047dc218669faca8a187c65c
2018-10-17 10:42:38 -07:00
459cff93fe fix math formula for conv1d and conv2d (#12740)
Summary:
- fix math formula
- test plan: build html and view on a browser
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12740

Differential Revision: D10419430

Pulled By: weiyangfb

fbshipit-source-id: b8eee9e75c3ce6e37535e3de597431ef5030e9ac
2018-10-17 10:24:11 -07:00
e027f7a913 Fix character with wrong encoding in documentation (#12761)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12761

The character `，` (a full-width comma) is not really `,`, and thus it can fail some of the Python 2 imports.

Reviewed By: weiyangfb

Differential Revision: D10423231

fbshipit-source-id: 3738c0b9d2f52aa47eef06250f84c5933a38783f
2018-10-17 10:20:45 -07:00
9d79030d38 Fixup THPUtils_unpackIndex (#12738)
Summary:
See https://github.com/pytorch/pytorch/issues/12735
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12738

Differential Revision: D10416682

Pulled By: jamesr66a

fbshipit-source-id: 69f3452750dffda3cfed50463d9241fd7b52528b
2018-10-17 10:16:54 -07:00
409ee5bcd9 Remove redundant semicolon
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12753

Differential Revision: D10427674

Pulled By: ezyang

fbshipit-source-id: f790dbbafc6b1965c4e1368f311076ea045555de
2018-10-17 09:52:48 -07:00
1a6071d436 fixing seq to tensors in documentation (#12741)
Summary:
Fixes #12251

In the docs, the keyword argument for the `torch.cat` operation is supposed to be `tensors`, but it is instead given as `seq`.

zou3519 can you review this code? I don't have access to request code reviews.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12741

Differential Revision: D10419682

Pulled By: ezyang

fbshipit-source-id: a0ec9c3f4aeba23ac3a99e2ae89bd07d2b9ddb58
2018-10-17 09:16:04 -07:00
7edfe11ba4 Use TypeMeta::dtor() instead of Blob::DestroyCall (#11500)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11500

Since TypeMeta already stores a destructor, and we removed the ability from Blob to store a custom destructor in a diff stacked below this, there is now no reason for Blob to store it again.

Reviewed By: ezyang

Differential Revision: D9763423

fbshipit-source-id: d37a792ffd6928ed1906f5ba88bd4f1d1e2b3781
2018-10-17 06:21:46 -07:00
7b7bf09e3c Add TypeMeta::New/Delete (#12307)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12307

This adds non-placement variants of New/Delete to TypeMeta.
In a future diff, this is going to be used from Blob to destruct its contents.

Reviewed By: dzhulgakov

Differential Revision: D10184116

fbshipit-source-id: 7dc5592dbb9d7c4857c0ec7b8570329b33ce5017
2018-10-17 06:21:45 -07:00
90737f7f5d Fix missing final activation in NLLLoss second example (#12703)
Summary:
Fixed the second example in NLLLoss.
The LogSoftmax activation was missing after the convolution layer. Without this activation, the second example loss was sometimes negative.
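
A hedged reconstruction of the corrected pattern (layer sizes are illustrative): NLLLoss consumes log-probabilities, so the network must end in LogSoftmax for the loss to stay non-negative.

```python
import torch
import torch.nn as nn

m = nn.Sequential(nn.Conv2d(16, 8, 3), nn.LogSoftmax(dim=1))
loss = nn.NLLLoss()
x = torch.randn(4, 16, 10, 10)
target = torch.empty(4, 8, 8, dtype=torch.long).random_(0, 8)  # class per pixel
print(loss(m(x), target))  # always >= 0 now
```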
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12703

Differential Revision: D10419694

Pulled By: ezyang

fbshipit-source-id: 98bfefd1050290dd5b29d3ce18fe075103db4674
2018-10-17 02:57:39 -07:00
0521c47c91 Amend nondeterminism notes (#12217)
Summary:
include atomicAdd commentary as this is less well known

There is some discussion in #12207

Unfortunately, I cannot seem to get the ..include working in `_tensor_docs.py` and `_torch_docs.py`. I could use a hint for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12217

Differential Revision: D10419739

Pulled By: SsnL

fbshipit-source-id: eecd04fb7486bd9c6ee64cd34859d61a0a97ec4e
2018-10-16 23:59:26 -07:00
8c873def88 Revert D10220313: restore caffe2 strides
Differential Revision:
D10220313

Original commit changeset: aaf9edebf4ff

fbshipit-source-id: 46c4d23d89d47be26c3f4967476271d8c2f95f11
2018-10-16 23:57:20 -07:00
70c527dacd Re-disable softmax ops tests in ROCM (#12749)
Summary:
They are flaky in master.

ashishfarmer petrex

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12749

Differential Revision: D10420265

Pulled By: bddppq

fbshipit-source-id: cac58efb711941786b10b07ada58e0d59ab1db1d
2018-10-16 22:54:50 -07:00
034c969f3c Simply exit DataLoader when Python is dying (#12700)
Summary:
I struggled with yet another DataLoader hang for the entire evening. After numerous experiments, I realized that it is unsafe to do anything when Python is shutting down. We also unfortunately implement our DataLoader cleanup logic in `__del__`, a function that may or may not be called during shutdown, and if called, may or may not be called before core library resources are freed.

Fortunately, we are already setting all our workers and pin_memory_thread as daemonic. So in case of Python shutting down, we can just do a no-op in `__del__` and rely on the automatic termination of daemonic children.

An `atexit` hook is used to detect Python exit.
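
A minimal sketch of that pattern (names are illustrative, not the actual dataloader internals):

```python
import atexit

_python_exit = False

def _set_python_exit():
    global _python_exit
    _python_exit = True

atexit.register(_set_python_exit)

class _DataLoaderIterSketch:
    def __del__(self):
        if _python_exit:
            return  # interpreter is dying: no-op, daemonic workers die with it
        self._shutdown_workers()  # normal cleanup path

    def _shutdown_workers(self):
        pass
```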
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12700

Differential Revision: D10419027

Pulled By: SsnL

fbshipit-source-id: 5753e70d03e69eb1c9ec4ae2154252d51e2f79b0
2018-10-16 22:05:33 -07:00
d34578026c Various example code fixes (#12707)
Summary:
- Fix broken sparse_coo_examples, update output
- Tensor(...) to tensor(...)
- Fix arguments to math.log to be floats

While the last might be debatable, mypy currently complains when passing an int to math.log. As it is not essential for our examples, let's be clean w.r.t. other people's expectations.

These popped up while checking examples in the context of  #12500 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12707

Differential Revision: D10415256

Pulled By: SsnL

fbshipit-source-id: c907b576b02cb0f89d8f261173dbf4b3175b4b8d
2018-10-16 21:59:40 -07:00
c8ac878b98 Fix bug in script for where (#12385)
Summary:
Where is declared as:

```
where(Tensor condition, Tensor self, Tensor other)
```

Previously the compiler assumed that self must be the first argument.
But this is not true in practice for `where` and for a few other exceptions.

This changes the compiler to take an explicit self argument which gets matched
to the `self` that appears in the schema.

Note that this requires renaming a variant of pow, which referred to
an exponent Tensor as `self` because otherwise that would cause `t^3`
to match against `t` being the exponent.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12385

Differential Revision: D10364658

Pulled By: zdevito

fbshipit-source-id: 39e030c6912dd19b4b0b9e35fcbabc167b4cc255
2018-10-16 21:05:14 -07:00
84edd4a48b Enable mapping from operatordef to converted node for debugging
Summary: Add a mapping for conversion -- this will help with debugging as well but is directly used by the TUI stacked on top of this

Reviewed By: duc0

Differential Revision: D10396130

fbshipit-source-id: cdd39278f0ed563bb828b1aebbbd228f486d89c8
2018-10-16 21:03:28 -07:00
1bf642800d Remove duplicate descriptors (#8321)
Summary:
This PR removes some duplication in `recurrent_op_cudnn.cc`. Instead of 4 copies of the exact same descriptor, it should work fine with just 1. I don't see any other code that relies on those being 4 separate locations, but if that is what you need you can always allocate additional descriptors as necessary.

Have not fully tested this thing out, just something I noticed when I was reading through the descriptor  code.

Cheers
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8321

Differential Revision: D10363744

Pulled By: ezyang

fbshipit-source-id: 733c8242fb86866f1d64cfd79c54ee7bedb03b84
2018-10-16 20:59:00 -07:00
e497aa1e35 Optimize UpsampleNearest Op (#12151)
Summary:
Optimize the UpsampleNearest Op.
1. Add OMP
2. Revise the translated_idx method
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12151

Differential Revision: D10362856

Pulled By: ezyang

fbshipit-source-id: 535a4b87c7423942217f2d79bedc463a0617c67a
2018-10-16 20:34:20 -07:00
ba25e13782 Forbid Module.to with copy argument. (#12617)
Summary:
Module.to uses the Tensor.to parsing facility.
It should not, however, accept "copy" as a keyword/fourth positional
argument.

See #12571 for discussion.

Thank you SsnL for noticing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12617

Differential Revision: D10392053

Pulled By: ezyang

fbshipit-source-id: b67a5def7993189b4b47193abc7b741b7d07512c
2018-10-16 20:31:44 -07:00
5416260b1e Add the OpenMP optimization for BatchPermutation. (#12153)
Summary:
This is for Caffe2 optimization.
With this optimization, the following two ops get a large boost (tested with MaskRCNN, on one SKX8180 socket):
BatchPermutation op: reduced from 8.296387 ms to 1.4501984 ms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12153

Differential Revision: D10362823

Pulled By: ezyang

fbshipit-source-id: 04d1486f6c7db49270992cd8cde41092154e62ee
2018-10-16 20:23:09 -07:00
3709734b1c Improve reporting on pytest. (#12610)
Summary:
Before/after comparison to come after I run the tests on CI.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12610

Differential Revision: D10419483

Pulled By: ezyang

fbshipit-source-id: 5543e971f8362e4cea64f332ba44a26c2145caea
2018-10-16 20:15:01 -07:00
3bfa7258b3 Don't serialize hooks (#11705)
Summary:
Fixes #11683.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11705

Differential Revision: D9833057

Pulled By: ezyang

fbshipit-source-id: 18af9bcd77b088326738d567100fbe4a4c869dd6
2018-10-16 20:11:03 -07:00
b1892226aa A quick rundown of codebase structure. (#12693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12693

Differential Revision: D10419424

Pulled By: ezyang

fbshipit-source-id: dc3999253f19b5615849619bd3e4a77ab3ca984e
2018-10-16 20:02:27 -07:00
0054df19b1 Simplify InheritOnnxSchema registration (#12696)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12696

In the majority of cases, we use `InheritOnnxSchema(type_)`. This diff makes declaring such cases easier.

Reviewed By: bddppq

Differential Revision: D10395109

fbshipit-source-id: 914c1041387d5be386048d923eb832244fc506c3
2018-10-16 19:59:49 -07:00
81975a497f update docs for sparse tensor (#12221)
Summary:
- update docs examples at sparse tensor after print format changed
- update example to create empty sparse tensor:
```
>>> torch.sparse_coo_tensor(torch.LongTensor(size=[1,0]), [], torch.Size([1]))
tensor(indices=tensor([], size=(1, 0)),
       values=tensor([], size=(0,)),
       size=(1,), nnz=0, layout=torch.sparse_coo)
```

zou3519 SsnL yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12221

Differential Revision: D10412447

Pulled By: weiyangfb

fbshipit-source-id: 155b8cb0965f060e978f12239abdc1b3b41f6ab0
2018-10-16 19:56:51 -07:00
dc07102b17 Check dim size preventively when doing shape inference for BatchMatMul (#12691)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12691

We check input(0) but not input(1) in BatchMatMul. This may result in a protobuf exception which won't be caught upstream, causing termination of the program. Checking it with `CAFFE_ENFORCE` instead will be caught by the upstream inference function. Plus, it will print a clean stack trace showing where things went wrong.

Reviewed By: bddppq, houseroad, BIT-silence

Differential Revision: D10391130

fbshipit-source-id: daf8dcd8fcf9629a0626edad660dff54dd9aeae3
2018-10-16 17:27:44 -07:00
50c0aedbec Don't segfault on Tensor.__delitem__ (#12726)
Summary:
The mapping protocol stipulates that when `__delitem__` is called, this is passed to `__setitem__` [(well, the same function in the C extension interface)](https://docs.python.org/3/c-api/typeobj.html#c.PyMappingMethods.mp_ass_subscript) with NULL data.

PyTorch master crashes in this situation; with this patch, it does not anymore.

Test code (careful, segfaults your interpreter):
```python
import torch
a = torch.randn(5)
del a[2]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12726

Differential Revision: D10414244

Pulled By: colesbury

fbshipit-source-id: c49716e1a0a3d9a117ce88fc394858f1df36ed79
2018-10-16 17:24:18 -07:00
6476e4598c Rename TypeMeta function pointers (#12306)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12306

In a future diff, I'm going to introduce non-placement constructor and destructor to TypeMeta.
To make it less ambiguous, this diff first renames the existing ones to PlacementXXX.

Reviewed By: dzhulgakov

Differential Revision: D10184117

fbshipit-source-id: 119120ebc718048bdc1d66e0cc4d6a7840e666a4
2018-10-16 16:45:47 -07:00
d0df1e8ec9 Remove MIOpen Softmax operator (#12727)
Summary:
This PR contains changes for:
1. Removing MIOpen softmax operator. Will be added later with the required functionality
2. Enabling softmax_ops_test on ROCm target

Differential Revision: D10416079

Pulled By: bddppq

fbshipit-source-id: 288099903aa9e0c3378e068fffe6e7d6a9a84841
2018-10-16 16:45:46 -07:00
30aaa07594 New serialization format (#12384)
Summary:
Addressed Dima's feedback.

The proposal is here: https://fb.quip.com/TbQmAuqIznCf
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12384

Reviewed By: dzhulgakov

Differential Revision: D10246743

Pulled By: houseroad

fbshipit-source-id: c80db0c35d60ca32965275da705f2b1dfb2a7265
2018-10-16 16:36:58 -07:00
ac994f2c78 Fix SpectralNorm with DataParallel (#12671)
Summary:
There were two problems with SN + DP:

1. In SN, the updated _u vector is saved back to module via a `setattr`. However, in DP, everything is run on a replica, so those updates are lost.
2. In DP, the buffers are broadcast via a `broadcast_coalesced`, so on replicas they are all views. Therefore, the `detach_` call won't work.

Fixes are:
1. Update _u vector in-place so, by the shared storage between 1st replica and the parallelized module, the update is retained
2. Do not call `detach_`.
3. Added comments in SN about the subtlety.
4. Added a note to the DP doc on this particular behavior of DP.

cc crcrpar taesung89 yaoshengfu

Fixes https://github.com/pytorch/pytorch/issues/11476
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12671

Differential Revision: D10410232

Pulled By: SsnL

fbshipit-source-id: c447951844a30366d8c196bf9436340e88f3b6d9
2018-10-16 16:02:17 -07:00
c414eb2618 fix improper calling of ShareExternalPointer from RNN op (#12593)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12593

size() returns numel_, but what we really want is nbytes(), which is the capacity.

Reviewed By: salexspb

Differential Revision: D10354488

fbshipit-source-id: f7b37ad79ae78290ce96f37c65caa37d91686f95
2018-10-16 15:58:14 -07:00
4d698cae2e Enhance shape inference in ONNXIFI transformer (#12685)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12685

In this diff, we push the fake run of the net into the ONNXIFI transformer, because
1. We cannot do shape inference for every op
2. Since the net has been SSA rewritten, we cannot use shape info from outer workspace directly.

In addition, this diff adds input shape info when querying the `onnxBackendCompatibility` function.

Reviewed By: bddppq

Differential Revision: D10390164

fbshipit-source-id: 80475444da2170c814678ed0ed3298e28a1fba92
2018-10-16 14:15:46 -07:00
f53d5e0a75 Automatic update of fbcode/onnx to 1cbe2743cda739ff752d6ce79553b0ef8ad49783 (#12676)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12676

Previous import was 06f6d63d5529e3a94533c9f34c402be1793420b1

Included changes:
- **[1cbe274](https://github.com/onnx/onnx/commit/1cbe274)**: fix the optimizer (#1510) <Lu Fang>
- **[481ad99](https://github.com/onnx/onnx/commit/481ad99)**: Fix TensorProto int32_data comment (#1509) <Lutz Roeder>
- **[f04fbe0](https://github.com/onnx/onnx/commit/f04fbe0)**: fix ninja external (#1507) <Rui Zhu>

Reviewed By: jamesr66a, wanchaol

Differential Revision: D10388438

fbshipit-source-id: 298100589ce226c63d4e58edf185c9227fd52c85
2018-10-16 10:24:15 -07:00
e15501fb68 fix bce_with_logits with legacy reduce (#12689)
Summary:
Fix #12624 . internal usecase of legacy `reduce`.
Add test in test_nn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12689

Reviewed By: ezyang

Differential Revision: D10391195

Pulled By: ailzhang

fbshipit-source-id: 1af2b258c4abb2b6527eaaeac63e8bf1762c66a1
2018-10-16 09:46:58 -07:00
00f0dca4b5 restore caffe2 strides (#12381)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12381

The workflow passes after D10150834, so we can restore strides.

Reviewed By: ezyang

Differential Revision: D10220313

fbshipit-source-id: aaf9edebf4ff739cbe45b2d32e77918fce47ba34
2018-10-16 09:19:42 -07:00
7035975508 fix double free exposed by latest llvm (#12697)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12697

Latest LLVM started reporting double free related to this code. The stack trace: P60181558
Fix it by using the leaky Meyers' singleton
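
For context, a minimal sketch of the leaky Meyers singleton (the `Registry` type here is hypothetical):

```cpp
#include <string>

struct Registry {
  std::string name;
};

// Construct on first use and never destroy: no destructor runs during static
// teardown, so destruction-order bugs (and this double free) cannot occur.
Registry& getRegistry() {
  static Registry* instance = new Registry();  // intentionally leaked
  return *instance;
}
```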

Reviewed By: meyering

Differential Revision: D10352976

fbshipit-source-id: 11afc2999235831da10c73609d1153d04742ba18
2018-10-16 07:32:08 -07:00
a9981c8477 Remove Type.tensor, Type.native_tensor. (#12687)
Summary:
They aren't needed anymore now that at::empty can handle all backends.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12687

Differential Revision: D10390740

Pulled By: gchanan

fbshipit-source-id: 521d6f92448798aa368186685662451e191c0b05
2018-10-16 07:12:16 -07:00
7d24985852 Kill is_type_dispatched. (#12684)
Summary:
All factory functions are now implemeneted in terms of TensorOptions, which is passed through Type, if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12684

Differential Revision: D10390224

Pulled By: gchanan

fbshipit-source-id: fb536271735e6e0e542f021e407529998b0482eb
2018-10-16 07:05:49 -07:00
5b8a640d0b Update fft docs for new cache size (#12665)
Summary:
Follow up of #12553
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12665

Differential Revision: D10385615

Pulled By: SsnL

fbshipit-source-id: 44fe9ec75cb735de37c56270f160a16a1d2bfb64
2018-10-16 01:47:36 -07:00
0916f4a337 Remove caffe2/submodules/cereal-rev.txt
Summary: Zero-th step in removing the cereal submodule.

Reviewed By: yns88

Differential Revision: D10385343

fbshipit-source-id: cc93c22b2cafa73f929f2f7659a6f6e66458aa7e
2018-10-16 01:42:20 -07:00
04d4ec285c Cleanup namespace that were moved to ATen accidentally (#12680)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12680

torch::jit shouldn't live in aten

Reviewed By: ezyang

Differential Revision: D10389502

fbshipit-source-id: f38582e61a275edccf22845c7d709a201f6a0be1
2018-10-16 01:25:08 -07:00
eb02a1d8a7 Fix clang tidy master comparison (#12674)
Summary:
This PR makes the clang-tidy CI get its diff by comparing the current commit against the base branch that the PR is targeting.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12674

Differential Revision: D10397692

Pulled By: goldsborough

fbshipit-source-id: 7fd9e22c92dd885112cd5c003c732d1c12667157
2018-10-16 01:17:18 -07:00
31d8e5e71a Improve Python API with the addition of pythonic setters/getters
Summary:
Simple additions that make it vastly easier to use nomnigraph in
python

Reviewed By: duc0

Differential Revision: D10383027

fbshipit-source-id: 441a883b84d4c53cca4f9c6fcc70e58692b8f782
2018-10-16 00:57:54 -07:00
f2b62e113c Clean up IR.h (#12551)
Summary:
Move a lot of methods that don't have an obvious reason for being inline out-of-line.  This cleans up the header and should help reduce the problem of touching IR.h and having to rebuild the world.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12551

Differential Revision: D10384808

Pulled By: resistor

fbshipit-source-id: 314af89e3282f35fdc94fa3fd3000e3040c8cb6b
2018-10-15 21:21:39 -07:00
058c1284be Fix the symbolic for pixel shuffle (#12192)
Summary:
Using Transpose + Reshape, not DepthToSpace, since the latter is not available in C2 yet.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12192

Reviewed By: BIT-silence

Differential Revision: D10129913

Pulled By: houseroad

fbshipit-source-id: b60ee6d53b8ee95fd22f12e628709b951a83fab6
2018-10-15 19:53:35 -07:00
a1dd608260 Reduce MAX_JOBS for pytorch rocm build to make CI more stable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12662

Differential Revision: D10393109

Pulled By: bddppq

fbshipit-source-id: e14f72ebc877b5c0f75fe5d195c8b4dbb9b111db
2018-10-15 18:12:46 -07:00
d80a3eb549 Set philox seed and offset on cuda manual_seed (#12677)
Summary:
Fixes: #12669

Thank you Changmao Cheng for reporting this on the forum with a small example!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12677

Differential Revision: D10391989

Pulled By: ezyang

fbshipit-source-id: 5aa7a705bdb8ce6511a8eb1b3a207f22741046bf
2018-10-15 17:45:59 -07:00
01a333fd7f OpenCV 4.0 Compatibility fix (#9966)
Summary:
Caffe2 compiles with the latest OpenCV 4.0 after the committed changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9966

Differential Revision: D10369130

Pulled By: ezyang

fbshipit-source-id: 9a104803edca5a22e27e140a794e4b8c878ca416
2018-10-15 17:42:04 -07:00
083e037dea minor fix (#12688)
Summary:
This seems to be a typo that never got caught - no actual functionality changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12688

Differential Revision: D10391704

Pulled By: Yangqing

fbshipit-source-id: ce633776957628c4881956c5423bfab78294d512
2018-10-15 17:25:49 -07:00
23c4dbd6d7 Fix ONNX upsample mode (#12648)
Summary:
Fixes #12647
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12648

Differential Revision: D10389124

Pulled By: houseroad

fbshipit-source-id: 53bc17b592d0d7f1884b555f3a12a33dbf18b4a0
2018-10-15 17:14:44 -07:00
7a52117792 Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic (#9711)
Summary:
Add AdaptiveAvgPool2d and AdaptiveMaxPool2d to ONNX.symbolic
Due to limitations in ONNX only output_size=1 is supported.
AdaptiveAvgPool2d -> GlobalAveragePool
AdaptiveMaxPool2d -> GlobalMaxPool
Fixes #5310
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9711

Differential Revision: D10363462

Pulled By: ezyang

fbshipit-source-id: ccc9f8ef036e1e54579753e50813b09a6f1890da
2018-10-15 17:02:20 -07:00
52cbf4b774 Update eigen submodule to fix CUDA arch>=5.3 build issue. (#12191)
Summary:
Discussed in #11379, #12545. Eigen submodule needs to be updated to f59336cee3 to support building with CUDA arch >= 5.3.

It seems there was a similar fix checked in from #6746, but later the Eigen submodule is switched to the current mirror #7793 at a point the fix was not included.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12191

Differential Revision: D10362557

Pulled By: ezyang

fbshipit-source-id: 548541e2c93f412bf6680ee80b8da572846f80d2
2018-10-15 17:02:19 -07:00
e22a776890 Fix for some tests (#12575)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12575

Just my guess as to why those tests are failing. Waiting on sandcastle to see if the tests resolve themselves.

Reviewed By: mlappelbaum, wesolwsk

Differential Revision: D10305051

fbshipit-source-id: 455597b12bbe27dd6c16f7d0274f2c939949d878
2018-10-15 16:53:18 -07:00
0b96e5d792 Move some files to c10/util (#12245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12245

Move these files to c10/util:
- C++17.h
- Metaprogramming.h
- TypeList.h
- TypeTraits.h
- Array.h

(including .cpp files and test cases)

Reviewed By: ezyang

Differential Revision: D10139933

fbshipit-source-id: ce7ce89392bf1a6be070ffdfc0407a8a2ce4ba6e
2018-10-15 16:25:12 -07:00
ade97afc74 Re-enable IDEEP graph rewrite test (#12661)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12661

Was disabled since workspace.has_mkldnn is now set to false

Reviewed By: yinghai

Differential Revision: D10383913

fbshipit-source-id: ad6dc705f0606b3711e8b450dc384ad3ebb87686
2018-10-15 15:50:28 -07:00
ab7520eb50 Revamp and document serialization, support streams (#12421)
Summary:
This PR does three things:

1. Add support for serializing to `ostream` and deserializing from `istream`s in addition to files (see the sketch after this list). This is after https://github.com/pytorch/pytorch/pull/11932 added support for streams in `torch::jit::ExportModule` and `torch::jit::load`.
2. Update the internal interface for how things get serialized into archives (e.g. use the more idiomatic `operator<<` instead of a `save` method). *The external interface does not change*.
3. Add documentation.
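
A sketch of the stream round-trip using the entry points named in point 1 (assuming `ExportModule` takes a module and an `std::ostream`; exact signatures may differ):

```cpp
#include <torch/script.h>
#include <sstream>

void roundTrip(torch::jit::script::Module& module) {
  std::stringstream stream;
  torch::jit::ExportModule(module, stream);  // serialize to any std::ostream
  auto restored = torch::jit::load(stream);  // deserialize from any std::istream
}
```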

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12421

Reviewed By: ezyang

Differential Revision: D10248529

Pulled By: goldsborough

fbshipit-source-id: 6cde6abd0174e3fbf3579c05376a32db0b53755f
2018-10-15 15:47:59 -07:00
03429e4eaf Update Gloo submodule to resolve __CUDA_DEPRECATED warning (#12574)
Summary:
Gloo was updated with `type` usage for cudaPointerAttributes which resolves the `__CUDA_DEPRECATED` warnings in our CUDA 10 CI. This PR brings in that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12574

Differential Revision: D10342450

Pulled By: ezyang

fbshipit-source-id: d50564bfcd8623a20b82b0052fba441c8358c17b
2018-10-15 15:45:13 -07:00
ef18f74e20 Simplify typeid macros (#12654)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12654

The previous diff managed to get the macros working, but they've been quite unmaintainable.
This diff improves the situation a bit.

- Before, there were three global variables for each registered type: type id, type name and a global type meta instance. Now, it's only type id and type meta, type name is gone. I also wanted to get rid of type id, but that doesn't work due to issues with static initialization ordering (type ids for types are requested during static initialization time, meh)
- Instead of repeating the whole CAFFE_KNOWN_TYPE macro for GCC and non-GCC because they need different export flags, define it only once and use a EXPORT_IF_NOT_GCC macro.
- The CAFFE_KNOWN_TYPE macro has to delegate to a _CAFFE_KNOWN_TYPE_DEFINE_TYPEMETADATA_INSTANCE macro, because of the counter. The pattern was copied for the macros for preallocated types. However, there we don't use a counter but use the preallocated id, so there's no need to delegate to a separate macro.

Reviewed By: ezyang

Differential Revision: D10379903

fbshipit-source-id: 50a32a5cb55ab85db49618a5f1ee4e8b06e0dfb2
2018-10-15 15:42:10 -07:00
bb35d085ef Dispatch backend-specific TensorOptions-based 'factory' functions via Type (#12071)
Summary:

This allows one to write a cpu/cuda split 'factory' function that uses TensorOptions.
Also move all remaining native_functions with either function or method variants that use Type to use TensorOptions.
Thus, there are no more Types in the public function / method API.

I believe there is a _lot_ of opportunity for cleanup here, as the old tensor, th_tensor, native_tensor and sparse variants can probably be removed, but let's do that in a follow-on patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12071

Reviewed By: ezyang

Differential Revision: D10041600

Pulled By: gchanan

fbshipit-source-id: 30ebc17146d344bc3e32ccec7b98b391aac5470b
2018-10-15 15:21:11 -07:00
86aa6a61e0 Dedup MethodValue and FunctionValue (#12589)
Summary:
... they are basically the same class and I didn't see it in the initial PR. I also got resolvers back onto std::functions by keeping the function_table logic local to defineMethodInModules.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12589

Differential Revision: D10383103

Pulled By: zdevito

fbshipit-source-id: 1b0a85eb4f112bc28256cac44446d671d803d3a2
2018-10-15 15:00:54 -07:00
71d142604f Add upcoming features to schema parser (#12585)
Summary:
This commit adds the hooks in schema parser for futures, options,
mutable alias sets, marking writes, and named output arguments that
need to exist for other upcoming work.

This also fixes the problem where you could not declare Lists of Lists.

Implementation of most of these features is left NYI. This commit should
avoid merge conflicts for these individual features on the schema parser.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12585

Differential Revision: D10382229

Pulled By: zdevito

fbshipit-source-id: 41d794e58ca462cf3a389861c533c68944dc560b
2018-10-15 14:51:42 -07:00
4c21b2f2d3 split register_aten_ops.cpp into shards (#12615)
Summary:
after an analogous breakup of VariableType.cpp, the generated
register_aten_ops.cpp is now the slowest-to-compile file in a typical
incremental rebuild by a wide margin. Therefore, give it the same
treatment - the generated code is split across several files to allow
parallel compilation.

Note that the existing code takes some care to arrange that overloads
of the same op name are given in a particular order. This diff
preserves that behavior, by treating all overloads of the same name as
a single indivisible unit, and sharding based on these groups rather
than on individual constructors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12615

Reviewed By: ezyang

Differential Revision: D10367363

Pulled By: anderspapitto

fbshipit-source-id: 07db5f9cb79748040909716349626412a13bc86e
2018-10-15 14:12:27 -07:00
c6f0fe5f26 CircleCI: Remove --depth from git fetch
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12657

Differential Revision: D10386020

Pulled By: yf225

fbshipit-source-id: 08d1c57159b323c19d5fc94180972d0c70d6aec1
2018-10-15 13:55:27 -07:00
6f339cac6b Windows local dev: install conda in user-specific directory to avoid conflict (#12663)
Summary:
Currently when developing on the shared Windows debug machine, it's very easy to accidentally wipe out someone else's working binary because the conda environment is shared. This PR fixes that by always installing conda in the user's directory instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12663

Differential Revision: D10386130

Pulled By: yf225

fbshipit-source-id: 1242ef8b2b4239c4a96459a59eb0255b44ed9628
2018-10-15 13:46:12 -07:00
bbe6ef3864 torch.finfo and torch.iinfo to mimic the numpy equivalent (#12472)
Summary:
This pull request intends to provide the functionality requested in https://github.com/pytorch/pytorch/issues/10742 by adding a new torch.finfo and torch.iinfo API.
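
A quick usage sketch (attribute names follow numpy's finfo/iinfo):

```python
import torch

print(torch.finfo(torch.float32).eps)  # 1.1920929e-07
print(torch.finfo(torch.float16).max)  # 65504.0
print(torch.iinfo(torch.int32).min)    # -2147483648
```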
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12472

Differential Revision: D10250829

Pulled By: benoitsteiner

fbshipit-source-id: eb22ca55d5b0064bef381fa7f1eb75989977df30
2018-10-15 13:43:52 -07:00
e8d8ccb34a Emphasize that the /path/to/libtorch must be absolute (#12660)
Summary:
ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12660

Differential Revision: D10386952

Pulled By: goldsborough

fbshipit-source-id: efd82f2aa3a349e9acd29303984b8fd7c3208c3f
2018-10-15 13:41:18 -07:00
a74cc03aa7 Use branch of exhale that fixes overloads (#12668)
Summary:
Docs for [`torch::jit::load`](https://pytorch.org/cppdocs/api/function_namespacetorch_1_1jit_1ace2c44fb8af5905ae17834e81086b8a3.html#exhale-function-namespacetorch-1-1jit-1ace2c44fb8af5905ae17834e81086b8a3) are currently broken. svenevs has a fix on this branch, and we need to update to it.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12668

Differential Revision: D10386949

Pulled By: goldsborough

fbshipit-source-id: 1887ba53989e5a77b178f8b2782a7b3ae52b7405
2018-10-15 13:39:01 -07:00
713e706618 Move exception to C10 (#12354)
Summary:
There are still a few work to be done:

- Move logging and unify AT_WARN with LOG(ERROR).
- A few header files are still being plumbed through, need cleaning.
- caffe2::EnforceNotMet aliasing is not done yet.
- need to unify the macros. See c10/util/Exception.h

This is mainly a codemod and does not cause functional changes. If you find your job failing and trace it back to this diff, it can usually be fixed by the following approaches:

(1) add //caffe2/c10:c10 to your dependency (or transitive dependency).
(2) change objects such as at::Error, at::Optional to the c10 namespace.
(3) change functions to the c10 namespace. In particular, caffe2::MakeString is not overridden by the unified c10::str function. Nothing else changes.

Please kindly consider not reverting this diff - it involves multiple rounds of rebasing and the fix is usually simple. Contact jiayq@ or AI Platform Dev for details.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12354

Reviewed By: orionr

Differential Revision: D10238910

Pulled By: Yangqing

fbshipit-source-id: 7794d5bf2797ab0ca6ebaccaa2f7ebbd50ff8f32
2018-10-15 13:33:18 -07:00
aef8cadb9a mark Storage functions as const (#12623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12623

Mark Storage functions as const so that they can be exposed outside of TensorImpl when calling storage()

Based on this discussion https://github.com/zdevito/ATen/issues/27#issuecomment-330717839

Also potentially useful in the effort to remove ShareExternalPointer

Reviewed By: ezyang

Differential Revision: D10370201

fbshipit-source-id: 43cf3803a4aa7b94fdf0c3a604d7db769ca0bdd5
2018-10-15 13:03:28 -07:00
189c1e1afb Rewrite http://pytorch.org -> https://pytorch.org throughout project (#12636)
Summary:
The pytorch.org site redirects all of the http:// requests to the https:// site anyway, so the comments and error messages might as well refer directly to the https:// site. The GitHub project description should also be updated to point to https://pytorch.org
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12636

Differential Revision: D10377099

Pulled By: soumith

fbshipit-source-id: f47eaba1dd3eecc5dbe62afaf7022573dc3fd039
2018-10-15 13:03:27 -07:00
a6c7cf8741 python bindings: enable generic nn operator handling
Summary: hotfix to unblock Dong Shi

Reviewed By: duc0

Differential Revision: D10385763

fbshipit-source-id: 80badd31c1039a245f32940c719e867a86ec7e47
2018-10-15 12:55:42 -07:00
0740a5d521 compute_uv for SVD (#12517)
Summary:
Adds a `compute_uv` argument that defaults to `True` for optionally computing the singular vectors during SVD.

Closes https://github.com/pytorch/pytorch/issues/12420 .
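
A quick usage sketch:

```python
import torch

a = torch.randn(5, 3)
u, s, v = torch.svd(a, compute_uv=False)  # skip the singular vectors
print(s.shape)  # torch.Size([3]); u and v come back zero-filled
```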
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12517

Differential Revision: D10384554

Pulled By: SsnL

fbshipit-source-id: 704998a257afa815eda901b8ae830e8a661695be
2018-10-15 12:35:56 -07:00
d5eae90537 update onnx tests (#12619)
Summary:
Fixes #12586
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12619

Reviewed By: ezyang

Differential Revision: D10377548

Pulled By: houseroad

fbshipit-source-id: 1166e40aa8b98f1fe015fb1bdb2e90acfad3c356
2018-10-15 11:59:19 -07:00
d17b0bc679 Allow running root tasks inline (#12289)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12289

When we have an all-sync net, the chaining algorithm will generate one single group. We want to just run that in the serving thread instead of scheduling it onto the worker queue. This closely mimics the behavior of simple net and gives us the expected performance.

Reviewed By: ilia-cher

Differential Revision: D10174323

fbshipit-source-id: 1dae11a478936634f8ef1e4aa43d7884d6362e52
2018-10-15 11:14:12 -07:00
a1bbe80e21 Remove NervanaGPU operators from Caffe2 (#12564)
Summary:
Fix #12540
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12564

Reviewed By: orionr

Differential Revision: D10379775

Pulled By: soumith

fbshipit-source-id: a925b116f2687e56bf54465fc02ca2eb1e7c8eb0
2018-10-15 11:04:46 -07:00
151b28521a Fix Windows test script on local dev machine (#12073)
Summary:
We should not clean up the Miniconda environment when the user is running `win-test.sh` locally.

This would help reproduce #11527 locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12073

Differential Revision: D10053497

Pulled By: yf225

fbshipit-source-id: 11027500e7917a7cb79270c811379e11dbbb6476
2018-10-15 09:36:50 -07:00
7326739188 Remove out-of-date TODO.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12638

Differential Revision: D10376584

Pulled By: gchanan

fbshipit-source-id: 47fb0333cd9e41a66c2e215f91e129fe19dc9225
2018-10-15 08:45:59 -07:00
07d67aa17a Make TensorOptions immutable. (#12630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12630

Instead of providing mutable accessors, our "mutators" now
return new copies of TensorOptions.  Since TensorOptions is
simply two 64-bit integers, this is not a big efficiency
problem.

There may be some sites that assumed that TensorOptions was
mutable.  They need to be fixed.
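
In practice this keeps the familiar chained style, since each call returns a new value (a sketch using the public option names):

```cpp
#include <torch/torch.h>

int main() {
  // Every "mutator" returns a fresh TensorOptions copy; nothing is modified
  // in place, so the chain below builds up an immutable configuration.
  auto opts = torch::TensorOptions()
                  .dtype(torch::kFloat32)
                  .device(torch::kCUDA)
                  .requires_grad(true);
  auto t = torch::zeros({2, 3}, opts);
}
```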

Reviewed By: SsnL

Differential Revision: D10249293

fbshipit-source-id: b3d17acc37e78c0b90ea2c29515de5dd01209bd3
2018-10-15 08:30:16 -07:00
1014c8a7db 'Re-sync with internal repository' (#12652) 2018-10-15 10:57:10 -04:00
6dd71947ea remove unused Iterable, also avoid Python 3.7 deprecation warning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12639

Differential Revision: D10377094

Pulled By: soumith

fbshipit-source-id: d904c4c1bbac900e44ea0b3b5635697159aec717
2018-10-15 02:30:22 -07:00
eaf33f22c8 Revert D10123465: Set the correct engine name for position weighted pooling when fp16 is used for training
Differential Revision:
D10123465

Original commit changeset: e8d929d4153d

fbshipit-source-id: 36269e49ac79955fe695ac1a53a3c386aa2f5bec
2018-10-15 01:53:48 -07:00
02695c11db fix masked_fill_ bug on non-contiguous tensor (#12594)
Summary:
Bug fix for #12230; the following script passes after the fix.
```python
x = torch.randn(2, 2, 2)
x = x.permute((2, 0, 1))
y = x.clone()
y.masked_fill_(y > 0, 1)
x.masked_fill_(x > 0, 1)
print((x == y).all())
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12594

Differential Revision: D10377088

Pulled By: soumith

fbshipit-source-id: 88feabe1459d325bfdf9a860412ddbd28686a28b
2018-10-14 23:12:27 -07:00
0c6ab0e8f4 Delete caffe2/mkl, and references. (#12625)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12625

It's obsoleted by ideep

Reviewed By: Yangqing

Differential Revision: D10372230

fbshipit-source-id: 2d6475ae72389dd654ba0bcbb57766530eb4ac1a
2018-10-13 22:02:32 -07:00
a98958d3bd dtype option for softmax (#11719)
Summary:
Add a dtype argument to the softmax/log_softmax functions.
Computing softmax in fp32 precision is necessary for mixed-precision training, and converting the output of the previous layer into fp32 and then reading it as fp32 in softmax is expensive, memory- and perf-wise; this PR allows one to avoid that.
For most input data/dtype combinations, the input data is converted to dtype and then softmax is computed. If the input data is half type and dtype is fp32, kernels with the corresponding template arguments are called.
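
A usage sketch of the new argument:

```python
import torch
import torch.nn.functional as F

x = torch.randn(8, 1000, dtype=torch.half, device='cuda')
# Half input accumulated in fp32, without materializing an fp32 copy of x
y = F.softmax(x, dim=-1, dtype=torch.float32)
print(y.dtype)  # torch.float32
```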
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11719

Reviewed By: ezyang

Differential Revision: D10175514

Pulled By: zou3519

fbshipit-source-id: 06d285af91a0b659932236d41ad63b787eeed243
2018-10-13 17:57:10 -07:00
e986f307c3 Fix math formatting of PairwiseDistance docs (#12628)
Summary:
`:math:` was being displayed in the docs for https://pytorch.org/docs/stable/nn.html#torch.nn.PairwiseDistance.

I haven't tested this locally, but I assume it works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12628

Differential Revision: D10373778

Pulled By: SsnL

fbshipit-source-id: 6eb918c521e73c17f6662d83f69e0e4b14dec860
2018-10-13 16:39:15 -07:00
a91f3338a0 Some documentation fixes (#12521)
Summary:
ezyang soumith

Partly addresses https://github.com/pytorch/cppdocs/issues/2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12521

Differential Revision: D10374244

Pulled By: goldsborough

fbshipit-source-id: 8e9fe688cbaa2d2b0b96f721e5477ee8845b8f20
2018-10-13 14:20:42 -07:00
1f94ce1f97 Fix aten::to export in ONNX
Summary: D10356994 broke ONNX export for casting, this fixes it

Reviewed By: wanchaol

Differential Revision: D10366103

Pulled By: jamesr66a

fbshipit-source-id: 039454cce571a1186265708e7ddcb946814cc8b0
2018-10-12 21:20:01 -07:00
635cbff300 Set the correct engine name for position weighted pooling when fp16 is used for training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12225

Reviewed By: hyuen, xianjiec

Differential Revision: D10123465

fbshipit-source-id: e8d929d4153d1ee987ae3d1c37892525d7574d16
2018-10-12 20:15:13 -07:00
6bc8d303eb Update onnx to onnx/onnx@06f6d63 (#12621)
Summary:
06f6d63d55
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12621

Differential Revision: D10368472

Pulled By: bddppq

fbshipit-source-id: b62fbbc0ad5bc41c5e7221ba889b1061087c3214
2018-10-12 17:25:20 -07:00
63a220f54d Deprecate prof_dag (#11956)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11956

Deprecate prof_dag and redirect it to the unified executor

Reviewed By: aazzolini

Differential Revision: D9983992

fbshipit-source-id: 16821628a99a5683dc39cbb345ddab56e9d8721c
2018-10-12 16:37:57 -07:00
53f4dbc9ac test_proper_exit: avoid truncation of info message (#12612)
Summary:
test_proper_exit in the dataloader test bucket includes
(as its docstring) a reassuring message about complaints that
may appear during the test. The message is displayed
when the tests are run in verbose mode.

But the docstring includes a line break, and the unittest
framework only prints the first line of the docstring (see
shortDescription()). As a result, the 2nd (more reassuring)
half of the message is not displayed.

Catenate the docstring onto a single line so all is visible.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12612

Differential Revision: D10368786

Pulled By: ezyang

fbshipit-source-id: 14b259a6d6a3491d4290148eae56e6ab06f2a9b6
2018-10-12 16:32:28 -07:00
17ab3bd502 implement rowwise quantization for fp16 (#12382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12382

implement fp16-> (uint8 + scale and bias in fp32)

this is similar to fp32 rowwise quantization

we could have done scale and bias in fp16, but we're not too motivated since we wouldn't save much, and those values have to be converted to fp32 for processing anyway, as x86 doesn't support half-float operations

Reviewed By: csummersea

Differential Revision: D10220463

fbshipit-source-id: 6c382026de881f03798c2e5fc43abfc80f84ea1f
2018-10-12 13:57:55 -07:00
7a1b668283 Implement Tensor.__cuda_array_interface__. (#11984)
Summary:
_Implements pytorch/pytorch#11914, cc: ezyang_

Implements `__cuda_array_interface__` for non-sparse cuda tensors,
providing compatibility with numba (and other cuda projects...).

Adds `numba` installation to the `xenial-cuda9` jenkins test environments via direct installation in `.jenkins/pytorch/test.sh` and numba-oriented test suite in `test/test_numba_integration.py`.

See interface reference at:
https://numba.pydata.org/numba-doc/latest/cuda/cuda_array_interface.html
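
A quick illustration (the dict layout follows the linked spec):

```python
import torch

t = torch.arange(4, dtype=torch.float32, device='cuda')
# Consumers such as numba wrap the tensor without copying by reading this
# dict, which exposes the dtype, shape, strides, and raw device pointer.
print(t.__cuda_array_interface__)
```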
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11984

Differential Revision: D10361430

Pulled By: ezyang

fbshipit-source-id: 6e7742a7ae4e8d5f534afd794ab6f54f67808b63
2018-10-12 13:41:05 -07:00
134b5d62e8 don't copy weight gradients in rnn (#12600)
Summary:
This PR gets rid of an unnecessary copy of weight gradients in cudnn rnn. It also removes an unnecessary check for input size when deciding whether to use persistent rnn, and adds a docstring explaining when persistent rnn can be used. cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12600

Differential Revision: D10359981

Pulled By: soumith

fbshipit-source-id: 0fce11b527d543fabf21e6e9213fb2879853d7fb
2018-10-12 13:34:10 -07:00
49256ddb4a split generated VariableType.cpp (#12493)
Summary:
On my devgpu, this brings the time taken for `touch torch/csrc/jit/type.h && time python setup.py rebuild develop` (debug mode, multicore build) down from 75 seconds to 62 seconds. For the `ninja install` of libtorch portion, which this affects, the reduction is from 52 seconds to 35.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12493

Reviewed By: zdevito

Differential Revision: D10315988

Pulled By: anderspapitto

fbshipit-source-id: 316dc4ab81134aaa17a568cfc07408b7ced08c2e
2018-10-12 13:14:44 -07:00
3f52a0aad7 Fix the linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12613

Differential Revision: D10364963

Pulled By: houseroad

fbshipit-source-id: f9e2a76c1ab021cce4f45f5b4e74ddcc9618c138
2018-10-12 13:12:08 -07:00
239b2ac718 make the variable declaration closer to usage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9262

Differential Revision: D10363576

Pulled By: ezyang

fbshipit-source-id: 05c8eb12f3b389caf562cca9e338cc91b0e9acc1
2018-10-12 12:07:08 -07:00
15bdb9fe61 remove duplicate BUILD_TEST flag in libtorch cmake file (#12583)
Summary:
There is already a BUILD_TEST flag in the root-level cmake file. Removing this duplicate makes sure it doesn't interfere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12583

Differential Revision: D10348620

Pulled By: anderspapitto

fbshipit-source-id: 3957783b947183e76a4479a740508c0dc1c56930
2018-10-12 11:53:07 -07:00
7da4643232 Caffe2: fix error C2398 and syntax error with Visual Studio 2015 (#10089)
Summary:
Similar fix to [pull #7024](https://github.com/pytorch/pytorch/pull/7024).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10089

Differential Revision: D10363341

Pulled By: ezyang

fbshipit-source-id: bc9160e2ea75fc77acf3afe9a4e20f327469592e
2018-10-12 11:47:34 -07:00
c1d0784dcb enable onnx integration tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12592

Reviewed By: BIT-silence, zrphercule

Differential Revision: D10363056

Pulled By: houseroad

fbshipit-source-id: 4d1dc0302a8cbe3d6ff1594f0d038330ba4efc81
2018-10-12 11:34:16 -07:00
97eec33f80 Allow tensor.device, tensor.dtype, and tensor.shape in JIT (#12363)
Summary:
Closes https://github.com/pytorch/pytorch/issues/12364
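
A minimal sketch of what this enables inside a scripted function (illustrative, not code from the PR):

```python
import torch

@torch.jit.script
def rows(x):
    # x.shape is now visible to the script compiler
    return x.shape[0]

print(rows(torch.ones(4, 2)))  # 4
```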
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12363

Differential Revision: D10362491

Pulled By: ezyang

fbshipit-source-id: f2716e656977370c5ec51cb15f62b6376798e617
2018-10-12 11:29:04 -07:00
5317429e82 move bceWithLogits from python to Aten (#11054)
Summary:
Fixes #10648 .
Perf comparison:
```
import torch
import torch.nn as nn
import time

def bm(testsize, repeat=100, cuda=False):
    total_time = 0.0
    pos_weight= torch.ones(testsize[1], device='cuda' if cuda else 'cpu') / testsize[1]
    # loss = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    loss = nn.BCEWithLogitsLoss()
    input = torch.randn(testsize, device='cuda' if cuda else 'cpu').clamp_(2.8e-2, 1 - 2.8e-2)
    target = torch.randn(testsize, device='cuda' if cuda else 'cpu').gt(0).float()
    input.requires_grad = True
    target.requires_grad = True
    for _ in range(repeat):
        start = time.time()
        l = loss(input, target)
        l.backward()
        # print(target.grad)
        end = time.time()
        total_time += end - start
    return total_time

for cuda in [False, True]:
    for testsize in [(100, 100), (1000, 1000), (2000, 2000)]:
        # print(testsize, cuda)
        print('{:.5f}'.format(bm(testsize, cuda=cuda)))
```
|    | Python CPU | ATen CPU | Python GPU | ATen GPU |
| ------------- | ------------- | ------------- | ------------- | ------------- |
| (100, 100)  | 0.15813s | 0.10890s | 0.14601s | 0.07070s |
| (1000, 1000)  | 1.74051s | 0.95038s | 0.15158s | 0.10153s |
| (2000, 2000) | 5.36515s | 2.46996s | 0.31322s | 0.200941s |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11054

Differential Revision: D9728289

Pulled By: ailzhang

fbshipit-source-id: b7c5bc50635f8cc63c317caa4321e32f7df860f8
2018-10-12 11:13:33 -07:00
6069f6f454 Try to prevent occasional timeout in test_proper_exit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12587

Differential Revision: D10361411

Pulled By: SsnL

fbshipit-source-id: 97d0ff9d40918b7729c21f4de6d8cabeb65c728a
2018-10-12 10:53:01 -07:00
12686ec656 fix _AllReduce not applying the DeviceScope guard to model.Copy operations. (#12342)
Summary:
This resolves an issue where the `model.Copy` operation would
copy to the wrong GPU, so that the subsequent `net.Sum` operation
would use an input argument for which p2p access was not enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12342

Differential Revision: D10343181

Pulled By: ezyang

fbshipit-source-id: fd2d6d0ec6c09cda2db0a9a4f8086b3560e5a3ec
2018-10-12 10:47:58 -07:00
dfad8b60ba Remove duplicate code
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12526

Differential Revision: D10342611

Pulled By: ezyang

fbshipit-source-id: 470b4a181fd9091c3fd33d3d43a2cf6d44594202
2018-10-12 09:58:44 -07:00
038d5ca943 Remove incompatibility between MSVC, CUDA and Debug (#12572)
Summary:
Experimentally this works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12572

Differential Revision: D10342468

Pulled By: ezyang

fbshipit-source-id: dc36587c32ab0910aa14b7351ca12532acd41c7d
2018-10-12 09:52:13 -07:00
63e09707a2 Use SFINAE instead of macros for 'long' hack (#12605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12605

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Prowindy

Differential Revision: D10359443

fbshipit-source-id: aa371cbb43658c8cd3664ba3d9b0dedbaa225c1d
2018-10-12 09:46:07 -07:00
b57fdf1db5 Properly set cmake python library and include_dirs (#12569)
Summary:
Properly set cmake python_library and include_dirs hints, so that systems with multiple versions of Python can still find the correct libraries and header files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12569

Differential Revision: D10359910

Pulled By: soumith

fbshipit-source-id: 2238dcbed7aac8a818c9435e6bba46cda5f81cad
2018-10-12 08:11:21 -07:00
48bc57fa8d Introduce chain_matmul (#12380)
Summary:
- This was one of the few functions left out from the list of functions in
  NumPy's `linalg` module
- `chain_matmul` is particularly useful for DL research, for quick analysis of
  deep linear networks (see the sketch below)
- Added tests and a doc string
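
For illustration, a small usage sketch of the new function:

```python
import torch

a = torch.randn(10, 100)
b = torch.randn(100, 5)
c = torch.randn(5, 50)

# Picks the association order minimizing total scalar multiplications;
# equivalent in value to a @ b @ c.
out = torch.chain_matmul(a, b, c)
print(out.shape)  # torch.Size([10, 50])
```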
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12380

Differential Revision: D10357136

Pulled By: SsnL

fbshipit-source-id: 52b44fa18d6409bdeb76cbbb164fe4e88224458e
2018-10-12 03:58:12 -07:00
0cf3c1ce66 Add copy= keyword to Tensor.to (#12571)
Summary:
Fixes: #12454
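
A minimal sketch of the new keyword (illustrative, not from the PR):

```python
import torch

x = torch.ones(3)
y = x.to(torch.float32)              # no conversion needed: may return x itself
z = x.to(torch.float32, copy=True)   # always returns a fresh tensor
print(y.data_ptr() == x.data_ptr())  # True
print(z.data_ptr() == x.data_ptr())  # False
```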
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12571

Differential Revision: D10356994

Pulled By: SsnL

fbshipit-source-id: d87416078a5a8e5ffa690cd73c09fa6b4e16aa25
2018-10-12 02:10:44 -07:00
2279299c6c Implement aten::contiguous (#12541)
Summary:
Implement contiguous as `aten::contiguous` so it can be recorded during tracing. This was causing issues with both the trace checker as well as when a `contiguous()`-ed tensor was used downstream in a view that expected certain strides
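
A hedged illustration of the effect (not from the PR): `.contiguous()` should now show up as an `aten::contiguous` node in a trace instead of vanishing.

```python
import torch

def f(x):
    return x.t().contiguous().view(-1)

traced = torch.jit.trace(f, torch.randn(3, 4))
print(traced.graph)  # expect an aten::contiguous node feeding the view
```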
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12541

Differential Revision: D10304028

Pulled By: jamesr66a

fbshipit-source-id: dc4c878771d052f5a0e9674f610fdec3c6782c41
2018-10-11 23:39:39 -07:00
1be8b7cc56 Delete "default" codeowners from root directories. (#12584)
Summary:
We will still have an informal notion of codeowner, but it
is not necessary to get a review from these people in particular
for these directories.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12584

Differential Revision: D10348999

Pulled By: ezyang

fbshipit-source-id: 97331ec4bab9f1aa02af82b71ad525a44ad1e7fe
2018-10-11 23:18:04 -07:00
0df4d66210 Update caffe2 docker images version in circleci (#12596)
Summary:
72b6d26950
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12596

Differential Revision: D10355881

Pulled By: bddppq

fbshipit-source-id: 33c15819ec51315defc23a7fbc23caa2ddd65e75
2018-10-11 21:54:33 -07:00
fa99ed9b30 Emit warning about optimization passes only once
Reviewed By: ajtulloch

Differential Revision: D9584925

fbshipit-source-id: 191035eaefe3ab3980e46598f2ebf34b2b704a9b
2018-10-11 21:41:17 -07:00
01cb90adf1 fix the ONNX test_operator test (#12591)
Summary:
update the expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12591

Differential Revision: D10355620

Pulled By: houseroad

fbshipit-source-id: 5acdbf2406d322378025631808108a2d795be916
2018-10-11 21:41:15 -07:00
eb5fdc5fb5 Add default values in script (#12345)
Summary:
Add support for default values on script functions and Modules

Followup to #11962
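
A minimal sketch of the feature (modern syntax; the exact annotation style accepted at the time may have differed):

```python
import torch

@torch.jit.script
def scale(x, alpha=2.0):
    # alpha's type is inferred from its default value
    return x * alpha

print(scale(torch.ones(2)))       # tensor([2., 2.])
print(scale(torch.ones(2), 3.0))  # tensor([3., 3.])
```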
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12345

Reviewed By: michaelsuo

Differential Revision: D10263613

Pulled By: driazati

fbshipit-source-id: 9b380d8c3f8c4abb2d24c33b23c00ec5896ca372
2018-10-11 20:49:23 -07:00
97bee5cd80 Adds max plan number for CUDA 10 cufft plan cache array (#12553)
Summary:
SsnL As per your review in https://github.com/pytorch/pytorch/pull/12017/, I added a max plan number for the CUDA 10 path. Our internal cuFFT team couldn't suggest a number since the limit depends on host/device memory. That is, a plan allocates some buffers on the device and also creates objects for the plans on the host side. I raised this number to 4x arbitrarily per your suggestion.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12553

Differential Revision: D10320832

Pulled By: SsnL

fbshipit-source-id: 3148d45cd280dffb2039756e2f6a74fbc7aa086d
2018-10-11 19:36:25 -07:00
957142a4fe switch ROCm CI targets to white rabbit release (#12577)
Summary:
* switches docker files over to white rabbit release - removed custom package installs
* skips five tests that regressed in that release
* fixes some case-sensitivity issues in ROCm supplied cmake files by sed'ing them in the docker
* includes first changes to the infrastructure to support upcoming hip-clang compiler
* prints ROCm library versions as part of the build (as discussed w/ ezyang )
* explicitly searches for miopengemm
* installs the new hip-thrust package to be able to remove the explicit Thrust checkout in a future revision
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12577

Differential Revision: D10350165

Pulled By: bddppq

fbshipit-source-id: 60f9c9caf04a48cfa90f4c37e242d944a175ab31
2018-10-11 18:03:11 -07:00
93a4b76114 Enable alternative LayerNorm impl in FisherGan (#12178)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12178

Fisher GAN calls processor_util.add_mlp, which injects the layer norm through the
normalizer. We allow using an alternative implementation of LayerNorm in the normalizer.

Reviewed By: Wakeupbuddy

Differential Revision: D9235528

fbshipit-source-id: 88c126c658102926613242ef84a481f6de1676ed
2018-10-11 17:36:11 -07:00
8ac8b823c2 Allow use substitute ops for LayerNorm (#12177)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12177

as titled

Reviewed By: Wakeupbuddy

Differential Revision: D9218047

fbshipit-source-id: 8d68861472c99d587e678c3d76ac43abc9c8fe6d
2018-10-11 17:36:10 -07:00
d9eff40546 Revert D10209620: Use SFINAE instead of macros for 'long' hack
Differential Revision:
D10209620

Original commit changeset: 68f09339e279

fbshipit-source-id: e33927e92e34efc40917d97cd8ba80996a875dff
2018-10-11 16:50:09 -07:00
5973312abc Add clang 6 docker images
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12581

Differential Revision: D10349785

Pulled By: bddppq

fbshipit-source-id: 638641d369be0898dd6232737ebaa9d9a8c2e557
2018-10-11 16:48:13 -07:00
a1487bf874 Smarter differentiable subgraph slicing (#12175)
Summary:
If any inputs require_grad then the graph executor does differential subgraph slicing. The existing algorithm combines adjacent differentiable Node*.

There are two major motivations. The first is improving fusion opportunities: the graph fusion pass runs after differential subgraph slicing. This means that only nodes that are a part of the same differential subgraph may be considered for fusion. If something like the following happens,
```
y = f(x)
k = not_differentiable_op(m)
z = g(y)
```
and f and g are both fusible and differentiable operations, then they will be inserted into different differential subgraphs and not fused together.

The second is to enable JIT optimizations on backward passes for things like an (automatically) unrolled LSTM. Right now, in an unrolled LSTM, we see something like the following:
```
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
lstm_cell()
non_differentiable_list_op()
```
Each lstm_cell itself is differentiable and gets put into a separate differential subgraph. During the backwards pass, each prim::DifferentiableSubgraph has its own graph executor: these graph executors cannot talk to each other. It is better if we combined all of the lstm_cells (where applicable) into one differential subgraph so their backward passes are combined into one graph executor that can perform better optimizations than several separate graph executors.

Think about the computation graph as a DAG where edges are data dependencies and vertices are operations (the nodes). Each vertex is either black or red; a vertex is colored black if it is differentiable and red otherwise. The goal is to contract edges (merge nodes) to have the fewest black vertices remaining such that the graph is still a DAG.

The algorithm is the following:
- Take the Graph& and create a shadow "DynamicDAG" object to wrap Node* and edges. Each Vertex holds multiple Node* (but starts out holding one Node*) and each edge is a data dependency.
- Greedily contract vertices in the DynamicDAG if they are "differentiable". This operation is unrelated to the Graph&.
  - A Vertex is "differentiable" if all the nodes it holds are differentiable.
  - When contracting vertices, combine their Node* contents.
  - The DynamicDAG keeps its vertices in topological order and complains if the contraction is invalid so everything is good.
- Take the DynamicDAG: reorder the nodes in the Graph& to match the topological order in the DynamicDAG.
- Finally, go through each Vertex in the DynamicDAG: if it contains multiple Node* then merge all of them into a prim::DifferentiableGraph.

The DynamicDAG is based off of the dynamic top sort algorithm in [this paper](https://www.doc.ic.ac.uk/~phjk/Publications/DynamicTopoSortAlg-JEA-07.pdf) by Pearce and Kelly.

Each contractEdge(producer, consumer) call is `O(|AR| log |AR| * min(|out_edges(producer)|, |in_edges(consumer)|)` where `AR` is the "affected region" (defined as the set of nodes that, in topological order, are between producer and consumer). By only considering contractions such that `|ord(producer) - ord(consumer)| < threshold1` and `|out_edges(producer)| < threshold2` we can make each contractEdge(producer, consumer) call take constant time. The resulting algorithm is linear in the number of nodes.

Added a lot of small test cases.

Looking for suggestions on the following:
- what big computation graphs should I run this on to test how fast or slow it is?
- what things other than correctness should I be thinking about when I test this?

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12175

Differential Revision: D10302564

Pulled By: zou3519

fbshipit-source-id: 8a94d130d82f8a1713cc28483afef9a72d83d61a
2018-10-11 16:20:53 -07:00
0ee2e7c398 Relax the locking of running_mutex_ in async_scheduling net (#12544)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12544

`running_mutex_` inside the async_scheduling net is used to guard access to the `running_` variable, so we don't need to acquire that lock while we are actually running the net. This helps us prevent a potential double-locking situation when we decide to run the root nodes inline.

Reviewed By: ilia-cher

Differential Revision: D10304745

fbshipit-source-id: 5f701b2c22b06ff5bee7f2c37ac634326748f579
2018-10-11 16:00:54 -07:00
0f9807ee61 Enable addmm fusion for ONNX export only (#12538)
Summary:
There are some action-at-a-distance issues, and not having this disables quantization in C2 for prod use cases

ref T34831022
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12538

Differential Revision: D10302931

Pulled By: jamesr66a

fbshipit-source-id: 700dc8c5c4297e942171992266ffb67b815be754
2018-10-11 13:57:50 -07:00
7b0f5d6631 Support USE_CUDNN for Windows (#12518)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12495

cc peterjc123 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12518

Reviewed By: mingzhe09088

Differential Revision: D10338792

Pulled By: orionr

fbshipit-source-id: b465c42ea6d5fe9dbc2a4e1f973d952365d0af07
2018-10-11 13:53:27 -07:00
033e00cd3f Fix bug in caffe_translator tool (#10056)
Summary:
1. Fix the BN translator:
    IntelCaffe and NVCaffe fuse BN+Scale, so the "BatchNorm" op contains 5 params, including scale and bias.

2. Fix the Scale translator:
   The translated outputs of Scale have the same names as those of Conv.
   All their names are output + '_w' and output + '_b'.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10056

Differential Revision: D10099205

Pulled By: yinghai

fbshipit-source-id: 73a73868e3e16c495e8b233fdb1d373d556a9537
2018-10-11 13:13:12 -07:00
666bebc7d2 adapting caffe2 operator docs generator to pytorch url
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10801

Differential Revision: D9472991

Pulled By: ezyang

fbshipit-source-id: 1b8ba77b8255b7e900b6528bd93b3b870f9ba0d4
2018-10-11 12:55:06 -07:00
eef083e477 CircleCI: add timestamp to build log, clean up unused jobs, print docker image name
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12556

Differential Revision: D10343032

Pulled By: yf225

fbshipit-source-id: fd2dcba18a5cb037fdc448dba64bf9d747dc3761
2018-10-11 12:23:42 -07:00
a4120fa132 Get rid of emitApplyIdent (#12504)
Summary:
And reroute builtin/CompilationUnit function resolution through one resolution pathway
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12504

Differential Revision: D10319920

Pulled By: jamesr66a

fbshipit-source-id: 3ab9877664dd32b97136a7625d0688e1adc0c022
2018-10-11 10:53:53 -07:00
8482ea8774 Update develop install command in onnx scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12561

Differential Revision: D10340194

Pulled By: bddppq

fbshipit-source-id: 10fb7261028d56f73111e2ca39d4eb2ab930812a
2018-10-11 10:38:52 -07:00
cee19eb31c Back out "[c10][NFCI] Move jit/type, function_schema, and utils/functional to ATen/core" (#12568)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12568

Second attempt at D10324615

Original commit changeset: b71eeec98dfe
Original commit changeset #2: 1af6400ae0c1

Reviewed By: bwasti

Differential Revision: D10338168

fbshipit-source-id: 04cb443a89a9cd1a174df6d5ac1a86c3d423d56b
2018-10-11 09:53:40 -07:00
7acb145893 Fixed print issue for TensorTypeId (#12402)
Summary:
Fixed the printing issue for TensorTypeId. It used to print a hex escape of the ID, e.g.
   \x1
Now it prints the ID as a string, e.g.
  1
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12402

Reviewed By: ezyang

Differential Revision: D10224026

Pulled By: lauragustafson

fbshipit-source-id: a9ca841d08c546fccbb948a17f06a29fea66f3fb
2018-10-11 08:23:32 -07:00
229397b439 Revert D10324615: [pytorch][PR] Revert #12466 and #12467 to fix JIT test error on Windows CI
Differential Revision:
D10324615

Original commit changeset: 12e5fc73da42

fbshipit-source-id: 710c5f3b7a4fe56799ae31a86359b2085b7e741d
2018-10-11 03:39:14 -07:00
1c7832c854 CUDA 10 warnings fixed (#12442)
Summary:
Fixes a deprecation warning against `cudaPointerAttributes`, whose `memoryType` field has been deprecated in favor of `type`; see https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#contents-end for details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12442

Differential Revision: D10251239

Pulled By: zou3519

fbshipit-source-id: 500f1e02aa8e11c510475953ef5244d5fb13bf9e
2018-10-11 00:25:22 -07:00
234e6b3797 Bugfix in onnx exporter (#10607)
Summary:
Incorrect processing for int and float arguments. Possibly a typo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10607

Differential Revision: D9376040

Pulled By: bddppq

fbshipit-source-id: e3665e7bbb26842d1d7eed50442993cfdbf55a80
2018-10-11 00:25:20 -07:00
1f7cbea984 Revert #12466 and #12467 to fix JIT test error on Windows CI (#12557)
Summary:
Sample error log: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-test2/11766/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12557

Differential Revision: D10324615

Pulled By: yf225

fbshipit-source-id: 12e5fc73da42ffa22e39250aee9ea072fd2e33de
2018-10-10 23:56:56 -07:00
170d84228e Delete redundant statement of col2im (#12514)
Summary:
Hi, I found that there were two declarations of `col2im` in `im2col.h` and I think the former one
may be redundant.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12514

Differential Revision: D10328721

Pulled By: ezyang

fbshipit-source-id: d225547848803511c7cc58bd9df1cc6832a537fb
2018-10-10 23:56:54 -07:00
2b033332c8 Allow linking to backwards-compatible cuDNN at runtime (#12239)
Summary:
Fixes #12193
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12239

Differential Revision: D10321744

Pulled By: soumith

fbshipit-source-id: bf437f7f9b6231158a1585d2dabae8d937396478
2018-10-10 23:56:51 -07:00
8734b174ca Multinomial raise error (#12490)
Summary:
Fixes #12260 #2896

```
torch.multinomial(torch.FloatTensor([0, 1, 0, 0]), 3, replacement=False)
```
The old behavior is that we return `0` after we run out of positive categories. Now we raise an error, based on the discussion in the issue thread.
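
A quick sketch of the new behavior (illustrative):

```python
import torch

weights = torch.tensor([0., 1., 0., 0.])
try:
    torch.multinomial(weights, 3, replacement=False)
except RuntimeError as e:
    print("raises:", e)  # not enough categories with positive weight
```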

- Add test cases for the CPU & CUDA paths; in the CUDA case `n_samples=1` is a simple special case, so we test against `n_samples=2` instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12490

Differential Revision: D10278794

Pulled By: ailzhang

fbshipit-source-id: d04de7a60f60d0c0d648b975db3f3961fcf42db1
2018-10-10 20:39:04 -07:00
b89a3b50fb Remove StaticContext (#12547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12547

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12305

Remove StaticContext from context_base.h

Reviewed By: dzhulgakov

Differential Revision: D10073519

fbshipit-source-id: 350beec3c54365edef338318ce58229ccb825a98
2018-10-10 19:41:03 -07:00
c32839fc90 CircleCI: better credentials visibility (#12552)
Summary:
We will rotate the credentials if the new setting works.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12552

Differential Revision: D10322121

Pulled By: yf225

fbshipit-source-id: 158f2f89b83a751566a912869a4400d5be6e5765
2018-10-10 18:25:09 -07:00
89010d60f9 Migrate HIP to use DeviceOption.device_id and delete DeviceOption.hip_gpu_id
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12546

Reviewed By: hyuen, xw285cornell

Differential Revision: D10305222

fbshipit-source-id: 955e1d2878508a25fe4e9980ae66f8f54aaf7db9
2018-10-10 18:25:06 -07:00
25bd7fe488 Add USE_FFMPEG flag for setup.py and R2Plus1D (#12543)
Summary:
Needed for https://github.com/facebookresearch/R2Plus1D/pull/46
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12543

Differential Revision: D10320147

Pulled By: orionr

fbshipit-source-id: a7dcbf7c0d4b405b9e89b28ef75a0ed1cf2a3e6a
2018-10-10 18:09:48 -07:00
da3dd9af12 No Op Optimizer (#12390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12390

Introduce a no-op optimizer for when we don't want updates to happen, but don't want to affect downstream processes.

Reviewed By: mlappelbaum

Differential Revision: D10209812

fbshipit-source-id: 2af4ebc0fb42e78ea851c3a9f4860f3d224037b6
2018-10-10 18:09:46 -07:00
8399778049 Update FP16 submodule (#12554)
Summary:
Pull a patch that fixes remaining incompatibility with Microsoft compiler on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12554

Differential Revision: D10319736

Pulled By: Maratyszcza

fbshipit-source-id: bcd88581df48f2678ef81e095f947391104f24d5
2018-10-10 17:25:17 -07:00
543048d275 Adds launch bounds for CTC loss kernel (#12379)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/12324
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12379

Differential Revision: D10318361

Pulled By: ezyang

fbshipit-source-id: aec4ae8205e780b18560d639543ed9d0ef0527ce
2018-10-10 17:09:38 -07:00
7724807551 Remove ExtractDeviceOption from StaticContext (#12304)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12304

- Make ExtractDeviceOption a free function.
- Add a Storage(at::Device) constructor in order to preserve the device_id.

Reviewed By: dzhulgakov

Differential Revision: D10069839

fbshipit-source-id: a5f3994a39bdf1b7503b39bb42c228e438b52bfa
2018-10-10 14:12:16 -07:00
0d50c117db Introduce BUILD_ATEN_ONLY cmake option (#12443)
Summary:
Following up #11488 conversation with orionr
And our brief conversation at PTDC about ATen with soumith and apaszke

This PR enables a very slim build focused on ATen in particular, without caffe2 and protobuf among other dependencies.
With this PR NimTorch tests pass fully, including AD, convolutions, wasm, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12443

Reviewed By: mingzhe09088

Differential Revision: D10249313

Pulled By: orionr

fbshipit-source-id: 4f50503f08b79f59e7717fca2b4a1f420d908707
2018-10-10 12:54:19 -07:00
a442853f4f CircleCI: try to fix submodule not found error (#12542)
Summary:
Try to fix the "submodule not found" infra error: https://circleci.com/gh/pytorch/pytorch/48431 by switching to use the official git client (instead of CircleCI's default git client).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12542

Differential Revision: D10305027

Pulled By: yf225

fbshipit-source-id: 42db0694efb468d9460ef51d7b4b2bd90d78ff24
2018-10-10 12:54:17 -07:00
b51901f7d3 Update FP16 submodule (#12539)
Summary:
Pull a patch that makes FP16 compatible with Microsoft compiler on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12539

Reviewed By: hyuen

Differential Revision: D10303487

Pulled By: Maratyszcza

fbshipit-source-id: 4e20ece6338e4d0663cd3591914ce333f0972693
2018-10-10 11:54:06 -07:00
45db8274de CircleCI: Add credentials for pushing to perf test S3 bucket (#12523)
Summary:
This will fix the perf test baseline update in master builds.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12523

Reviewed By: bddppq

Differential Revision: D10289415

Pulled By: yf225

fbshipit-source-id: 408893ab2b0f93c7cffb9f8fbf74453155b850c4
2018-10-10 11:54:04 -07:00
c2a57d082d Fix windows build (#12534)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12534

att

Reviewed By: orionr

Differential Revision: D10300123

fbshipit-source-id: 3079864b6979779af4a524a54b28f9b2baed8ba4
2018-10-10 09:39:06 -07:00
033e95765c Diff against master and enable bugprone-* checks (#12378)
Summary:
This PR:

1. Makes clang-tidy diff against `master` instead of `HEAD~1` in CI, which makes much more sense
2. Enables all checks in the `bugprone-*` category (see https://clang.llvm.org/extra/clang-tidy/checks/list.html) except one about parentheses in macros, because it doesn't always apply too well for us.

Fixed some nice code smells.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12378

Differential Revision: D10247972

Pulled By: goldsborough

fbshipit-source-id: 97dc9e262effa6874d2854584bf41a86684eb8bd
2018-10-10 07:23:57 -07:00
727609f435 Use SFINAE instead of macros for 'long' hack (#12424)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12424

Some compilers define 'long' as a separate type from 'int32_t' or 'int64_t', some don't.
Before, we had a cmake check setting a macro and depending on the macro, we either defined a separate type id for long or didn't.
Then, we removed the cmake check and used compiler detection macros directly. This is, however, error prone.

This new approach uses SFINAE to register a type id for 'long' only if it's a separate type.

Reviewed By: Yangqing, dzhulgakov

Differential Revision: D10209620

fbshipit-source-id: 68f09339e279a9a56b95caeef582c557371b518d
2018-10-10 01:11:06 -07:00
e25b8869f7 typo: Aten.h -> ATen.h in cppdocs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12519

Differential Revision: D10287901

Pulled By: goldsborough

fbshipit-source-id: 56e0c1851aade84e4154777776d14e087645a762
2018-10-09 23:40:14 -07:00
3829f86c7a Update NNPACK-related submodules (#12505)
Summary:
Update submodules below:
- NNPACK
- FP16
- pthreadpool
- cpuinfo
- psimd
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12505

Reviewed By: hyuen

Differential Revision: D10286690

Pulled By: Maratyszcza

fbshipit-source-id: 279214b47c82e9e2582693191cc218173c00ea69
2018-10-09 21:54:07 -07:00
283f21d518 Caffe 2 adoption (#12116)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12116

Adapt Caffe 2 to platform007 (gcc 8):
* gcc 8 + nvcc template symbol lookup (D9319742):
context_.template CopySameDevice<T> ==> this->context_.template CopySameDevice<T>
* New gcc 8 warning (error):
  * -Werror=sizeof-pointer-div
  * Unnecessary parenthesis

Reviewed By: bddppq

Differential Revision: D10045844

fbshipit-source-id: 95f509fefc9593cbb82b1687793fef8930260d2f
2018-10-09 19:29:23 -07:00
16b8075acd finishRun fix (#10970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10970

Fixing a possible case where the next iteration of a net may be started prematurely.
We have to ensure that resetting the running_ flag is done *after* finalizeEvents
(i.e. after waiting for the rest of the net's events to be finished).

Reviewed By: heslami

Differential Revision: D9545442

fbshipit-source-id: bc324a180b1e93054b051981817be7985f52b4cb
2018-10-09 16:09:46 -07:00
f54ab540af Rename cuda_gpu_id to device_id in DeviceOption (#12456)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12456

codemod with 'Yes to all'
codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

Overload TextFormat::ParseFromString to do string replace when parsing from protobuf format

Reviewed By: Yangqing

Differential Revision: D10240535

fbshipit-source-id: 5e6992bec961214be8dbe26f16f5794154a22b25
2018-10-09 15:54:04 -07:00
caf8b0777a Move function_schema to ATen/core (#12467)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12467

final move of files to enable nomnigraph wrapped pytorch IR

Reviewed By: ezyang

Differential Revision: D10242930

fbshipit-source-id: 1af6400ae0c1f1e7c3be262fbca58010eb2bfa86
2018-10-09 15:38:27 -07:00
f989d4b18e Move jit/type and utils/functional to ATen/core (#12466)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12466

Moves type.{h,cpp} and functional.h to ATen/core

move is necessary for IR merging -- slimmed down from this diff: D9819906

Reviewed By: ezyang

Differential Revision: D10242680

fbshipit-source-id: b71eeec98dfe9496e751a91838d538970ff05b25
2018-10-09 15:38:24 -07:00
58b247fc42 Update onnx to onnx/onnx@008e381 (#12492)
Summary:
008e381855
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12492

Differential Revision: D10268646

Pulled By: bddppq

fbshipit-source-id: 39d2eae66abee898a30b71c23e54f5c51d3f9ac8
2018-10-09 15:38:22 -07:00
64f707cd26 Enable more unit tests (ROCm 255) (#12486)
Summary:
* Enable more tests that relied on CPU LAPACK at compile time.
* Enabled min/max tests in test_cuda (ROCm 236)

bddppq ezyang

Tests ran as part of the ROCm CI here: https://github.com/ROCmSoftwarePlatform/pytorch/pull/255
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12486

Differential Revision: D10262534

Pulled By: ezyang

fbshipit-source-id: 167a06fc8232af006f4b33dcc625815fd4b06d6b
2018-10-09 15:38:19 -07:00
dcd9d73d47 Expunge torch.utils.trainer.* (#12487)
Differential Revision: D10273602

Pulled By: SsnL

fbshipit-source-id: 630c1f8ee0e366f7092d4f93dbe1efa96fc860e0
2018-10-09 14:56:00 -07:00
8468b7d3f0 Fix tensor doc (#12469)
Summary:
The C++ docs for `at::Tensor` are currently broken because we moved the place `Tensor.h` gets generated to without updating our docs. I use `GEN_TO_SOURCE=1` when generating ATen files, so the `Tensor.h` file should end up in `aten/src/ATen/core/Tensor.h` if I understand correctly.

dzhulgakov ezyang gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12469

Differential Revision: D10248521

Pulled By: goldsborough

fbshipit-source-id: 8d8a11f0f6e2703b8d767dbc523fc34a4374f345
2018-10-09 14:09:22 -07:00
2b22c60980 Fix GPU perf tests on CircleCI (#12491)
Summary:
`COMMIT_SOURCE`, which is used in perf tests to decide whether to store the new numbers as the baseline, is missing from the current CircleCI config.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12491

Differential Revision: D10274426

Pulled By: yf225

fbshipit-source-id: 047ef6cc61a12738062f9940d1bfd4c3bf152909
2018-10-09 13:53:45 -07:00
b572e27502 Fix types and warp sizes for ROCm (ROCm 256) (#12485)
Summary:
* Correct the warp size for current AMD GPUs
* Fix copy paste error in configure
* Correct the wrong typing explicitly

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12485

Differential Revision: D10262490

Pulled By: ezyang

fbshipit-source-id: 93467944247ed764d9ac9f7bb212a94fc250608e
2018-10-09 12:53:48 -07:00
c96afa3322 topk and sort fixes (#12337)
Summary:
* Topk part 1: fix intrinsincs for 64 wave front (#224)
64 in a wave front - intrinsics change.
* Disable in-place sorting on ROCm. (#237)
It is known to hang - use the Thrust fallback
Skip one test - fails with the fallback.
* Topk fixes (#239)
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 (bfe) and 9.7.1.20 (bfi) requires pos and len to be limited to 0...255
* Spec (https://docs.nvidia.com/cuda/pdf/ptx_isa_6.3.pdf) Sec 9.7.1.19 requires extracted bits to be in LSBs
* Correct logic for getLaneMaskLe. Previous logic would return 0x0 instead of 0xffffffffffffffff for lane 63
* Round up blockDim.x to prevent negative index for smem

bddppq ezyang

Note the one additional skipped test resulting from using the thrust sort fallback for all sizes. We are working on getting bitonic to work properly (and always). Until then, this needs to be skipped on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12337

Differential Revision: D10259481

Pulled By: ezyang

fbshipit-source-id: 5c8dc6596d7a3103ba7b4b550cba895f38c8148e
2018-10-09 12:08:48 -07:00
ea79f7c032 Add derivative to pow with scalar base (#12450)
Summary:
Fixes: #12426

Thank you, DriesSmit, for the report!
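
A quick check of the fixed case (illustrative): the gradient w.r.t. the exponent when the base is a Python scalar.

```python
import math
import torch

e = torch.tensor(3.0, requires_grad=True)
y = torch.pow(2.0, e)  # scalar base, tensor exponent
y.backward()
print(e.grad)                    # tensor(5.5452)
print(2.0 ** 3 * math.log(2.0))  # d/de 2**e = 2**e * ln 2 ~= 5.5452
```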
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12450

Differential Revision: D10238556

Pulled By: soumith

fbshipit-source-id: 8bf71467c6734ecc5ff30f15500304d731f7e155
2018-10-09 11:38:48 -07:00
a3fb004b18 (#12474)
Summary:
Modifies the DistributedSampler logic. Now each process samples elements at
a fixed interval, instead of taking a consecutive section.

  This eliminates the possibility that the DataLoader uses padded data while
dropping real data. It happens when:
  1. DistributedSampler padded the data; and
  2. the DataLoader's drop_last is effectively true, and it drops fewer elements
than the number of padded data points.
  From the example below, we see that data (10, 11, 12) are padded by
duplicating data samples (1, 2, 3).
  The old sampler drops legitimate original data (3, 6, 9) and introduces duplicates
(10, 11) into the training set, while the new sampler logic samples the correct data
points from the data set.
  This example has been added to the dataloader unit test.

example:
```
  data after shuffle: 1, 2, 3, 4, 5, 6, 7, 8, 9
  padded data : 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12

  old sampler:       ->  DataLoader with (batch_size=2 and drop_last=True)
   p 1: 1, 2, 3          1, 2
   p 2: 4, 5, 6          4, 5
   p 3: 7, 8, 9          7, 8
   p 4:10,11,12         10,11

  new sampler:       ->
   p 1: 1, 5, 9          1, 5
   p 2: 2, 6,10          2, 6
   p 3: 3, 7,11          3, 7
   p 4: 4, 8,12          4, 8
```
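
The core of the new assignment can be written as a one-line slice (hypothetical standalone reconstruction of the example above): rank r takes every world_size-th element starting at r.

```python
padded = list(range(1, 13))  # shuffled data 1..9 plus padding 10, 11, 12
world_size = 4
for rank in range(world_size):
    print('p', rank + 1, padded[rank::world_size])
# p 1 [1, 5, 9]
# p 2 [2, 6, 10]
# p 3 [3, 7, 11]
# p 4 [4, 8, 12]
```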
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12474

Differential Revision: D10260410

Pulled By: SsnL

fbshipit-source-id: 710856571260f42ce25955b81a5b8008e04938cf
2018-10-09 11:23:50 -07:00
1c69d368e1 Remove New with Allocator Registry (#12111)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12111

Setup allocator registry keyed by at::DeviceType, and remove New from StaticContext.

Reviewed By: ezyang

Differential Revision: D10022853

fbshipit-source-id: 3e88a181fe5df24f33f49b88be1f75284a185588
2018-10-09 10:53:52 -07:00
f564163951 Remove SSE-only code and convolve5x5 (#12109)
Summary:
Performance oriented code will use AVX/AVX2, so we don't need SSE specific code anymore. This will also reduce the probability of running into an error on legacy CPUs.

On top of this convolve is covered by modern libraries such as MKLDNN, which are much more performant and which we now build against by default (even for builds from source).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12109

Differential Revision: D10055134

Pulled By: colesbury

fbshipit-source-id: 789b8a34d5936d9c144bcde410c30f7eb1c776fa
2018-10-09 10:53:50 -07:00
11c31aef04 Prevent hanging in data loader altogether
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11985

Differential Revision: D10202374

Pulled By: SsnL

fbshipit-source-id: 1ab1a07185f78a104f9b05930a87ef5a32f431e4
2018-10-09 09:54:19 -07:00
1a0d82e4f4 fix import for script module with control flow blocks (#12351)
Summary:
The value_info proto field was being processed in BuildGraph, but control flow blocks used buildBlocks instead. This PR moves that step to BuildBlock.

I removed DecoderBase because it was making the code confusing and we never needed it in the first place.

closes #12319
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12351

Differential Revision: D10212411

Pulled By: li-roy

fbshipit-source-id: 47f289a462a1ab7391ff57368185401673980233
2018-10-08 22:25:14 -07:00
c959be9d1d Create named functions construct (#12237)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12237

This diff creates named functions and cleans up a lot of the basic block usage throughout the code

Reviewed By: duc0

Differential Revision: D10134363

fbshipit-source-id: d0c4ae0bbb726236a15251dbfd529d4fddcd9e9f
2018-10-08 22:12:18 -07:00
8414094562 cleanup controlflow (#12235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12235

SSA is actually implicitly maintained so not only was this function not implemented, it never should be implemented.

Reviewed By: duc0

Differential Revision: D10133928

fbshipit-source-id: e8e5e2386f8b57812b0be2c380af85ed07cd3152
2018-10-08 22:12:13 -07:00
d400502b1d Fix a bunch of warnings in TestNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12453

Differential Revision: D10244130

Pulled By: SsnL

fbshipit-source-id: e425c76bfb721fe118a32ddd1fa6eca3a3cd86f0
2018-10-08 17:38:23 -07:00
cdead5ace1 Enable CircleCI for Linux jobs (#12389)
Summary:
Changes in this PR:
1. Intermediate Docker image is shared from build stage to test stage through ECR, in order to fix the Caffe2 flaky CUDA tests.
2. There are ~7 Caffe2 operator tests that are only flaky in `caffe2_py2_gcc4_8_ubuntu14_04_test` on CPU. Disabling those tests on that config only, which is okay to do because we are still running those tests in other test jobs.

After this PR is merged, CircleCI will be running on master automatically, and will be running on PRs if the author rebased their PR onto the newest master (which we will ask all the authors to do when we switch off Jenkins for Linux).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12389

Differential Revision: D10224267

Pulled By: yf225

fbshipit-source-id: dd1a90a425c3d13b870d3d328cb301eee2e6e2cd
2018-10-08 17:09:37 -07:00
5a0d2c7138 Add clamping functionality to stats_put_ops
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12391

Reviewed By: mlappelbaum

Differential Revision: D10220000

fbshipit-source-id: 10fdbc8ebab931a5be31df964b5de5728048205d
2018-10-08 16:53:26 -07:00
1ee6fc4002 Delete noexcept on the move constructor of OrderedDict (#12369)
Summary:
Previously we tested if default-construction was noexcept, which
doesn't really mean that the move constructor is noexcept too.

Shuts up clang-tidy.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12369

Differential Revision: D10217348

Pulled By: ezyang

fbshipit-source-id: b46437d8ac7a8d756cf03ed0c6bf4400db7ecde7
2018-10-08 16:38:27 -07:00
dd4b9b06a4 Back out "Back out "[caffe2] Use custom CPU thread pool in async_scheduling"" (#12418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12418

Original commit changeset: 32921600925b

Reviewed By: yinghai

Differential Revision: D10231119

fbshipit-source-id: 7d09ea8de82ff2d911d9ded88d87af4226464d1b
2018-10-08 16:24:07 -07:00
c5d7494ca1 Use open-source NCCL2 in PyTorch (#12359)
Summary:
- Removed the old nccl file
- Made open-source NCCL a submodule
- Added CMake support to build NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12359

Reviewed By: orionr, yns88

Differential Revision: D10219665

Pulled By: teng-li

fbshipit-source-id: 134ff47057512ba617b48bf390c1c816fff3f881
2018-10-08 15:39:07 -07:00
c3987a0fc3 Fix issues with ATenOp handling methods where self is not the first arg (#12353)
Summary:
ATenOp was handling `torch.where` incorrectly. Whereas the `torch.where` overload (and `aten::` function) had arguments in the order `Tensor condition, Tensor self, Tensor other`, ATenOp was emitting code that assumed that `self` was the 0th argument, and thus was trying to interpret the wrong value as the condition.
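
For reference, a sketch of the argument order in question, condition first (using the uint8 mask type of that era; newer versions accept bool masks):

```python
import torch

cond = torch.tensor([1, 0], dtype=torch.uint8)  # `condition`
a = torch.tensor([1., 2.])                      # plays the role of `self`
b = torch.tensor([10., 20.])                    # `other`
print(torch.where(cond, a, b))                  # tensor([ 1., 20.])
```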
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12353

Differential Revision: D10218435

Pulled By: jamesr66a

fbshipit-source-id: afe31c5d4f941e5fa500e6b0ef941346659c8d95
2018-10-08 15:25:39 -07:00
d0e1dca0f5 fix expect file (#12465)
Summary:
Fix expect file that got out of sync
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12465

Differential Revision: D10244646

Pulled By: eellison

fbshipit-source-id: 66d101d4c6c0a235ce9fa47dc3cce027624c86bc
2018-10-08 13:54:24 -07:00
5bac46508a Fix TestJit.test_alexnet expect file (#12458)
Summary:
This test only runs when you have torchvision installed, which is not the case on CI builds. When I run test_jit on my local machine, this fails, so fixing up the expect file here.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12458

Differential Revision: D10244344

Pulled By: jamesr66a

fbshipit-source-id: 728c5d9e6c37f807a0780066f20f6c31de84d544
2018-10-08 13:54:22 -07:00
d4b4c1fbec Add missing url links to README.md file. (#12440)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12440

Differential Revision: D10242642

Pulled By: SsnL

fbshipit-source-id: f47d7579cf3df097c476a97b58149ca4b1eb17ab
2018-10-08 13:54:21 -07:00
a55b9f77a0 Implement 3D and 4D parallelization in Caffe2 thread pool (#12455)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12455

- Mirror changes in pthreadpool

Reviewed By: harouwu

Differential Revision: D10240470

fbshipit-source-id: c1af769b5894f7865736fdaf4e0e5bf17c524614
2018-10-08 13:12:57 -07:00
d181e0f1fc Add move{Node,Edge,Subgraph} for Graph move-like semantics (#12303)
Summary:
Adding back import{Node,Edge} as move{Node,Edge} and adding a new
function moveSubgraph.  Previous diff broke OSS
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12303

Differential Revision: D10182522

Pulled By: bwasti

fbshipit-source-id: 9619431d6d1a44f128613a4f6d8b7f31232ccf28
2018-10-08 12:53:25 -07:00
cf2b88fa30 Induce edges on subgraphs (#12255)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12255

Simple algorithm to connect a subgraph

Reviewed By: ZolotukhinM

Differential Revision: D10141701

fbshipit-source-id: c79c5bc2be89100db602d0a5ff3d17e3dc332d8c
2018-10-08 12:24:55 -07:00
7103d0d938 Add python bindings (#12253)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12253

Adding python bindings to unblock DAI development

Reviewed By: duc0

Differential Revision: D10141621

fbshipit-source-id: efac7fb8a0cc787e1c4cc94515e673812529a997
2018-10-08 12:24:53 -07:00
e7653c7561 New chaining/partitioning algorithm for async_scheduling for inference (#11957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11957

For distributed inference, we want to use the async_scheduling net to run the net, as we need its async part. However, according to profiling, async_net has a big overhead of dispatching tasks onto worker threads. This diff improves the issue by generating a smaller number of chains/tasks, grouping the sync ops that can be run in one shot. Note that it also schedules each individual async op as a single chain because, unlike gpu ops, rpc ops are not guaranteed to be linearized at the remote site. For example, if you have two rpc ops `op1->op2`, op2 won't implicitly block until op1 finishes. Therefore we need to put each async op in its own chain, as the async_scheduling net will only sync on the tail of a chain.

For nets with all sync ops, this change gives us execution `1.5X` slower than simple_net, while without the change it is `7X` slower.

Next step is to work on the executor to make the task scheduling faster. And add a fallback path to be able to run ops inline if it's a all-sync net.

Reviewed By: ilia-cher

Differential Revision: D9874140

fbshipit-source-id: fcd45328698c29211f2c06ee3287194acda12227
2018-10-08 12:24:52 -07:00
f1f521f71b make bench_gen.py work for 3d conv (#12433)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12433

To test 3d conv, we need to pass lists in the spec argument. We also don't want to set use_cudnn=True, which is the default in brew.

Reviewed By: llyfacebook, csummersea

Differential Revision: D10234315

fbshipit-source-id: 96a39992a97e020d6e9dac103e6d64df0cc1020b
2018-10-08 12:24:43 -07:00
00aedfc0e2 constant pooling pass (#12222)
Summary:
Add a pass to move all constants to the beginning of the graph, and deduplicate.

This extends https://github.com/pytorch/pytorch/pull/10231 to also handle constants introduced in inlining, constant propagation, etc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12222

Reviewed By: driazati

Differential Revision: D10201616

Pulled By: eellison

fbshipit-source-id: bc9c5be26868c8b5414257a0d4462de025aeb9bd
2018-10-08 11:55:02 -07:00
83b4dc6822 Remove Type.tensor(). (#12360)
Summary:
Use at::empty instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12360

Reviewed By: ezyang

Differential Revision: D10215119

Pulled By: gchanan

fbshipit-source-id: f9bb257dff1b1bf1ecd3a6e358c4791d81b5bd31
2018-10-08 11:39:05 -07:00
28e1571843 Add the x64 msvc toolchain into PATH (#12446)
Summary:
A possible fix for the problem stated in #12410.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12446

Differential Revision: D10238572

Pulled By: soumith

fbshipit-source-id: 17ade148c4036d2481b878e5cd7d9d67c1e3626e
2018-10-08 07:54:20 -07:00
def655ec27 fix critical section of atomic add op
Summary: When testing D10220313, I ran into this bug.

Reviewed By: aazzolini

Differential Revision: D10224295

fbshipit-source-id: f46d7333612bce437c1ae6c0b0b579fc2a639665
2018-10-08 02:20:23 -07:00
8689d8af36 Format inline code block. (#12441)
Summary:
Signed-off-by: Marcela Morales Quispe <marcela.morales.quispe@gmail.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12441

Differential Revision: D10236743

Pulled By: SsnL

fbshipit-source-id: c0e446a81a388cf6a558bf7ab8ba0e59703dc169
2018-10-08 00:51:07 -07:00
0e44db8b0d Add check for backend of arguments to bmm cpu (#12434)
Summary:
Fixes: #12406
Thank you, jcjohnson, for reporting.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12434

Differential Revision: D10235799

Pulled By: soumith

fbshipit-source-id: 44ee35010bac3791901f604095f5b4bc66b0e7f8
2018-10-07 18:55:42 -07:00
db8d01b248 Move JIT tests to gtest (#12030)
Summary:
In our #better-engineering quest of removing all uses of catch in favor of gtest, this PR ports JIT tests to gtest. After #11846 lands, we will be able to delete catch.

I don't claim to use/write these tests much (though I wrote the custom operator tests) so please do scrutinize whether you will want to write tests in the way I propose. Basically:

1. One function declaration per "test case" in test/cpp/jit/test.h
2. One definition in test/cpp/jit/test.cpp
3. If you want to be able to run it in Python, add it to `runJitTests()` which is called from Python tests
4. If you want to be able to run it in C++, add a `JIT_TEST` line in test/cpp/jit/gtest.cpp

Notice also I was able to share support code between C++ frontend and JIT tests, which is healthy.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12030

Differential Revision: D10207745

Pulled By: goldsborough

fbshipit-source-id: d4bae087e4d03818b72b8853cd5802d79a4cf32e
2018-10-06 23:09:44 -07:00
6f664d3917 Improve TypeMeta (#11502)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11502

TypeMeta now is only a pointer to a TypeMetaData structure, of which there is exactly one global instance per type.
This reduces the size of everything storing a TypeMeta (Tensor, Blob, ...) and potentially improves performance.

Also, this diff gets rid of the type name registry in favor of static strings.

Experiments (summary: 1-3% perf gain)
- Service Lab: https://our.intern.facebook.com/intern/servicelab/30712497/
 -> No significant results found.
- Mobile Lab c10bench.json: https://our.intern.facebook.com/intern/fblearner/details/75984908/
 -> 1-3% perf gain
- Mobile Lab c10bench default: https://our.intern.facebook.com/intern/fblearner/details/75984999/
 -> 2-3% perf gain
- adindexer canary: https://our.intern.facebook.com/intern/ads/canary/413002142824203076
 -> no significant changes (benchmark too noisy)
- adfinder canary: https://our.intern.facebook.com/intern/ads/canary/413002166737860362
 -> no significant changes (benchmark too noisy)

Reviewed By: dzhulgakov

Differential Revision: D9763422

fbshipit-source-id: fc08937f114af5ff9f3ddbe7c7e396942868cdf5
2018-10-06 14:09:28 -07:00
ac9bb8ecef Make dynamic_cast_if_rtti safer (#12408)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12408

Using static_cast is better than reinterpret_cast because it will cause a compile time error in the following cases, while reinterpret_cast would run into undefined behavior and likely segfault:
- Src and Dst are not related through inheritance (say converting int* to double*)
- Src and Dst are related through virtual inheritance

This `dynamic_cast_if_rtti` is still unsafe because `dynamic_cast` and `static_cast` behave differently if the runtime type is not what you expected (i.e. dynamic_cast returns nullptr or throws whereas static_cast has undefined behavior), but it's much safer than doing reinterpret_cast.

Reviewed By: Yangqing

Differential Revision: D10227820

fbshipit-source-id: 530bebe9fe1ff88646f435096d7314b65622f31a
2018-10-06 12:56:27 -07:00
0e966fc9f9 Back out "[caffe2] Use custom CPU thread pool in async_scheduling" (#12415)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12415

Original commit changeset: 95da8c938b8e

Reviewed By: ilia-cher

Differential Revision: D10229804

fbshipit-source-id: 32921600925b65edb5bb201c9afba0d03ed49426
2018-10-06 00:42:06 -07:00
695465915a Remove some Type.tensor usages and remove native_tensor without size. (#12403)
Summary:
Same as before, but with "initialTensorOptions()" instead of "TensorOptions(false)".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12403

Differential Revision: D10225427

Pulled By: gchanan

fbshipit-source-id: 60bd025a5cc15bdbbab6eafc91ea55f5f2c3117e
2018-10-05 20:55:14 -07:00
14b48a2404 Use custom CPU thread pool in async_scheduling (#12295)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12295

Add ability to use custom implementations of thread pool instead of TaskThreadPool

Reviewed By: yinghai

Differential Revision: D10046685

fbshipit-source-id: 95da8c938b8e60b728484c520319b09b0c87ff11
2018-10-05 19:56:04 -07:00
92b0e7026e Add weak script mode for script functions (#11963)
Summary:
This PR is the start of weak script mode for functions

Weak scripts allow you to compile a graph from Python code at runtime by annotating with `torch.jit.weak_script` for use in the JIT without affecting eager execution. Scripts are compiled lazily on the first call in a graph to avoid long Python startup times.
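
A minimal sketch based on the description above (`weak_script` was an internal mechanism and has since been removed; treat this as illustrative only):

```python
import torch

@torch.jit.weak_script
def gelu_approx(x):
    # runs as plain Python when called eagerly ...
    return x * torch.sigmoid(1.702 * x)

@torch.jit.script
def net(x):
    # ... but is compiled lazily on this first scripted use
    return gelu_approx(x)
```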

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11963

Differential Revision: D10183451

Pulled By: driazati

fbshipit-source-id: 128750994d5eb148a984f8aba4113525c3e248c8
2018-10-05 18:55:49 -07:00
058a31839d Warn about local_rank not being globally unique. (#12370)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC deepakn94
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12370

Differential Revision: D10220135

Pulled By: ezyang

fbshipit-source-id: 6d1a8a383951ae52753e4f75a14b8080bf02b815
2018-10-05 17:38:41 -07:00
3f04ca9a91 Remove duplicate math transpilation function (ROCm 233) (#12387)
Summary:
* Remove duplicate math transpilation function
* Modify regex to expand matches to more __device__ functions
* Try a different tack. Apply math transpilations only to .cu and .cuh files
* Undo change that's not required anymore since we're not using regex to detect device functions

This should address "overtranspilation" as observed in another PR.

bddppq ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12387

Differential Revision: D10226798

Pulled By: bddppq

fbshipit-source-id: fa4aac8cd38d8f7ef641fad5129ed4714c0fada5
2018-10-05 17:16:35 -07:00
e1fe617600 Fix flipped pad buffer constructor arguments
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12361

Differential Revision: D10218404

Pulled By: jamesr66a

fbshipit-source-id: f02137f97cd138155ba8181df3ab65f41d5abab7
2018-10-05 17:16:32 -07:00
99de4565dd Split reduction_front_backops.[cc|cu] into smaller units to allow build of smaller size (#12315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12315

Allows inclusion of needed reduce_front_back_* ops only

Differential Revision: D10188611

fbshipit-source-id: e17fd955ac5aa163a039872b6a435942b1e1e164
2018-10-05 16:50:21 -07:00
b937cbb776 Fix a bug that would resize tensor storage on export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12377

Differential Revision: D10219213

Pulled By: zdevito

fbshipit-source-id: 85cfa4467c672ff5a718e58cfae7e8c8b1cfc532
2018-10-05 16:24:54 -07:00
57fcc57f31 set CMAKE_INSTALL_MESSAGE to NEVER (#12392)
Summary:
this removes a bunch of spam output from the build. This is

(1) cleaner
(2) a couple seconds faster in some cases, e.g. my slow-rendering emacs-based shell
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12392

Differential Revision: D10225340

Pulled By: anderspapitto

fbshipit-source-id: 477ee76d24f8db50084b1e261db8c22733de923b
2018-10-05 15:57:44 -07:00
54d9823d00 Make caffe2::Tensor::dims() return an IntList instead of a const vector& (#12180)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12180

I had to fix a lot of call sites, because a lot of places assume that
you can actually get a const vector&, and if the internal representation
of sizes in a tensor is NOT a vector, it's not possible to fulfill
this API contract.

Framework changes:
- I deleted TensorImpl::dims(); caffe2::Tensor::dims() just forwards to
  sizes() now.
- De-templatized SetDims; now it is an explicit list of ArrayRef and
  variadic overloads.  This makes implicit conversions work again,
  so I don't need to explicitly list the std::vector cases too.
  - As a knock-on effect, this causes Reset() to accept at::IntList as well as
    const std::vector<int64_t>&
- Edited variadic overloads of SetDims to all forward to the underlying
  arbitrary-dim implementation, reducing code duplication. (It's probably
  marginally less efficient in the new world.)
- Replace Tensor constructor accepting const std::vector<int64_t>& with at::IntList
- Make MKLTensor accept ArrayRef along with vector in constructor and
  Reset (unfortunately, no implicit conversions here, since it's templated on
  index type.)
- There are a few other places, like cudnn, where I changed functions
  that previously took const std::vector<int64_t>& to take at::IntList
  instead.

Classification of call site changes:
- 'const std::vector<int64_t>& x_dims = x.dims()' ==>
  'at::IntList x_dims = x.dims()'
- 'std::vector<int64_t> x_dims = x.dims()' ==>
  'std::vector<int64_t> x_dims = x.dims().vec()' (we need a copy!)
  Usually this is because we're about to mutably modify the vector
  to compute some new dimension.  However, it also very commonly occurs in the
  form: 'x_dims_ = x.dims()' because we frequently cache sizes in operators.
- Instead of constructing std::vector<int64_t>{blah, blah}, construct an
  at::IntList directly

ArrayRef changes:
- cbegin()/cend() iterators; they operate the same as begin()/end() because
  everything on ArrayRef is const.
- Moved operator<< into ArrayRef.h, so that it's always available when
  working with ArrayRef.  I also templated it, so it now works on an
  ArrayRef of any type.
- Add operator== overload for ArrayRef, and also add variants to permit
  comparison of ArrayRef with std::vector, a very common operation.
  (The non-templated version of operator== can get these automatically
  via implicit conversion, but with templates C++ refuses to do
  any explicit conversions.)

I'm planning to audit all dims() call sites to make sure they don't
expect 'auto x = t.dims()' to give you an x whose lifetime can validly
outlive the tensor.

I opted not to do a dims() to sizes() rename, because dims() also matches
the protobufs accessor.  Bad news!

Reviewed By: jerryzh168

Differential Revision: D10111759

fbshipit-source-id: a2a81dc4b92c22ad4b3b8ef4077a7e97b6479452
2018-10-05 15:57:41 -07:00
f9fb37ca79 Guard Denormals-Are-Zero with runtime CPU check (#12386)
Summary:
Previously, we were only enabling Flush-To-Zero (FTZ) and
Denormals-Are-Zero (DAZ) when compiling with SSE3 enabled. After,
Christian's patch (https://github.com/pytorch/pytorch/pull/12109) we
won't be compiling core files with SSE3 or SSE4 enabled, to better
support older AMD processors.

This moves the FTZ and DAZ code behind a runtime CPU check in
preparation for that change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12386

Differential Revision: D10222237

Pulled By: colesbury

fbshipit-source-id: 7ffe32561ab965e1e5f9eb6e679602bbf4775538
2018-10-05 14:54:54 -07:00
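For context, the user-visible switch for these flags is `torch.set_flush_denormal`, which already reports runtime CPU support; a small illustration:

```
import torch

# set_flush_denormal() returns False on CPUs without FTZ/DAZ support,
# the same kind of runtime check this change moves the flag-setting behind.
if torch.set_flush_denormal(True):
    tiny = torch.tensor([1e-323], dtype=torch.float64)  # a denormal value
    print(tiny)                    # flushed to 0 while the mode is active
    torch.set_flush_denormal(False)
```
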
bd09ab6687 Remove stages from IR, they are no longer used
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12352

Differential Revision: D10219743

Pulled By: zdevito

fbshipit-source-id: 4d9441dc3748616f9b1f0734c65ec1a7abb0d663
2018-10-05 13:58:15 -07:00
c7e8044fc8 Support additional device types (#12293)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12293

Adding support for additional device types besides cuda and cpu.

Reviewed By: ezyang

Differential Revision: D10175683

fbshipit-source-id: 7a8a35c3f1b13a3b6ed84dd2d835f3902a418a6c
2018-10-05 13:15:05 -07:00
f8086845aa Fix bug in grad.py when conv bias != None (#12281)
Summary:
Obviously, the grads of the conv weight and conv input do not depend on the bias, but the original `convXd_input` and `convXd_weight` methods receive a `bias` parameter. What's more, while the doc says `bias` should have the shape `(out_channels,)`, one will get a `RuntimeError` if bias != None and in_channels != out_channels, because the weight of a transposed conv has the shape `(in_channels, out_channels, kH, kW)` while the weight of a vanilla conv has the shape `(out_channels, in_channels, kH, kW)`:
```
RuntimeError: Given transposed=1, weight of size [channel1, channel2, kH, kW], expected bias to be 1-dimensional with channel2 elements, but got bias of size [channel1] instead
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12281

Differential Revision: D10217370

Pulled By: ezyang

fbshipit-source-id: bc00b439e5ae539276a5e678bdb92af700197bb2
2018-10-05 12:55:14 -07:00
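For illustration, the two gradient helpers in question can be driven directly; the shapes below are arbitrary example values, and the bias plays no role in either gradient:

```
import torch
from torch.nn import grad

inp = torch.randn(1, 3, 8, 8)        # N, C_in, H, W
weight = torch.randn(4, 3, 3, 3)     # C_out, C_in, kH, kW
grad_out = torch.randn(1, 4, 6, 6)   # gradient w.r.t. the conv output

# Neither gradient depends on the bias, so no bias is passed here:
grad_input = grad.conv2d_input(inp.shape, weight, grad_out)
grad_weight = grad.conv2d_weight(inp, weight.shape, grad_out)
print(grad_input.shape, grad_weight.shape)
```
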
e2d2b270db Revert D10212616: [pytorch][PR] Remove some Type.tensor usages and remove native_tensor without size.
Differential Revision:
D10212616

Original commit changeset: c9cd128d1111

fbshipit-source-id: 923781ba9cd6e60e7c92789832e5601a1fd848b5
2018-10-05 11:55:45 -07:00
705d80b51e Remove some Type.tensor usages and remove native_tensor without size. (#12355)
Summary:
This is to move us along the path to removing Type from the public API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12355

Reviewed By: ezyang

Differential Revision: D10212616

Pulled By: gchanan

fbshipit-source-id: c9cd128d1111ab219cb0b2f3bf5b632502ab97c0
2018-10-05 11:12:07 -07:00
9ebac3d7fe Improve type kind error message (#12344)
Summary:
Address #12326
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12344

Differential Revision: D10210681

Pulled By: driazati

fbshipit-source-id: fcc2e26b79dd2d7d5f9e7ef930e2bf434f2a7e08
2018-10-05 10:57:16 -07:00
0ebbfc25f3 Add utility function make_tensor (#12288)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12288

The current implementation of Tensor takes an intrusive_ptr as an argument for storing data. Instead of requiring users to explicitly pass an intrusive_ptr, we want them to pass the intrusive_ptr's constructor args directly; these are forwarded internally through a new helper function called make_tensor.

Reviewed By: ezyang

Differential Revision: D10152661

fbshipit-source-id: bfa72de161ace3fd1c4573427abcd1bfbd12e29e
2018-10-05 10:40:28 -07:00
dd2c487ab0 Enforce invariant that storage_ is always non-null (#12328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12328

- Delete reset() from Storage, as it makes it easy to accidentally
  create a null storage.
- Immediately reject a storage if it is null when passed in

Reviewed By: dzhulgakov

Differential Revision: D10200448

fbshipit-source-id: 14bfa45f8f59859cc350bd9e20e3ef8692e3991d
2018-10-05 09:43:34 -07:00
7788ec9dd1 Remove dangling cmake check for long typemeta (#12356)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12356

Differential Revision: D10212726

Pulled By: Yangqing

fbshipit-source-id: b9c2c778fb496278477ef323ecfefd5d19d1af3c
2018-10-05 09:43:32 -07:00
1e7050072b Make TensorOptions contain optional fields, optimize struct size (#12103)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12103

This defers lookup of defaults to the site where we read
out of TensorOptions. THIS IS A BC-BREAKING BEHAVIOR CHANGE,
but we expect the bulk of uses of OptionsGuard don't allocate TensorOptions
inside the OptionsGuard region, and then use it outside of the region
(the situation where behavior could change.)

I also optimize the size of TensorOptions by rearranging fields, so that we
always fit in two 64-bit words.

Reviewed By: goldsborough

Differential Revision: D10052523

fbshipit-source-id: f454a15b4dbf8cd17bc902ab7d2016f2f689ed13
2018-10-05 09:24:53 -07:00
b3cdaee6db Update README.md of ATen Documentation (#12367)
Summary:
The changes are made to clarify how the parsing between the yaml files and the header files of THNN and THCUNN works. As issue #12320 shows, it is not easy to understand the existing code without a hint about the important files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12367

Differential Revision: D10217459

Pulled By: ezyang

fbshipit-source-id: 9b3e64dea4f156843814840e736dc3230332060c
2018-10-05 08:39:55 -07:00
5cb2b2358c Move interned_strings and get build working (#12039)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12039

Refactoring out this diff D9819906

Reviewed By: ezyang

Differential Revision: D10024844

fbshipit-source-id: 75b6c93526dc1490299f8b5e564e029146338178
2018-10-05 00:41:18 -07:00
f494f004b7 Fix unintended casting to long (and fix Half overloads)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12357

Reviewed By: Yangqing

Differential Revision: D10213037

Pulled By: li-roy

fbshipit-source-id: 98f7f5ee2b51a3fab378faf65482919caf008957
2018-10-05 00:28:00 -07:00
d4c58216d7 Stop warnings on AT_DECLARE_TENSOR_TYPE(.); (#12348)
Summary:
e.g.,
```
../aten/src/ATen/core/TensorTypeIdRegistration.h:101:43: warning: extra ‘;’ [-Wpedantic]
AT_DECLARE_TENSOR_TYPE(SparseCUDATensorId);
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12348

Differential Revision: D10210072

Pulled By: SsnL

fbshipit-source-id: 90eacc97ef490148c0ac1357cf28f1326a791dfa
2018-10-04 23:16:47 -07:00
d9ba2b6894 Add Pytorch domain specifc ONNX schema for SparseNN ops (#12338)
Summary:
As the title says.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12338

Differential Revision: D10204691

Pulled By: yinghai

fbshipit-source-id: fe6bb8c715a54372508672fc0651841bbc4b8656
2018-10-04 23:16:45 -07:00
bd8980e8c0 Enable CUDA 10 in CI. (#12343)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12343

Differential Revision: D10215274

Pulled By: ezyang

fbshipit-source-id: ab14e0cadd4100d7cfc3c7e924dd92742da3c29e
2018-10-04 23:16:42 -07:00
6544cd4590 Revert D10205876: Fix unintended casting to long
Differential Revision:
D10205876

Original commit changeset: b0678b019b19

fbshipit-source-id: ebd3acc017fd10cf293e1de281ea294da86747be
2018-10-04 21:10:52 -07:00
8e5ac43b4e Fix unintended casting to long
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12341

Reviewed By: ezyang

Differential Revision: D10205876

fbshipit-source-id: b0678b019b196ac9ee52969f80819ee9ee442bf2
2018-10-04 17:41:40 -07:00
16e21e14e3 Fix Caffe2 build on 64-bit Android (#12340)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12340

`long` and `int64_t` are the same type on 64-bit Android.

Reviewed By: Yangqing

Differential Revision: D10204892

fbshipit-source-id: 2d5bf707bf87b99fc597c9292b59f032e9004620
2018-10-04 15:14:53 -07:00
f0b73ff790 Pretty printer improvements (#12179)
Summary:
* Replaces `prim::PythonOp` with the name of the function being called
* Delays printing values used in `prim::Return` nodes until the return
node itself if that is the only place the value is used to remove some
useless assigns

zdevito apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12179

Differential Revision: D10132661

Pulled By: driazati

fbshipit-source-id: cbc4ac34137ed5872049082e25d19eb1ebc71208
2018-10-04 15:14:51 -07:00
895994a7c3 Back out "[pytorch][PR] [Build] Use open-source NCCL2 in PyTorch"
Reviewed By: The controller you requested could not be found.

fbshipit-source-id: a13075339d3a7b970e81be0b1a32a7c4c3a6c68d
2018-10-04 14:12:04 -07:00
a98489747d Enable sparse functionality and tests (#12323)
Summary:
* Enable sparse functions for ROCm

* Reenable test_sparse unit tests that are now passing in ROCm

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12323

Differential Revision: D10203540

Pulled By: bddppq

fbshipit-source-id: 33ffcfbda32875676c27b33ad1e7cd96fbadc790
2018-10-04 13:43:12 -07:00
39bd73ae51 Guard NumPy usage using USE_NUMPY (#11798)
Summary:
All usages of the `ndarray` construct have now been guarded with `USE_NUMPY`. This eliminates the requirement of NumPy while building PyTorch from source.

Fixes #11757

Reviewed By: Yangqing

Differential Revision: D10031862

Pulled By: SsnL

fbshipit-source-id: 32d84fd770a7714d544e2ca1895a3d7c75b3d712
2018-10-04 12:11:02 -07:00
c064f8a89d Fix build error mkldnn due to corruptted CMAKE_REQUIRED_LIBRARIES (#12195)
Summary:
This fixes a cmake-time compilation error.

When we changed the script to build Caffe2 with mkldnn, some cmake-time compilation support checks (like in libsleef) failed due to an incorrect setting of CMAKE_REQUIRED_LIBRARIES. It is a global setting that can interfere with cmake compilation if it is not cleaned up properly. FindBLAS.cmake and FindLAPACK.cmake didn't clean up this flag, which caused incorrect building of libsleef.so.

yinghai gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12195

Differential Revision: D10159314

Pulled By: yinghai

fbshipit-source-id: 04908738f7d005579605b9c2a58d54f035d3baf4
2018-10-04 11:56:06 -07:00
ae7a7fb398 Use open-source NCCL2 in PyTorch (#12312)
Summary:
- Removed the old nccl file
- Make open-source NCCL a submodule
- CMake to make NCCL itself

NCCL2 now is in the default build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12312

Differential Revision: D10190845

Pulled By: teng-li

fbshipit-source-id: 08d42253b774149a66919d194f88b34628c39bae
2018-10-04 11:42:17 -07:00
6b79e16d6d revert test/expect files (#12332)
Summary:
Linter added newline to the expect files in #12144 . This reverts it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12332

Reviewed By: SsnL

Differential Revision: D10201790

Pulled By: Yangqing

fbshipit-source-id: 29f87c013c3522675a765a81a92520fbaea10057
2018-10-04 11:12:57 -07:00
83de6f0dac hip minor fix for c10 (#12329)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12329

Differential Revision: D10201437

Pulled By: Yangqing

fbshipit-source-id: 4e62f5870ad269d7a4f936393d2b3e646d0a6b2c
2018-10-04 11:12:54 -07:00
bcb62cb525 Lazily create tensors in optim_baseline (#12301)
Summary:
Tensors cannot be created globally because of static initialization order issues. So tensors for the optim_baseline test must be created lazily instead. This is fine because these functions will only be called once (in the respective test).

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12301

Differential Revision: D10201008

Pulled By: goldsborough

fbshipit-source-id: 59a041f437354e7c6600e5655b3e2d0647dbde9e
2018-10-04 10:55:53 -07:00
1962646d0f Remove CAFFE2_UNIQUE_LONG_TYPEMETA (#12311)
Summary:
CAFFE2_UNIQUE_LONG_TYPEMETA has been a tricky variable defined only from cmake; this is an experiment to remove it and see exactly which compilers need it set.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12311

Reviewed By: dzhulgakov

Differential Revision: D10187777

Pulled By: Yangqing

fbshipit-source-id: 03e4ede4eafc291e947e0449382bc557cb624b34
2018-10-04 10:12:13 -07:00
38f3d1fc40 move flags to c10 (#12144)
Summary:
Still in flux.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12144

Reviewed By: smessmer

Differential Revision: D10140176

Pulled By: Yangqing

fbshipit-source-id: 1a313abed022039333e3925d19f8b3ef2d95306c
2018-10-04 02:09:56 -07:00
c9f7d7b506 mark unit tests as working, skip failing unit test (#12313)
Summary:
* enabled fp16 tests for test_torch

* enable fp16 tests for test_nn

* enabled multilabelmargin loss for fp16

* removed skip for test_pdist_empty_col

* Enable test_nn tests that pass with compiler fixes etc.

* Enable test_legacy_nn tests that pass with compiler fixes etc.

ezyang bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12313

Differential Revision: D10189922

Pulled By: bddppq

fbshipit-source-id: a5592817c04b14e355cb062d42ebea406f0c92b6
2018-10-03 23:56:26 -07:00
8c64655460 Open source distributed code (#12254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12254

Move distributed_* code to oss folders

This unblocks adding python bindings

Reviewed By: duc0

Differential Revision: D10141400

fbshipit-source-id: 04d6654b73b6757c4dc4a1ddd9dfa2ce23c8c91d
2018-10-03 21:41:14 -07:00
15367ba9bc Deserialize offset of TreeCursor only when it is not empty (#11465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11465

In one of my testing workflow runs, deserialization of dataset_cursor failed. The reason it fails is that the offset vector is serialized only when it's non-empty, but deserialization always processes offset_blob whenever it is called. Though I'm still checking why the offset of dataset_cursor is empty, I think it's good to remove this discrepancy.

Reviewed By: aazzolini, Tianshu-Bao

Differential Revision: D9737636

fbshipit-source-id: bb111933f534b092f29469680ff29e59617655f0
2018-10-03 20:38:59 -07:00
07bb79bd8b Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12274

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10156452

fbshipit-source-id: 52cf2bedc9dbb433cd5d03f0b76723f7df6a7361
2018-10-03 19:26:16 -07:00
faab6ea922 Split Allocator (#12105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12105

Split CUDA/OpenCL/xxx Allocator from xxxStaticContext::New and rewrite it under at::Allocator interface.

Reviewed By: dzhulgakov

Differential Revision: D10001033

fbshipit-source-id: e1ffbc04c18d1dcb1f8d4ef2cbbb321967de5ccc
2018-10-03 19:10:10 -07:00
74dc4460eb New in StaticContext returns at::DataPtr (#12029)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12029

In order to remove the New() function in StaticContext (and eventually StaticContext itself) and converge on the Allocator design, we'll first change the return type of New to at::DataPtr.

Reviewed By: ezyang

Differential Revision: D9889990

fbshipit-source-id: 3257c763530b987025f428741bdd2e089d11bad4
2018-10-03 19:10:07 -07:00
bcc2a0599b Enable clang-tidy in CI (#12213)
Summary:
At long last, we will have clang-tidy enabled in CI. For a while I thought I could clean up the project enough to enable clang-tidy with all checks enabled, but I figure it's smarter to set up the minimal checks and at least have those in CI. We can fix more going forward.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12213

Differential Revision: D10183069

Pulled By: goldsborough

fbshipit-source-id: 7ecd2d368258f46efe23a2449c0a206d10f3a769
2018-10-03 17:25:06 -07:00
c9f9df002d Properly catch errors in PythonOps (#12243)
Summary:
If a PythonOp throws an error, it raises an exception to the interpreter and also releases the GIL, which causes [pybind to segfault](https://github.com/potassco/clingo/issues/42)

This fix catches pybind errors while the GIL is still held and throws a `python_error` to re-capture the GIL

Fixes #12118

apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12243

Differential Revision: D10182787

Pulled By: driazati

fbshipit-source-id: 719d4a7c3294af201e061cf7141bec3ca0fb1f04
2018-10-03 17:25:03 -07:00
557015fd93 wipe cache with writes (#12279)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12279

For some reason, if we don't write to the wipe buffer, it doesn't really wipe out everything from the caches on x86.
We also need to wipe out the cache after initializing input blobs.

Reviewed By: Maratyszcza

Differential Revision: D10161211

fbshipit-source-id: c34414dd8b83947805010d7d57e4134d56de1430
2018-10-03 17:12:23 -07:00
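A rough Python sketch of the trick described above (the real harness is C++; the buffer size is an arbitrary assumption chosen to exceed a typical last-level cache):

```
# Writing (not just reading) one byte per cache line puts the wipe buffer
# into the cache in modified state, evicting previously cached benchmark data.
CACHE_LINE = 64
wipe = bytearray(64 * 1024 * 1024)   # assumed larger than the last-level cache

def wipe_caches():
    for i in range(0, len(wipe), CACHE_LINE):
        wipe[i] = 1
```
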
6b9afc894b pyHipify Fixes (#12292)
Summary:
This PR makes the following changes:
* stores cuda_to_hip mappings in python OrderedDicts
* Replace cudaError with cudaError_t and remove cudaError mapping

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12292

Differential Revision: D10184399

Pulled By: bddppq

fbshipit-source-id: b20a4661ba534e4fb12aa738e1ed74dba84f30fc
2018-10-03 17:12:17 -07:00
fe10f3d0c6 Fix up onnxwhile op (#12124)
Summary:
Fix the ONNXWhile op to support nested loops and correctly track loop-carried dependencies. Nested loops should be fully supported together with https://github.com/onnx/onnx/pull/1453
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12124

Differential Revision: D10108817

Pulled By: wanchaol

fbshipit-source-id: 51b948024da857c9962833213ee792f47f054e48
2018-10-03 15:55:58 -07:00
8aa23907e8 Make if block also take control_inputs, preserve SSA (#12224)
Summary:
The If block is missing control inputs during caffe2 net execution; this PR adds them back and removes the non-SSA semantics.

jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12224

Differential Revision: D10135408

Pulled By: wanchaol

fbshipit-source-id: 746c870bde54ed4ca627167361db1b3f36cd235c
2018-10-03 14:29:01 -07:00
b548f8320d Reduce size of TensorImpl from 160 bytes to 128 bytes (#12266)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12266

- Put all byte-size fields together (booleans and TensorTypeId),
  so they can be coalesced into a single word.
- Replace std::vector<int64_t> strides with
  std::unique_ptr<int64_t[]>, saving two words.

Reviewed By: dzhulgakov

Differential Revision: D10150834

fbshipit-source-id: f54f38eec34732f3ff7e52e00b1371d7b5b210eb
2018-10-03 14:28:59 -07:00
2217c0b408 create the onnx_root in local, and link it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12294

Reviewed By: BIT-silence

Differential Revision: D10178208

Pulled By: houseroad

fbshipit-source-id: 6105b88ea5f3ce9164961cf13b356d85178c374d
2018-10-03 13:55:56 -07:00
3db9738b30 add torch factory methods (zeros/ones) to onnx symbolic
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11477

Differential Revision: D9761637

Pulled By: wanchaol

fbshipit-source-id: 401f8d43a831685a444e88509bace94ce5b94e52
2018-10-03 13:55:54 -07:00
01d835c9b2 Revert D10128131: [nomnigraph] Add move{Node,Edge,Subgraph} for Graph move-like semantics
Differential Revision:
D10128131

Original commit changeset: b0e17ec2802c

fbshipit-source-id: c4a922c10ce8eddc965447b3cc4b6b01dd26dabb
2018-10-03 13:11:23 -07:00
d1ac1eba3b Add bool type to IR (#11834)
Summary:
This PR adds a bool type to `IValue` and puts it into place.

* changes conds for `prim::If` and `prim::Loop` to use `bool` type
* changes operators that take `bool`s to match their native ops
* fixes ambiguous `aten` ops `aten::std` and `aten::var`
	* fixes tests in `test_jit.py TestJitGenerated`
		```
		'test_std_dim',
		'test_std_dim_1d',
		'test_std_dim_1d_neg0',
		'test_std_dim_neg0',
		'test_var_dim',
		'test_var_dim_1d',
		'test_var_dim_1d_neg0',
		'test_var_dim_neg0'
		```
* adds `prim::BoolToTensor` and `prim::TensorToBool`

apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11834

Differential Revision: D9928570

Pulled By: driazati

fbshipit-source-id: 373c53df2f1a8ffa9e33d9a517002fbeef25f3eb
2018-10-03 12:40:03 -07:00
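With `bool` as a first-class IR type, a scripted `if` branches on a real boolean rather than a tensor; a minimal sketch in the type-comment style of this era:

```
import torch

@torch.jit.script
def relu_or_negate(x, flag):
    # type: (Tensor, bool) -> Tensor
    if flag:               # prim::If now takes a bool condition directly
        return torch.relu(x)
    return -x

print(relu_or_negate(torch.randn(3), True))
```
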
c029c839a1 MIOpen 1.5 group conv API integration (#12273)
Summary:
This PR contains changes for:
1. Group convolutions introduced in MIOpen 1.5
2. Checks to initialize MIOpen conv operator descriptors only when needed (inputs or weights changed)

Differential Revision: D10174611

Pulled By: bddppq

fbshipit-source-id: cd3d61fae350c4a5e540ce1a6e08012e0e2689fe
2018-10-03 12:26:58 -07:00
a839ec805a Add move{Node,Edge,Subgraph} for Graph move-like semantics
Summary: Adding back import{Node,Edge} as move{Node,Edge} and adding a new function moveSubgraph

Reviewed By: duc0, yyetim

Differential Revision: D10128131

fbshipit-source-id: b0e17ec2802cb211b6455578fdb17dab2a7a425b
2018-10-03 12:26:55 -07:00
b911ca9b0d docs: change links to https (#12258)
Summary:
Hi, I think it might be better to use https instead of http in the README.md.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12258

Differential Revision: D10162279

Pulled By: soumith

fbshipit-source-id: 4658aa75175909b4fea6972b437765d8b49c749f
2018-10-03 06:33:09 -07:00
080266e79c Document CUDAHOSTCXX environment variable (#12265)
Summary:
This variable is already being used so this just serves to document that. I think it's an important variable, too, so it should definitely be documented there somewhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12265

Differential Revision: D10162261

Pulled By: soumith

fbshipit-source-id: e0d01e012c2fedea63372de9967a8eaa3745fe94
2018-10-03 06:33:06 -07:00
1fb8925efe Fix typo LMBD->LMDB in docs of setup.py (#12282)
Summary:
`setup.py` reads `USE_LMDB` rather than `USE_LMBD`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12282

Differential Revision: D10162025

Pulled By: soumith

fbshipit-source-id: 6295a777be10509ca49516ad7c10061d26b6f9c9
2018-10-03 06:14:19 -07:00
c0ed48a57e Add support to the accuracy metric (#12211)
Summary:
The code that reads a blob from input files is broken; this fixes it. Also adds a binary that converts input files to blobs that can be used by Caffe2 directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12211

Reviewed By: llyfacebook

Differential Revision: D10121845

Pulled By: sf-wind

fbshipit-source-id: 6e48bb594680bcb3186d8d43276b602041c30d3e
2018-10-03 02:10:51 -07:00
06360c3050 Back out "Deduplicate canonical_axis_index_ with maybe_wrap_dim"
Summary: Original commit changeset: 13c98fff0880

Reviewed By: ezyang

Differential Revision: D10153342

fbshipit-source-id: c74c56e61662e9c747206e812b1da22170cbf742
2018-10-02 16:40:21 -07:00
a76216b8ed Back out "[aibench] Use caffe2::int8::Int8TensorCPU when input type is uint8_t"
Summary: Original commit changeset: b63cd3a75f87

Reviewed By: bddppq

Differential Revision: D10154512

fbshipit-source-id: 039dfd295c5d1de799993a20e708915be65e9d76
2018-10-02 16:25:11 -07:00
035d04299c Update onnx to onnx/onnx@ddf8eb6 (#12267)
Summary:
ddf8eb6aa0
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12267

Reviewed By: yinghai

Differential Revision: D10151536

Pulled By: bddppq

fbshipit-source-id: 4cb04fcc0377c6c39fb318c5fc7043e67c400866
2018-10-02 15:57:43 -07:00
04b0774964 Use caffe2::int8::Int8TensorCPU when input type is uint8_t (#12250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12250

We use caffe2::int8::Int8TensorCPU for quantized tensor with uint8_t element type.

Reviewed By: llyfacebook

Differential Revision: D10121216

fbshipit-source-id: b63cd3a75f87e043cc3c83de4f3520b6ffbf1d07
2018-10-02 14:57:28 -07:00
7c678746ef update the script to match the current build process
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12262

Reviewed By: BIT-silence

Differential Revision: D10148658

Pulled By: houseroad

fbshipit-source-id: c083346cc40154f7baea1be713cac799cf076cbf
2018-10-02 14:01:37 -07:00
29e5ba8a7b Fix for LibTorch download link (#12263)
Summary:
We now have a proper download link for libtorch.

ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12263

Differential Revision: D10149216

Pulled By: goldsborough

fbshipit-source-id: e9caefed1c7f8e25d7623d72c8548bfdb6114329
2018-10-02 12:25:25 -07:00
1d3f650ce4 Revert D10098106: [pytorch][PR] [WIP] New version of PT1 model format
Differential Revision:
D10098106

Original commit changeset: 94ec7fc57c84

fbshipit-source-id: 38f729b0970618f38359797b806cbbcd865f4715
2018-10-02 00:43:40 -07:00
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <->DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
696498d9e4 Delete stride updating logic from Caffe2, and make PyTorch error in this case. (#12236)
Summary:
Strides appear to cause a huge memory regression in some of our internal
training workflows. This diff stems the bleeding, while we figure out exactly
what happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12236

Reviewed By: dzhulgakov

Differential Revision: D10134319

fbshipit-source-id: 1547c89a65c05473c409c0977c19c99dcaefb89c
2018-10-01 21:25:04 -07:00
2cbcaf4544 Skip failing tests in test_sparse (#12229)
Summary:
Skip the recently introduced tests that fail on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12229

Differential Revision: D10138146

Pulled By: bddppq

fbshipit-source-id: a0f1ff97fabb71f635a468e8030dbe32d388de49
2018-10-01 18:31:45 -07:00
8af06d8114 Use DFS scheduling only within single device (#11848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11848

Avoid crossing the boundary between devices when using DFS scheduling

Reviewed By: romain-intel

Differential Revision: D9931091

fbshipit-source-id: 1f3cf52127830048ed1db50b01677b66eeed8b32
2018-10-01 18:31:43 -07:00
ecace9eb21 Move crf in caffe2 from fb to oss (#12200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12200

moved crf_viterbi_op, copied crf_predict and crf_viterbi_test to oss

Reviewed By: Yangqing

Differential Revision: D10118341

fbshipit-source-id: 51e30e57d280d6ca75fc0b488f743794f23b589f
2018-10-01 18:31:41 -07:00
26df16eb21 Clear previous device option when keep_device is set in load op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12240

Reviewed By: jerryzh168

Differential Revision: D10133933

fbshipit-source-id: 05935bd527177f936c1d08626888d43dedbf5ce4
2018-10-01 17:20:26 -07:00
23f86ad57f Back out "[caffe2][mpscnn] Enable multiple external output"
Summary: Original commit changeset: 0cea9469cea0

Differential Revision: D10135814

fbshipit-source-id: 9563361cc00f4ce5dc2e903c0fcb10643ee9af26
2018-10-01 16:55:32 -07:00
35becd1879 New version of PT1 model format (#12149)
Summary:
Considered four different existing formats: 1) static graph, 2) torch script, 3) pickle files, 4) PyTorch C++ serialize APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12149

Reviewed By: BIT-silence

Differential Revision: D10098106

Pulled By: houseroad

fbshipit-source-id: 94ec7fc57c842e50fae5286ddeda657a4967a07a
2018-10-01 15:57:02 -07:00
8fa7de35f2 Enable ROCM clang-7 build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12223

Differential Revision: D10133697

Pulled By: bddppq

fbshipit-source-id: c1de99afccdad415ac1beb85d3b8ab44f9b58738
2018-10-01 15:11:40 -07:00
15d28e400f remove support for c extensions (#12122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12122

We are deprecating support for C extensions. Please use cpp extensions in the future.

Reviewed By: Yangqing

Differential Revision: D10060541

fbshipit-source-id: 4f7149e06a254bd7af463fd7aa9740f65369963a
2018-10-01 13:55:28 -07:00
1b59cf8b51 Add support to use llvm 7 in CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12182

Differential Revision: D10129630

Pulled By: bddppq

fbshipit-source-id: f0217336474b807f03f84a4b8052ce92a6e3564b
2018-10-01 13:39:50 -07:00
06f535d8a0 More debug info in plan executor (#12183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12183

Adding more debug info printed from plan executor

Reviewed By: manojkris

Differential Revision: D10113104

fbshipit-source-id: dddc9aec8012c8575ab305033388412fdaaac537
2018-10-01 12:56:32 -07:00
eba1cf2145 Unify style (#11949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11949

Unify naming style

Reviewed By: yinghai

Differential Revision: D9931227

fbshipit-source-id: b6956bd98ed8625623e4747d616989f9f3a2ed46
2018-10-01 12:56:29 -07:00
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
ecb3835387 change \gamma to \Gamma (#12214)
Summary:
- revert `\gamma` changes at landed PR: https://github.com/pytorch/pytorch/pull/12126
- minor fix for docs of `torch.norm()`

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12214

Differential Revision: D10127337

Pulled By: weiyangfb

fbshipit-source-id: 15eb8abda39ec9e8b2e815e2a22096cae786995a
2018-10-01 11:31:18 -07:00
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) in Multifeed Aggregator and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock the aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
e43ffb0148 nomnigraph - easy - some code cleanup for transformations_test (#12101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12101

clean up some duplicate test code

Reviewed By: ZolotukhinM

Differential Revision: D10051914

fbshipit-source-id: 698ff144a85e8c70572116c5ddb415cd2396b4e3
2018-10-01 11:31:08 -07:00
006171fffc Back out "[pytorch][PR] Revert "Move CreateContext to global registry (#11688)"" (#12121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12055

Original commit changeset: 6ca9de65b707

Reviewed By: ezyang

Differential Revision: D10033396

fbshipit-source-id: ca9f4b2f7ef0561f619b833415d394a8b9972bf4
2018-10-01 11:10:46 -07:00
fed91f873f (Very small) allow trailing commas in assign or tuples (#11723)
Summary:
Allow trailing commas in assign statements or tuples, which also allows single element tuples.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723

Differential Revision: D10052162

Pulled By: eellison

fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
2018-10-01 10:10:13 -07:00
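A small sketch of what the script parser now accepts:

```
import torch

@torch.jit.script
def tail_commas(x):
    a, b, = x, x + 1    # trailing comma in an assignment target list
    single = (a,)       # single-element tuple, now expressible
    return single

print(tail_commas(torch.ones(2)))
```
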
f3c32a4b54 dnnlowp_16 -> dnnlowp_acc16 (#12205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12205

We're more interested in testing the performance of the DNNLOWP_ACC16 engine.

Reviewed By: llyfacebook

Differential Revision: D10121080

fbshipit-source-id: 7def38be838feb7636f7dd0c8ed352c2df398ec1
2018-10-01 09:40:13 -07:00
9768b4d4ff support half float for SparseLengthsIndicesInGradientWeightedSumWithMainInputGradient (#12186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12186

Specialized implementation: preconvert embeddings to float and do everything in fp32.

Reviewed By: jspark1105

Differential Revision: D10100603

fbshipit-source-id: 3255b4addb6fda24722bd519163099f5d354d084
2018-09-30 23:56:14 -07:00
c3817e85fa Temporary fix for LibTorch download link (#12212)
Summary:
We're waiting for the libtorch links to show up on the website. I had a fake link in the docs so far which is misleading. This PR changes it to a temporary markdown file until the web people fix the site tomorrow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12212

Differential Revision: D10121872

Pulled By: goldsborough

fbshipit-source-id: f1bd1315f7333b9168e99983f3f6b679c9b0c52a
2018-09-30 15:39:51 -07:00
572132fb17 copy_(Sparse, Sparse) for sparse tensor (#9005)
Summary:
- fix #8330
- add `torch.copy_(Sparse, Sparse)` with autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9005

Differential Revision: D8987885

Pulled By: weiyangfb

fbshipit-source-id: b317a41da22ee1eae2835622a0ed28a6771a3a06
2018-09-30 11:55:09 -07:00
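A short sketch of the new sparse-to-sparse copy:

```
import torch

i = torch.tensor([[0, 1], [1, 0]])   # COO indices
src = torch.sparse_coo_tensor(i, torch.tensor([3.0, 4.0]), (2, 2))
dst = torch.sparse_coo_tensor(i, torch.zeros(2), (2, 2))

dst.copy_(src)           # sparse <- sparse copy, now supported
print(dst.to_dense())
```
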
93ecf4d72a Remove raise_from (#12185)
Summary:
soumith

CC alsrgv

Fixes #11995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12185

Differential Revision: D10120103

Pulled By: goldsborough

fbshipit-source-id: ef7807ad83f9efc05d169675b7ec72986a5d17c3
2018-09-29 22:41:55 -07:00
5ffc915f26 fix docs (#12126)
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126

Differential Revision: D10087006

Pulled By: weiyangfb

fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
2018-09-29 22:26:45 -07:00
40aa212cd6 Support fp16 mkl engine in training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12080

Reviewed By: hyuen

Differential Revision: D10037719

fbshipit-source-id: 618ce894eccc4c87a038dc3ab836684f16843cde
2018-09-29 21:55:11 -07:00
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
878e7740fd Turns optimizations off when checking trace (#12172)
Summary:
Currently, when tracing, optimizations are performed twice. This means that optimizing passes, like the fusion pass, are also called twice. This is unnecessary, and this PR turns off optimizations when checking the trace (since the trace is independent of optimizations). This should improve performance and debugging.

apaszke who proposed this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12172

Reviewed By: ezyang

Differential Revision: D10109250

Pulled By: apaszke

fbshipit-source-id: 8b3385eae143446820f1b61ca7576d7c07f9b248
2018-09-28 19:40:10 -07:00
22ce6060ec Add caffe2_api to exported functions (#12184)
Summary:
Broke the build, sorry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12184

Differential Revision: D10114818

Pulled By: bwasti

fbshipit-source-id: 49844183a48d9383c5055a9ce06fe61fbf353050
2018-09-28 18:12:00 -07:00
ebc2643498 Enable multiple external output (#10957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10957

att

Differential Revision: D9525097

fbshipit-source-id: 0cea9469cea06cbfd3828549b168483413788269
2018-09-28 18:11:58 -07:00
0a5dfa5a52 Add support for device annotations on blobs
Summary: device annotations on blobs with Declare and Export trick

Reviewed By: yyetim

Differential Revision: D9999916

fbshipit-source-id: 0bd4d15e7beed2788f47255d52ea296f8f674295
2018-09-28 14:11:54 -07:00
08e5ca1262 Add filter<T>(NNModule) and explicit Declare/Export classes (#11955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11955

Adding a `filter<T>(NNModule)` function to easily get inputs/outputs of a DAI-style NNModule.

Reviewed By: duc0

Differential Revision: D9997696

fbshipit-source-id: 818c4f2e3093e0d02b35e6632b426e8d3189c21e
2018-09-28 14:11:53 -07:00
60061a20d9 Adding Declare and Export operators (#11954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11954

Adding an alternative to external_input and external_output for use in some distributed settings

Reviewed By: aazzolini

Differential Revision: D9997121

fbshipit-source-id: 1b5cc03fd3051368a3edc69e7bc472386f5746b5
2018-09-28 14:11:51 -07:00
7b2c0a09e4 Adds support for NaN, +inf, -inf float scalars to CPU and CUDA fusers (#12070)
Summary:
In current upstream float scalars are always written into kernels with:

`out << std::scientific << v << "f";`

When the floats are special values like NaN, +inf, or -inf this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.

To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.

Edit:

Forgot to mention, this partially addresses issue #12067.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070

Reviewed By: ezyang

Differential Revision: D10044704

Pulled By: soumith

fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
2018-09-28 14:11:49 -07:00
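A sketch of the kind of program that previously broke codegen, in the spirit of the updated test_clamp_fusion (assumes a CUDA device so the fuser kicks in):

```
import torch

def clamp_head(x):
    # float('inf') becomes a scalar in the fused kernel; codegen now emits
    # the device-specific INFINITY/NAN macros instead of invalid literals.
    return torch.clamp(x + x, min=-0.5, max=float('inf'))

x = torch.randn(8, device='cuda')
traced = torch.jit.trace(clamp_head, x)
print(traced(x))
```
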
0e779c27e1 Deduplicate canonical_axis_index_ with maybe_wrap_dim (#11891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11891

maybe_wrap_dim is a slightly more general function, which is able
to, under some circumstances, treat 0 as a "valid" dimension even
when a tensor is a scalar.  canonical_axis_index_ never accepts
this behavior, so it always passes false.

Reviewed By: jerryzh168

Differential Revision: D9968320

fbshipit-source-id: 13c98fff0880d7bfcd00911a76c8aa10d37bd183
2018-09-28 14:11:48 -07:00
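The semantics being deduplicated, sketched in Python for illustration (the real function is C++):

```
def maybe_wrap_dim(dim, ndim, wrap_scalar=True):
    # Negative dims count from the end; scalars (ndim == 0) may optionally
    # treat dims -1 and 0 as valid. canonical_axis_index_ always passes
    # wrap_scalar=False, so it is the strict special case of this function.
    if ndim <= 0:
        if not wrap_scalar:
            raise IndexError("dimension specified but tensor has no dimensions")
        ndim = 1
    if dim < -ndim or dim >= ndim:
        raise IndexError("dim %d out of range for %d-dim tensor" % (dim, ndim))
    return dim + ndim if dim < 0 else dim
```
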
ab9a5976a0 Disable inlinining of EnforceFailMessage (#12078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12078

The constructor is inlined multiple times

Reviewed By: salexspb

Differential Revision: D9358084

fbshipit-source-id: c8d4177a3fcccac574ee4f63336a6fa8bfb07d11
2018-09-28 11:24:35 -07:00
8009b6cdb5 Kill self_ty in TYPE_DERIVED_DEFINITION_NATIVE (#11903)
Summary:
This allows us to call the type argument with name other than `self_ty`. ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11903

Differential Revision: D10105029

Pulled By: SsnL

fbshipit-source-id: 0fbdc728123ebc1154d080628cb41a085ba3e6d7
2018-09-28 11:09:50 -07:00
e7e10e60e0 Introduce builtin script functions (#12141)
Summary:
This functionality replaces the Scalar-Tensor builtin operators
with builtin functions.

Builtin functions are used in place of operators where one operator
can be defined using a composition of another. This simplifies later
optimization passes by allowing us to have fewer operators.

In the future, builtin functions can be used for other purposes.
For example, we can define derivative functions as code rather than
building graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141

Reviewed By: ezyang

Differential Revision: D10088065

Pulled By: zdevito

fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf
2018-09-28 10:55:08 -07:00
65bf181ddf Add "ai.onnx.pytorch" onnx domain (#12157)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12157

Differential Revision: D10100799

Pulled By: bddppq

fbshipit-source-id: 76fdd126e0b52c54276752b3b0174735355a7d2f
2018-09-28 09:57:06 -07:00
0aff3cc559 Fix broadcasting bug in StudentT (#12148)
Summary:
This fixes a broadcasting error with the `StudentT` distribution

- [x] added a regression test
- [x] strengthened parameter broadcasting tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12148

Differential Revision: D10099226

Pulled By: soumith

fbshipit-source-id: 0c5eb14180d158f8fff28ceb9e7cd3471c2bb803
2018-09-28 09:57:02 -07:00
b0248df72a Docs: Change cuda(async) —> cuda(non_blocking) (#12158)
Summary:
goldsborough Modify the docs to match the changes made in #4999
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12158

Differential Revision: D10103964

Pulled By: SsnL

fbshipit-source-id: 1b8692da86aca1a52e8d2e6cea76a5ad1f71e058
2018-09-28 08:39:27 -07:00
5be0baefa2 Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932)
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.

`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932

Differential Revision: D10084303

Pulled By: apaszke

fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
2018-09-28 07:54:27 -07:00
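A round trip through an in-memory buffer, the new Python-side capability (written in the ScriptModule style of this era):

```
import io
import torch

class Doubler(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, x):
        return x * 2

buf = io.BytesIO()
Doubler().save(buf)             # serialize into a buffer instead of a file
buf.seek(0)
restored = torch.jit.load(buf)  # load it back without touching disk
print(restored(torch.ones(3)))
```
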
d291cf7de6 Ensuring positive definite matrix before constructing (#12102)
Summary:
Ensure the covariance matrix is positive definite when constructing a Multivariate Normal distribution.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12102

Reviewed By: ezyang, Balandat

Differential Revision: D10052091

Pulled By: jeffreyksmithjr

fbshipit-source-id: 276cfc6995f6a217a5ad9eac299445ff1b67a65f
2018-09-28 07:27:20 -07:00
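A quick illustration of the check; the second matrix below is symmetric but has eigenvalues 3 and -1, so it is not positive definite (the exact error type may vary by version):

```
import torch
from torch.distributions import MultivariateNormal

loc = torch.zeros(2)
ok = torch.tensor([[2.0, 0.5], [0.5, 1.0]])
print(MultivariateNormal(loc, covariance_matrix=ok).sample())

bad = torch.tensor([[1.0, 2.0], [2.0, 1.0]])  # not positive definite
try:
    MultivariateNormal(loc, covariance_matrix=bad)
except Exception as err:                      # construction fails fast
    print("rejected:", err)
```
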
04c0971679 Special case BatchGather and BatchGatherGradient for block_size=1. (#11349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349

Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.

Reviewed By: jspark1105, ilia-cher

Differential Revision: D7218043

fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
2018-09-27 21:11:38 -07:00
f5a0c337ba Move TensorImpl IsType, meta, dim32, dim, ExtractDeviceOption to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12100

Reviewed By: jerryzh168

Differential Revision: D10051424

fbshipit-source-id: 5986e92ea54e60ec6bfe992015a05e09288c948c
2018-09-27 20:40:03 -07:00
bbae57d06e Move TensorImpl size_from_dim, size_to_dim, size_between_dim, canonical_axis_index to caffe2::Tensor (#12099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12099

- Generalize the free functions to accept IntList, not just std::vector<int64_t>

Reviewed By: jerryzh168

Differential Revision: D10051365

fbshipit-source-id: e3d571bf8fead22f6f25c3ca46f0c38c2bb065d2
2018-09-27 20:40:00 -07:00
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
149403f849 Move TensorImpl ndim, size, itemsize and nbytes to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12098

Reviewed By: jerryzh168

Differential Revision: D10051298

fbshipit-source-id: a833fad74bbda38c019ec2cb97d4bb6804e09963
2018-09-27 19:56:00 -07:00
7f35e92af2 mutable lists (#10700)
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700

Reviewed By: eellison

Differential Revision: D9547881

Pulled By: michaelsuo

fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
2018-09-27 19:25:13 -07:00
a5818047c4 Rewrite serialization to correctly handle partial reads/writes in all cases (#12143)
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the number of bytes read/written, and instead
handle the necessary checking loop themselves.

Fixes #12042. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143

Differential Revision: D10097027

Pulled By: ezyang

fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
2018-09-27 19:09:53 -07:00
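The shape of the fix, sketched in Python for illustration (the real change lives in the C++ doRead/doWrite wrappers):

```
def read_exact(stream, nbytes):
    # Loop until all requested bytes arrive; a single read() may legally
    # return fewer bytes, which is the case call sites used to check
    # inconsistently.
    chunks, remaining = [], nbytes
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:
            raise EOFError("unexpected end of stream")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)
```
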
a86a61b004 Implement caffe2::Tensor::raw_data() in terms of data()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12097

Reviewed By: jerryzh168

Differential Revision: D10051202

fbshipit-source-id: b4b61869363a606ab465d1500558226efae30d06
2018-09-27 18:40:37 -07:00
2021b26bcb Move TensorImpl::ShareExternalPointer helper overloads to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12096

Reviewed By: jerryzh168

Differential Revision: D10051126

fbshipit-source-id: a9b95d00512a0b4e6339d4f3f0bb180dd0c79247
2018-09-27 18:40:35 -07:00
976a9e0454 Move TensorImpl::DebugString() to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12095

Reviewed By: jerryzh168

Differential Revision: D10051078

fbshipit-source-id: f56b6fc5d1cb8ae4b636e88efe607fe65cc1d7a0
2018-09-27 18:40:33 -07:00
b0e48aa197 Move TensorImpl::Reshape(vector<int>) to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12094

Reviewed By: jerryzh168

Differential Revision: D10051079

fbshipit-source-id: 87fb91f31c33ce9b64c4654e79e0131ae391cd78
2018-09-27 18:40:30 -07:00
8c533c2c90 Fix bug where Reshape() trashes strides.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12092

Reviewed By: jerryzh168

Differential Revision: D10051005

fbshipit-source-id: c36d1c8d12fb41baf8d1a1a9f38776deeff242de
2018-09-27 18:40:28 -07:00
d02478e607 Move TensorImpl::ResizeLike to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12091

Reviewed By: jerryzh168

Differential Revision: D10051012

fbshipit-source-id: 772ecd2e377f7d4e1ae510c1f647f6c8b71e5a57
2018-09-27 18:40:25 -07:00
dd73d57643 Move TensorImpl::ShrinkTo to caffe2::Tensor (#12090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12090

This is a slight pessimization because we need to do a
full recompute of is_contiguous(), even though a modification
of dim-0 is guaranteed to preserve contiguity.

Reviewed By: jerryzh168

Differential Revision: D10050905

fbshipit-source-id: b99233e21c9f4275b0db6e76740462e5430ce152
2018-09-27 18:40:23 -07:00
00c6fb16e7 Move ExtendTo to caffe2::Tensor from TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12089

Reviewed By: jerryzh168

Differential Revision: D10050859

fbshipit-source-id: 843067aacfa2a519657220bc39a0f499582a48a4
2018-09-27 18:40:21 -07:00
6a2dbc9808 Rename TensorImpl::GetDeviceType to device_type, and properly test if is_variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12087

Reviewed By: jerryzh168

Differential Revision: D10050781

fbshipit-source-id: 0b6c9d7caf3b1000691f86fcc7f2ef203936a29f
2018-09-27 18:40:19 -07:00
c5fc2f1105 Merge UndefinedTensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11972

Reviewed By: gchanan, Yangqing, jerryzh168

Differential Revision: D9995633

fbshipit-source-id: 6b4645c9d4bb0bc4301cd4bcfa76cf85331b8379
2018-09-27 18:40:16 -07:00
e8cb6cb9d2 Fix some symbolics for ReduceSum, GE, LE (#12123)
Summary:
ReduceSum negative indices are turned positive, since caffe2 does not support them. The GE/LE symbolic operand order was wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12123

Reviewed By: houseroad

Differential Revision: D10095467

Pulled By: wanchaol

fbshipit-source-id: eb20248de5531c25040ee68b89bd18743498138d
2018-09-27 17:40:46 -07:00
f6abd16a9d Merge TensorImpl. (#11971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11971

- Switched TensorImpl::data<T>() to use Storage::unsafe_data<T>() to work
  around an outstanding bug in the Storage::data<T>() implementation
  where it only works on Ts which are valid ScalarType
- Qualify a bunch of identifiers which still live in caffe2:: namespace
- strides returns an IntList now
- s/update_strides/update_to_contiguous_strides/
- Correctly compute type_id_ for the Storage only constructor from Caffe2.
  This is special cased to only work for CPU and CUDA dense tensors.
- Fix some signed-unsigned comparisons in Caffe2 code (OSS build for
  ATen/core has more restrictive warning tests.)

Reviewed By: jerryzh168

Differential Revision: D9995559

fbshipit-source-id: 9c74032e011189e1c7e9a98d20f2bd1e25ad2e5c
2018-09-27 17:40:44 -07:00
1619264ca5 Make ATen-core and caffe2 mutually recursive / merge template data<T>() (#11970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970

Adds an ATen-core-headers target, which caffe2_cpu_internal depends
on, and makes ATen-core depend on caffe2_headers.  If you link against
ATen-core, you must ALSO link against caffe2_cpu_internal; if you
link against caffe2_cpu_internal, you must ALSO link against ATen-core,
otherwise you'll have undefined symbols.

Then, we merge template data<T>() method with Caffe2 implementation,
demonstrating that includes to Caffe2 (core) from ATen/core are working

Reviewed By: jerryzh168

Differential Revision: D9967509

fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a
2018-09-27 17:40:42 -07:00
c35f85a6d4 Export symbols for pybind and other libs after caffe2 rebase (#11975)
Summary:
Export symbols for pybind and other libs after caffe2 rebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11975

Differential Revision: D10042615

Pulled By: yinghai

fbshipit-source-id: 6de562d99403099113093716834abc51bf726e94
2018-09-27 14:40:27 -07:00
80e3081c28 Add observers for mkldnn fallback operators (#9093)
Summary:
Add observers for ideep operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9093

Reviewed By: salexspb

Differential Revision: D9952949

Pulled By: yinghai

fbshipit-source-id: 1678d1a738f8781dc75eb3cb9dfb309f7b7934fb
2018-09-27 14:11:19 -07:00
6e7e63fda3 Implementation MomentumSGD/MomentumSGDUpdate operators for mkl-dnn (#11686)
Summary:
The speed-up of a single operation is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686

Reviewed By: yinghai

Differential Revision: D9828129

Pulled By: wesolwsk

fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
2018-09-27 13:39:59 -07:00
13cf39294d Remove ATen/Error.h and use ATen/core/Error.h instead. (#12132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12132

TSIA. No code change involved.

Reviewed By: bwasti

Differential Revision: D10083237

fbshipit-source-id: bdab029015b9d0f1fa1f866c68aa5945cc68db9d
2018-09-27 10:11:17 -07:00
a72603f8f8 Fix for ppc64le jit graph difference in sigmoid backward, see #10726 (#11579)
Summary:
As reported in Issue #10726, the jit compiler, when running on ppc64le, may produce an isomorphic output but fail a diff test against the expected output file. The expected output file is created from a test that was run on x86_64. This ensures that if the ppc64le test output is different, the output is instead compared to an expected output file created when the test is run on a ppc64le system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579

Differential Revision: D10080890

Pulled By: soumith

fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9
2018-09-27 07:09:31 -07:00
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10

Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
383d340e88 Small optimization for adam (#12107)
Summary:
Apply weight decay for Adam in-place instead of via copy.

Synced offline with soumith , who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107

Reviewed By: soumith

Differential Revision: D10071787

Pulled By: jma127

fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
2018-09-26 21:43:46 -07:00
5da8a8c785 Handle undefined tensor in blob correctly. (#12125)
Summary:
You can't GetDeviceType an undefined tensor, so test for this case
first.  This allows you to safely move tensors out of blobs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12125

Reviewed By: smessmer

Differential Revision: D10080075

Pulled By: ezyang

fbshipit-source-id: bb99b089b6daa9d4db99015208f939d7ce4d4a79
2018-09-26 21:43:41 -07:00
325101263a Aten: catch2gtest (#11846)
Summary:
Migrated all tests in aten to use gtest, except for basic.cpp.
Since the features of gtest differ from those of Catch, some of the tests have been rewritten with equivalent meaning.

The basic test has a version conflict with valgrind according to CI, so that test case still uses Catch.
It will be resolved by a different PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11846

Differential Revision: D10080860

Pulled By: zrphercule

fbshipit-source-id: 439d4cf33fb6ccbe79b797860342853c63e59081
2018-09-26 20:57:45 -07:00
0f81039eaf Better high level C++ documentation (#12079)
Summary:
I wrote some high level docs for the larger PyTorch C++ universe and the C++ frontend specifically. Happy for reviews, but let's please also land this ASAP so I can point users at something that looks more ready baked than the C++ docs landing page (https://pytorch.org/cppdocs) does right now.

ezyang soumith

CC ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12079

Differential Revision: D10080785

Pulled By: goldsborough

fbshipit-source-id: 3028de41373f307468eb1e3802aa27871c93b2e3
2018-09-26 20:57:43 -07:00
db5f8d42bb Remove TIndex typedef from core/common.h (#12032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12032

See title

Reviewed By: dinhviethoa

Differential Revision: D10023757

fbshipit-source-id: dbf0a043b2afab767f052bd4c5e8de13e0f57dcc
2018-09-26 17:02:54 -07:00
478803a75f Introduce type variables to implement generic list operators (#12040)
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.

This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.

It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040

Differential Revision: D10037098

Pulled By: zdevito

fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
2018-09-26 17:02:51 -07:00
75b1ae1acd Update issue templates
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12114

Reviewed By: soumith

Differential Revision: D10060349

Pulled By: JoelMarcey

fbshipit-source-id: ed88bf95f78742b089adb043e88613a5db006a10
2018-09-26 16:26:00 -07:00
1b45f68397 Use atomicAdd from cuda_fp16 header when building with CUDA 10 (#12108)
Summary:
An efficient atomicAdd for halfs has been added in `cuda_fp16.h` in CUDA 10:
```
__CUDA_FP16_DECL__ __half atomicAdd(__half *address, __half val);
```

Through this change, PyTorch will be able to utilize efficient atomicAdd when building with CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12108

Differential Revision: D10053385

Pulled By: soumith

fbshipit-source-id: 946c90691a8f6bdcf6d6e367a507ac3c9970b750
2018-09-26 15:28:17 -07:00
6ff568df4d Add full namespace resolution in CAFFE_DURATION (#12065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12065

Had compilation issues using CAFFE_DURATION in some contexts, specifically due to namespace resolution. Since this is a macro, it should fully qualify the names it uses.

Reviewed By: heslami

Differential Revision: D10036132

fbshipit-source-id: b8d55dfe5e991ca702ce5b7483f0ffc699882c85
2018-09-26 13:29:18 -07:00
d9c27f4d8d T33898723: Simple put operators for caffe2 stats (#12057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12057

Add simple put operators for various types of stats

Reviewed By: mlappelbaum

Differential Revision: D9925268

fbshipit-source-id: cec02b0027d2d0ef3d35741be4b02c429d492810
2018-09-26 12:39:37 -07:00
c2f8f5076c add narrow() support for sparse tensors re: #8853 (#11342)
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing. I'm not sure what the `skipIfROCM` annotation is for, so I'm unsure if I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so? I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file. Or should I implement a separate dispatch in native_functions.yml as in the log1p?

And of course, happy to make any other updates/changes that I may have missed as well. This is my first PR to the project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
2018-09-26 12:24:54 -07:00
78fe149ab9 Fix ONNX bug, add symbolic for full
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12052

Differential Revision: D10044910

Pulled By: apaszke

fbshipit-source-id: 015ef372966d7594e1b450e348d457429f6ef20d
2018-09-26 11:45:25 -07:00
18f9c07b18 Enable tracing of tensor factories with an out argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12051

Differential Revision: D10044890

Pulled By: apaszke

fbshipit-source-id: 2d794bf408875600bc71f354f0b4961d6b715094
2018-09-26 09:40:34 -07:00
b535aecd7c Fix warnings emitted when testing distributions (#12038)
Summary:
The earlier tests had around 80 warnings, and now there are 6 warnings; these are due to the JIT.

The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in https://github.com/pytorch/pytorch/pull/11061 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12038

Differential Revision: D10033392

Pulled By: apaszke

fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
2018-09-26 09:24:54 -07:00
02d7c88fa4 Unify versions across setup.py, libtorch, and libcaffe2 (#12053)
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.

cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053

Differential Revision: D10041878

Pulled By: orionr

fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
2018-09-26 08:55:06 -07:00
c8a0b11b7f add autodiff expressions for common operations (#11832)
Summary:
This PR does a few things:

Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graphs
do not end up containing autodiff subgraphs, so autodiff never gets tested.

This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
  autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
  These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
  sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
  will test the autodiff versions of the operators instead of autograd whenever
  an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
  This has no impact on testing correctness because the assumption is
  that autograd functions are correct and are tested in test_autograd.py

This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. As a more concrete example, LSTM backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.

Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt

TestJitGenerated tests autodiff for all of the added operations.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832

Differential Revision: D10031256

Pulled By: zou3519

fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
2018-09-26 08:10:04 -07:00
21ed7e51b6 Blob doesn't allow access to destroyCall anymore (#11548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11548

This removes getting/setting the DestroyCall of a Blob,
paving the way to removing DestroyCall from Blob entirely and using the destructor stored in TypeMeta instead.

Use sites have been fixed in diffs stacked below this.

Reviewed By: dzhulgakov

Differential Revision: D9775191

fbshipit-source-id: 97d72d0c62843849057f295c27f391e63c99c521
2018-09-26 01:45:28 -07:00
65cbb8226b IValue can store Blob (#11414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11414

caffe2::Blob can be stored in an IValue. This is a precondition for caffe2 to switch from Blob to IValue.

Reviewed By: ezyang

Differential Revision: D9731326

fbshipit-source-id: 462a39d2d9ab6f85b99b1670848c6976a3de417c
2018-09-26 01:12:31 -07:00
b7ebc00979 Move Blob to ATen/core (#11924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11924

Previous diffs removed Blob -> caffe2 dependencies, now we can move it to ATen/core.
This is pre-work for allowing storing Blob in IValue.

Reviewed By: ezyang

Differential Revision: D9980641

fbshipit-source-id: 32082a673ec94c42c20b2298adced8bb7ca94d07
2018-09-25 23:27:52 -07:00
8ff435c8f6 Use tempfile during serialized test comparison (#12021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021

TestPilot runs stress tests in parallel. These fail for serialized tests because extracting (and subsequently deleting) binary data during the process isn't threadsafe. Extracting zips into a tempfile avoids this problem.

Also remove some accidentally checked in zips of a test that we didn't end up including for now.

Reviewed By: houseroad

Differential Revision: D10013682

fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a
2018-09-25 20:55:45 -07:00
807de9a1e3 fix segfault when grad to a hook fn is None (#12028)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/11751 by checking if a grad is a Python None object before getting cdata from it
- behaviors:

pre-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
...:    def hook(grad):
...:        print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print('a_list[0]', a_list[0].grad, a.grad)
('a_list[0]', None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward() # segfault
```

post-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
...:    def hook(grad):
...:        print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print(a_list[0].grad, a.grad)
(None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward()
None

>>> print(a_list[1].grad, a.grad)
(None, tensor([1., 1., 0., 0., 0.]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12028

Differential Revision: D10034094

Pulled By: weiyangfb

fbshipit-source-id: 3f2135325fa7d338b920f57752057e4f6a6c0b1d
2018-09-25 19:10:25 -07:00
db2f7de5c3 Fallback CreateMutex/AtomicIter operators for mkl-dnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11685

Reviewed By: pjh5

Differential Revision: D9928058

Pulled By: wesolwsk

fbshipit-source-id: 734e19c35a684481d9a4d4f0c596e4dceae51ad4
2018-09-25 17:41:08 -07:00
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod by mechanically doing the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
90bcf41291 Add safety asserts for methods on TensorImpl which don't work on Variable. (#12058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12058

Methods on TensorImpl have to be written very carefully, because
when you have a VariableImpl subclass of TensorImpl, usually the
local fields on the TensorImpl are not valid; instead, you have to
forward to the "wrapped" tensor.  Functions which are virtualized
are probably handled correctly by Variable, but functions which
are NOT cannot be handled correctly and shouldn't be called if you
have a Variable.  This diff adds checks to determine if this is
the case or not.

Reviewed By: jerryzh168

Differential Revision: D10034589

fbshipit-source-id: 650b2036ca9a044c0ab4abdf6f825521a64e1fc2
2018-09-25 17:25:47 -07:00
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establishes a baseline so that we can build IDEEP ops in the new workflow. From this baseline, we need to
- Merge the CMakefile of MKLDNN from caffe2 and Pytorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
b7b9e3c7e8 Fix "identifier following the 'template' keyword does not refer to a template" (#12037)
Summary:
LLVM trunk emits an error diagnostic when attempting to compile caffe2. The
identifiers following the `template` keywords are not templates, so the use of
the keyword does not make sense in this context.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12037

Reviewed By: ezyang

Differential Revision: D10024531

Pulled By: modocache

fbshipit-source-id: da4b9ba405d9f7fd633ab8c1a61c77da9c1a1f89
2018-09-25 16:40:42 -07:00
1e28294487 Delete some unused variables. (#12059)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12059

Differential Revision: D10034632

Pulled By: ezyang

fbshipit-source-id: ff33da0d93734856b8e8bcfe744cefe127fffb91
2018-09-25 14:25:21 -07:00
e53e8df20b Support TypeIdentifier::name() (#12036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12036

Sometimes you have a TypeIdentifier, and no way to get to
the TypeMeta.  Still nice to be able to read out the name.

This should be obsoleted by smessmer's patches.

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024554

fbshipit-source-id: 42cdceefd5c59be0441254665f66f5edc829f422
2018-09-25 14:25:19 -07:00
aa1adde80b Refactor fastGet/fastSet for clarity, removing a null pointer check. (#11902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11902

Previously, they were going through THTensor_getStoragePtr which
incurred a null pointer check on storage.  Now they use unsafe_data
method which doesn't do this check.

I don't know if this actually makes things go faster, but I get
an added bonus of reducing code duplication, so we should take
this change anyway :)

Reviewed By: SsnL

Differential Revision: D9977654

fbshipit-source-id: f45c74828213a0439480755ad0b2d7f8858cb327
2018-09-25 13:55:53 -07:00
ceadde2a7f Add some more locations to search for nccl. (#12063)
Summary:
Users generally expect ./configure to find libraries
installed in /usr/local and /usr, so search for nccl
there too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12063

Differential Revision: D10036248

Pulled By: ezyang

fbshipit-source-id: d331ddd2ccc8ac9846fb54222db284b1ec371659
2018-09-25 13:27:54 -07:00
b263078bc3 Fix CUDA division by a scalar on large arrays. (#12023)
Summary:
The gpu_unary_kernel function was not handling arrays that
cannot use 32-bit indexing. This function was only called directly
by CUDA division by a scalar. Other arithmetic operations go through
gpu_binary_kernel, which already properly handled large arrays.

This bug sometimes manifested as a crash and sometimes as an incorrect
answer.

Fixes #11788
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12023

Differential Revision: D10034017

Pulled By: colesbury

fbshipit-source-id: b17300f327de54035746bf02f576766007c9b144
2018-09-25 13:10:25 -07:00
a106388187 Free MAGMA queues after use (#11882)
Summary:
This PR is a minor change, just adds a simple `magma_queue_destroy` function to the implementation of `Gesv`.

Also, I have replaced calls for obtaining handles with those already written in ATen.
```
THCState_getCurrentSparseHandle(at::globalContext().getTHCState()) --> getCurrentCUDASparseHandle()
THCState_getCurrentBlasHandle(at::globalContext().getTHCState()) --> getCurrentCUDABlasHandle()
```

Differential Revision: D10032204

Pulled By: soumith

fbshipit-source-id: ccd11989ecdc357313f0b661a2468f75d3aecb0e
2018-09-25 12:56:57 -07:00
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
94c513cc7f Improve pybind11 message (#11640)
Summary:
Improving the message based on https://github.com/pytorch/pytorch/issues/11570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11640

Differential Revision: D10033383

Pulled By: orionr

fbshipit-source-id: 0cdcdbe0582d896283a12970aebe771efa390dd2
2018-09-25 11:26:05 -07:00
364ae10bb8 nomnigraph - easy - add some python test helper methods (#12020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12020

- make it less verbose to create random blobs in python unit tests by adding some test helper methods
- move str_compare test helper method to test_util.py

Reviewed By: ZolotukhinM

Differential Revision: D10003637

fbshipit-source-id: cb79d2ad508341f750a1bb8f564e87d055c65652
2018-09-25 10:55:19 -07:00
7122f8b3bb Disable more flaky tests on CircleCI (#11399)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11362.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11399

Differential Revision: D9736673

Pulled By: yf225

fbshipit-source-id: cad8c0e86a70a01b047e648975ca5b9926e4acb3
2018-09-25 10:25:30 -07:00
d7e11e3aae Revert "Move CreateContext to global registry (#11688)" (#12049)
Summary:
This reverts commit 3ae6ee4ebded136da30aa53fd3873d84acfbc9f0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12049

Differential Revision: D10030954

Pulled By: ezyang

fbshipit-source-id: 6ca9de65b707c5b4c68280fc6f1b8e5ad7251efc
2018-09-25 10:13:43 -07:00
3deb4791c3 Replace 'struct Tensor' with 'class Tensor'. (#12034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034

We need ATen and Caffe2 to line up, and the rule is
that if you have any private/protected members, you
should declare it as a class.  Class we go.

(There are some other obvious candidates for this treatment,
but I've kept this patch just to Tensor)

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024467

fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe
2018-09-25 09:54:35 -07:00
fcb3ccf23f Don't record Git version automatically via cmake (#12046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046

This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.

I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.

Reviewed By: pjh5

Differential Revision: D10030556

fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
2018-09-25 09:40:19 -07:00
0947712e5d Move Factory functions from Type to TypeExtendedInterface. (#12025)
Summary:
This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions.  In particular:
1) Removes factory functions from Type, into TypeExtendedInterface.
2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads.
3) We move from Type-based sparse_coo_tensor dispatch to function-based.

Note we still require a number of changes to get rid of Type in the public interface; in particular, TensorOptions needs to support CUDA vs non-CUDA dispatch.  That is coming in a future patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025

Reviewed By: ezyang

Differential Revision: D10017205

Pulled By: gchanan

fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320
2018-09-25 09:40:17 -07:00
d4ce41c4de Rename tensor_impl_ to impl_ in Tensor (#12035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035

This brings it in line with Caffe2's naming

Reviewed By: mingzhe09088

Differential Revision: D10024485

fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0
2018-09-25 09:11:39 -07:00
71b99f28be Give default values to members of TensorImpl. (#12033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12033

These are reasonably sensible default values.  One key
pick is -1 for numel: this is because in Caffe2, a tensor
may be in an "un-allocated" state with no storage; this is
historically represented in Caffe2 with numel_ == -1

Reviewed By: mingzhe09088

Differential Revision: D10024439

fbshipit-source-id: a167d727a7665daac7e7a1e98c0c89d8f1da6fa6
2018-09-25 09:11:37 -07:00
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: Original commit changeset: 2ea17724e223

Differential Revision: D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
3417a1e7e4 Prepend a "const" to a for loop in printPyObject. (#11857)
Summary:
As pytuple should be a constant type (since obj is constant), potential errors would occur without
this const qualifier, e.g., when compiling against PyPy. Although PyPy is not supported yet, it
is still useful to remove this compilation issue (one of very few such compilation
issues) to allow hackers to play with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857

Differential Revision: D10024149

Pulled By: soumith

fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8
2018-09-24 23:12:57 -07:00
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods, because regex-based refactoring was only able to match a small number of the call sites. To catch more, I would've needed an AST-aware tool like clangr, which I didn't figure out how to use.

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
dfa03e94eb Fix mispelling of AVAILABLE. (#12016)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12016

Reviewed By: pietern

Differential Revision: D10010808

Pulled By: ezyang

fbshipit-source-id: ff6394ae9a53f7fdad2cadb4e019e09ac63bba96
2018-09-24 20:46:41 -07:00
86e025fca2 magma-cuda should reference updated versions (#12000)
Summary:
Source build doc section **LAPACK GPU**  only lists magma-cuda80

The magma-cuda version should reflect the installed version of cuda.

- Verified on ubuntu with magma-cuda92 with build and test
- Verified magma-cuda91 is available
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12000

Differential Revision: D10024158

Pulled By: soumith

fbshipit-source-id: a34c85a5e87b52657f1e6f7b21d235306ab7b2aa
2018-09-24 20:26:26 -07:00
5d4624a1d9 Fix return temporary as reference in MPI backend (#11947)
Summary:
The MPI async work class returned a temporary as reference, which is
invalid (hat tip to colesbury for noticing it). This change fixes that and
uses a std::exception_ptr to hold on to the exception if applicable, and
then returns the reference by rethrowing and catching it, like the
existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947

Differential Revision: D10019928

Pulled By: pietern

fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
2018-09-24 20:17:38 -07:00
9068a46dba Fix deprecated function warning in ONNX model test. (#11827)
Summary:
When running /test/onnx/test_models.py, we see deprecation warnings in the test points for `super_resolution` and `squeezenet` models. This change updates those models to use the recommended methods, instead of the deprecated ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11827

Reviewed By: houseroad

Differential Revision: D10023998

Pulled By: ezyang

fbshipit-source-id: ee4e14304678c532ebd574e7bd143e3b311995ab
2018-09-24 19:59:02 -07:00
a830964007 Eliminate no-op adds and muls in peephole pass (#11801)
Summary:
Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a 15% improvement).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801

Differential Revision: D9916815

Pulled By: apaszke

fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
2018-09-24 17:48:48 -07:00
3ae6ee4ebd Move CreateContext to global registry (#11688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688

As a first step to remove static context(merge with allocator), we'll create a
global registries for context constructors, and remove CreateContext function from tensor.

Reviewed By: ezyang, dzhulgakov

Differential Revision: D9779821

fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
2018-09-24 17:07:50 -07:00
b7c302da1a Make gen_jit_dispatch runnable (#12018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12018

Tried to use the file and ran into a small bug, this fixes it

Differential Revision: D10013231

fbshipit-source-id: 4cf8c29cf9e2cedd7a28fa0cc0196e5144a54bf2
2018-09-24 16:09:48 -07:00
70e4b3ef59 Revert D10006069: Remove TIndex typedef from core/common.h
Differential Revision: D10006069

Original commit changeset: 5e2aac993968

fbshipit-source-id: fbd8d3860635211e641ca14eaff7a64882e0d6bd
2018-09-24 15:30:25 -07:00
e05d689c49 Unify C++ API with C++ extensions (#11510)
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.

Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498

Why is this useful?

1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API (see the sketch after this list).
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510

Reviewed By: ezyang

Differential Revision: D9998835

Pulled By: goldsborough

fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
2018-09-24 14:44:21 -07:00
1c09bfde1b Make promoteType(half, integer) -> half (#11941)
Summary:
Changes the result type of half type and any integer type to return half
type (instead of float or double).

This is based on top of #11808. The first new commit is "Make promoteType(half, integer) -> half". I'll rebase on top of master once that PR lands.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11941

Differential Revision: D10014122

Pulled By: colesbury

fbshipit-source-id: 16a5eb3406a5712069201d872d8736d0599e9411
2018-09-24 13:55:42 -07:00
51414822f5 Stop moving constants into DifferentiableSubgraphs (#11809)
Summary:
Or even taking them as inputs. This prevents optimizations to happen
either inside the differentiable subgraphs, or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809

Differential Revision: D10009680

Pulled By: apaszke

fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
2018-09-24 13:24:53 -07:00
ffbac7d0bb Miscellaneous updates for CUDA 10 (#12017)
Summary:
This PR has some updates related to CUDA 10.

- c2195e9864 ensures that the repo successfully builds on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888
- 423d8d3524 follows up on the cufft max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017

Differential Revision: D10013405

Pulled By: soumith

fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8
2018-09-24 11:58:32 -07:00
a6f1ae7f20 set up c10 scaffolding. Move macros proper first.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939

Reviewed By: orionr, dzhulgakov

Differential Revision: D10004629

Pulled By: Yangqing

fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c
2018-09-24 11:09:59 -07:00
1a1d79e761 Remove TIndex typedef from core/common.h (#11993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11993

See title

Reviewed By: ezyang

Differential Revision: D10006069

fbshipit-source-id: 5e2aac993968307c850e431c00052cb1a339ced2
2018-09-24 10:55:55 -07:00
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ API instead of macros; item() is aligned with the Python frontend. caffe2::Tensor::capacity_nbytes is effectively unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
1178851280 Get rid of most usages of Type.tensor. (#12002)
Summary:
1) Most usages are replaced by at::empty.
2) native_tensor has its namespace function removed
3) Type.tensor(sizes, strides) becomes at::empty_strided(sizes, strides).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12002

Differential Revision: D10007201

Pulled By: gchanan

fbshipit-source-id: 5e5647c050ed2ecb87a33e0b5ce4928fa3186c34
2018-09-24 10:16:18 -07:00
76ab26cc3e Remove unused THNN functions due to removal of torch/legacy (#11946)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11946

Differential Revision: D9994625

Pulled By: cpuhrsch

fbshipit-source-id: fca3d48ecbdab06ce53249db2402fc4613da4d21
2018-09-22 21:54:55 -07:00
a6630e25af Remove many caffe2::TIndex and replace them with int64_t (#11943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943

See title

Reviewed By: ezyang

Differential Revision: D9992645

fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a
2018-09-22 18:11:04 -07:00
5d0f1c3c8f Add #include to satisfy Android NDK unified headers
Summary:
Old per-API+arch headers reside in
  /opt/android_ndk/r*/platforms/android-*/arch-*/usr/include/
New Unified headers reside in
  /opt/android_ndk/r*/sysroot/usr/include/

Unified headers are not exactly drop-in replacements for the old ones. Old headers had some nested includes that are absent in the unified versions, so we need to explicitly include them.

Reviewed By: mzlee

Differential Revision: D9952200

fbshipit-source-id: 6515e1d1ab576069db499c3fb23a69d507279c8c
2018-09-22 15:39:56 -07:00
7517e53468 Update onnx submodule to onnx/onnx@c4734c6 (#11958)
Summary:
c4734c6200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11958

Differential Revision: D10002779

Pulled By: bddppq

fbshipit-source-id: 8bd7dfc8fdaf0b699a61f5b228f7102a16b92258
2018-09-22 01:40:31 -07:00
f15474ade8 Export caffe2::Caffe2Annotation symbols (#11965)
Summary:
Some of these symbols are used by device_test.cc.

d0db23e95a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11965

Reviewed By: bwasti

Differential Revision: D10002439

Pulled By: bddppq

fbshipit-source-id: 4ae95b9c888b3c7685d0ffdbcbfa3441bcf90091
2018-09-21 22:43:48 -07:00
1c282ab99a Move GetExceptionString to Error.h (#11501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11501

This doesn't really belong to TypeMeta, moving it to the error handling header

Reviewed By: ezyang

Differential Revision: D9763424

fbshipit-source-id: 127a8246171ab3a4475f2767d2dc1cc13c486a2e
2018-09-21 21:54:33 -07:00
825181ea9d Rewrite C++ API tests in gtest (#11953)
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.

You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were:

1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixing DataParallel tests which weren't being compiled because `USE_CUDA` wasn't correctly being set at all.
3. Updated README

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953

Differential Revision: D9998883

Pulled By: goldsborough

fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
2018-09-21 21:28:16 -07:00
d0db23e95a Add distributed annotations
Summary: Annotations for DAI

Reviewed By: duc0

Differential Revision: D9805867

fbshipit-source-id: 9ce2d9f3984817510ec8362a281f39878aad55e7
2018-09-21 19:09:59 -07:00
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression in CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
89d56ae435 Move function deletion from the stack to the heap. (#11611)
Summary:
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do of #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611

Differential Revision: D9991198

Pulled By: resistor

fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
2018-09-21 16:11:03 -07:00
b5f60af94c Shape prop view/reshape/as_strided through prim::ListConstructs (#11877)
Summary:
Previously, aten::view returned a Dynamic type when attr::size was a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multiplied LSTM input graph from being fusible (aten::view is necessary
to do the premultiplication).

If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877

Differential Revision: D9972356

Pulled By: zou3519

fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
2018-09-21 14:20:01 -07:00
7efbf3a827 Specialize ArgumentSpecs on tuple elements too (#11863)
Summary:
This is pretty important because the common situation of passing LSTM hidden states as a tuple completely trashes the performance of a network.

Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuples or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863

Differential Revision: D9992814

Pulled By: apaszke

fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
2018-09-21 14:19:58 -07:00
1cf5b0c7c1 Fix casting logic for 0d CPU tensors in CUDA ops (#11808)
Summary:
Previously, we didn't cast any 0-dim tensors used in CUDA operations. We
can only avoid the casts for 0-dim CPU tensors used in CUDA operations.

Fixes #11795
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11808

Differential Revision: D9922406

Pulled By: colesbury

fbshipit-source-id: 940b8a8534770aa5cd70d5d09b96be0f0f8146ff
2018-09-21 14:19:56 -07:00
1ad7e0c5ec Minor JIT improvements (#11654)
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires grad propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654

Reviewed By: colesbury

Differential Revision: D9813739

Pulled By: apaszke

fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
2018-09-21 14:19:54 -07:00
4e65fbfee5 Remove tests from EXCLUDE_SCRIPT that pass (#11916)
Summary:
Spuriously added in #11261

I had a PR to catch these automatically (#11279), but it had some issues:
it passed on some CI environments but not others (e.g. for
`test_nn_group_norm`); any ideas?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11916

Differential Revision: D9992065

Pulled By: driazati

fbshipit-source-id: 05cfa8ed9af939e8ffd5827847ee7bfe0be799b2
2018-09-21 14:19:50 -07:00
00fe2c5606 Use -O1 for sleef build in Debug mode (#11942)
Summary:
`-O0` is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. In `-O0`, the compiler spills *every* intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py) the function `Sleef_tanhf8_u10avx2` would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with `-O1`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11942

Differential Revision: D9994658

Pulled By: jamesr66a

fbshipit-source-id: cdd9474c6ae3aa9898d5715ac19a900f5f90468a
2018-09-21 13:24:59 -07:00
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
23f5b2abbe Fixes an error with canonical url. (#11938)
Summary:
Deleted this section by mistake in last PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11938

Reviewed By: SsnL

Differential Revision: D9993258

Pulled By: brianjo

fbshipit-source-id: 2552178cebd005a1105a22930c4d128c67247378
2018-09-21 12:21:42 -07:00
c2a2110d71 Stop tracing _out overloads (#11910)
Summary:
They aren't recognized anywhere in the JIT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11910

Differential Revision: D9979968

Pulled By: apaszke

fbshipit-source-id: bb2505a14e3b1e54d5c243f99c80a4f4d918b204
2018-09-21 11:44:10 -07:00
c6a14b1edd Revert D9985212: [pytorch][PR] [minor] remove a remaining todo line deletion in THD cmake
Differential Revision: D9985212

Original commit changeset: 5f8e7ac94101

fbshipit-source-id: 1783cbfc91008ab3db36bad7c1bf51e16da7fb2d
2018-09-21 11:25:53 -07:00
817e83fc01 fix PR #11061 (#11815)
Summary:
- fixes PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removes warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from the source tensor, with requires_grad set based on the input args (see the sketch below)
- `torch.as_tensor` retains its behavior as documented

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
6834dcab1c Align cuda multinomial without replacement to CPU behaviour (#11933)
Summary:
We do this by being more NaN tolerant.

Fixes: #9062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11933

Differential Revision: D9991129

Pulled By: soumith

fbshipit-source-id: c99b04462c1bee90d00eeabb0c111de12f855f4d
2018-09-21 11:04:17 -07:00
784d345828 Fix docstring of torch.jit.createResolutionCallback (#11921)
Summary:
The sample code in the docstring of `torch.jit.createResolutionCallback` is not working:

`createResolutionCallback()` gets the frame of `bar`. In order to get the frame of `baz`, one needs to use `createResolutionCallback(1)`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11921

Differential Revision: D9989123

Pulled By: soumith

fbshipit-source-id: a7166defdccbbf6979f7df4c871298e6b9a2b415
2018-09-21 09:41:57 -07:00
e655f16c35 Pop stashed IntList in resize_, warn about its usage when tracing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11909

Differential Revision: D9979595

fbshipit-source-id: 07b1027bd6bd1605a31afd4f57bcd58e307fa41e
2018-09-21 08:40:20 -07:00
4fb7e72fe5 Fix _thnn_fused_lstm_cell backward (#11872)
Summary:
There are two parts:
- Optional tensors cannot be dispatch tensors because dispatch
  tensors cannot be optional.
- While the kernel dealt with undefined grad_outs, the logistics
  around it did not fully accommodate grad_hy being undefined.

Fixes: #11800

Thank you, mttk for the reproduction!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11872

Differential Revision: D9978527

Pulled By: apaszke

fbshipit-source-id: e622c288d2eac93bd8388e141fb773f2588e2b8f
2018-09-21 08:25:00 -07:00
48c8adfe1b Turn storage on UndefinedTensorImpl into nullptr. (#11738)
Summary:
I also fix a bug that crept in while we had incorrect semantics where UndefinedTensorImpl was a CPU tensor, and thus some moves which shouldn't have been legal didn't crash. Moving out the Tensor* also moved out the Tensor* in the blob, and it's not supported to store an undefined tensor in a blob.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11738

Reviewed By: gchanan

Differential Revision: D9847859

fbshipit-source-id: db6be0f76a8e6526a89fd0e87b6a23b9cc820c8d
2018-09-21 08:24:57 -07:00
11bd2f2509 Retainable is no more (#11900)
Summary:
Stack:
- **#11900 Retainable is no more** (D9977505)
- #11902 Refactor fastGet/fastSet for clarity, removing a null pointer check. (D9977654)

Kill it with fire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11900

Differential Revision: D9979779

Pulled By: ezyang

fbshipit-source-id: 0a437e7a0baadb6440e7dc39a01b4a406171faa7
2018-09-21 06:58:18 -07:00
a7afd133f5 Sync FindCUDA.cmake with upstream cmake repo (#11880)
Summary:
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391/diffs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11880

Differential Revision: D9989119

Pulled By: soumith

fbshipit-source-id: 66e87367127975a5f1619fe447f74e76f101b503
2018-09-21 06:58:17 -07:00
58d28a5f12 Fix saving loaded module (#11915)
Summary:
This PR fixes #11913.

In order to test for this, the model is serialized twice in `getExportImportCopy`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11915

Differential Revision: D9984697

Pulled By: soumith

fbshipit-source-id: ae0250c179000c03db1522b99410f6ecb9681297
2018-09-21 06:58:16 -07:00
0d9be2135f remove a remaining todo line deletion in THD cmake (#11920)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11920

Differential Revision: D9985212

Pulled By: Yangqing

fbshipit-source-id: 5f8e7ac94101177740e791f44eaa8c8ec55a908c
2018-09-21 00:40:20 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now free functions SerializeBlob(), DeserializeBlob() instead.
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
17cd426c72 Updated docs styles (#11835)
Summary:
Updated requirements.txt and conf.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11835

Reviewed By: SsnL

Differential Revision: D9941160

Pulled By: brianjo

fbshipit-source-id: fbac91214558e6d17beff74261d990c7dc762038
2018-09-20 21:11:12 -07:00
d712a71741 Protobuf serialization (#11619)
Summary:
This PR serves two purposes:

1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.

This is currently a rough prototype I coded up today, to get quick feedback.

For this I propose the following serialization interface within the C++ API:

```cpp
namespace torch { namespace serialize {
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};

class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
}} // namespace torch::serialize
```

There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:

1. Provide a cereal-less serialization forward that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.

The user-facing API is (conceptually):

```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```

with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`

ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619

Differential Revision: D9984664

Pulled By: goldsborough

fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
2018-09-20 20:39:34 -07:00
30521a37ad codemod: caffe::float16 -> at::Half (#11785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11785

Replace each instance of float16 with Half.

Reviewed By: Yangqing

Differential Revision: D9892158

fbshipit-source-id: b9225ca7bd5c84fd1c04a9d24b026c8b6cbff120
2018-09-20 18:55:19 -07:00
a9459bf7b5 Replace float16 with at::Half in caffe2 (#11676)
Summary:
- Finishes unifying Half type in pytorch and caffe2
- As a side effect, aten_op works for fp16 now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11676

Reviewed By: weiyangfb

Differential Revision: D9829019

Pulled By: li-roy

fbshipit-source-id: b8c9663873c10fe64c90ef180dc81af2e866674e
2018-09-20 18:55:17 -07:00
9c44c60794 Bump up the frontend version (#11873)
Summary:
To update the onnx model zoo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11873

Reviewed By: BIT-silence

Differential Revision: D9953369

Pulled By: houseroad

fbshipit-source-id: 5e96a982b8029dceeb08e3bea4094bae053e1865
2018-09-20 16:20:48 -07:00
9f0d9db6e4 Improve GRU/LSTM documentation for multiple layers (#11896)
Summary:
Prompted by Alex Falcon's input on the forums. Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11896
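
For reference, a minimal sketch of the multi-layer shape semantics the docs now spell out (sizes are illustrative, not from the PR):

```
import torch

# Two stacked LSTM layers: layer 2 consumes layer 1's hidden states
# as its input sequence.
lstm = torch.nn.LSTM(input_size=10, hidden_size=20, num_layers=2)

x = torch.randn(5, 3, 10)        # (seq_len, batch, input_size)
output, (h_n, c_n) = lstm(x)

print(output.shape)              # (5, 3, 20): top layer only, all time steps
print(h_n.shape)                 # (2, 3, 20): all layers, final time step
```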

Differential Revision: D9976831

Pulled By: SsnL

fbshipit-source-id: 460af51049c289ed4ce529b7b6ae6314e2bdaae4
2018-09-20 15:42:48 -07:00
c7751f4df0 MIOpen bug fixes and performance enhancements (#11766)
Summary:
This PR contains changes for:
1. Performance enhancements for group conv using MIOpen
2. Performance enhancements by removing unnecessary computations while running pooling through MIOpen
3. Added a check for bwdData computation while running the MIOpen convGradient operator
4. Fix in MIOpen poolingGradient operator to compute window size for global pooling case
5. Minor code cleanup in MIOpen spatial batch norm operator

Differential Revision: D9979050

Pulled By: bddppq

fbshipit-source-id: fabc7a44a2f9ca0307d99564d1ce8fe1de9a6fbb
2018-09-20 15:31:46 -07:00
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, norm function only supports vector norm. This PR extends vector norm to matrix norm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261
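
A quick sketch of the extended API (the `p='fro'`/`p='nuc'` spellings follow the PR's direction; treat details as illustrative):

```
import torch

A = torch.randn(3, 4)
print(torch.norm(A))             # Frobenius norm is the default for matrices
print(torch.norm(A, p='fro'))    # explicit Frobenius norm
print(torch.norm(A, p='nuc'))    # nuclear norm (sum of singular values)
```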

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes a shared library it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as build flags, and if this hash changed, bumps an internal version stored for each module name. A bump in the version will result in the ninja file being edited and a new shared library and effectively a new C++ extension to be compiled. For this the version name is appended as `_v<version>` to the extension name for all versions greater than zero.

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.

Fixes https://github.com/pytorch/pytorch/issues/11398

ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725
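
A minimal sketch of the resulting behavior (the inline source and extension name are made up for illustration):

```
from torch.utils.cpp_extension import load_inline

src_v1 = "int answer() { return 41; }"
ext = load_inline(name="answer_ext", cpp_sources=src_v1, functions=["answer"])

# Re-loading under the same name with changed sources used to return the
# stale module; the versioner now recompiles it as e.g. answer_ext_v1.
src_v2 = "int answer() { return 42; }"
ext = load_inline(name="answer_ext", cpp_sources=src_v2, functions=["answer"])
assert ext.answer() == 42
```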

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00
068eac255b Jit fuse clamp (#11574)
Summary:
This patch adds fused forward and backward for clamp to the jit.
This is one item of #11118. If it's OK, I'd be happy to also add some more of #11118.

The patch depends on #11150, which I merged into master as a base. I'll rebase it when that or #10981 is merged.

This is my first serious jit patch; thank you, ngimel and the others, for your guidance. All errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11574

Differential Revision: D9943090

Pulled By: apaszke

fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
2018-09-20 14:43:10 -07:00
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00
24ec813967 Defer lazyInitCUDA() until needed (#11893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11893

This is needed to run binaries compiled with CUDA support on CPU-only machines.

Reviewed By: teng-li

Differential Revision: D9972872

fbshipit-source-id: 7e4107925b3cd4d2fcf84ae532e800ab65f4b563
2018-09-20 12:12:42 -07:00
9cd0ae5e2d Remove deprecated factory functions from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11583

Reviewed By: SsnL

Differential Revision: D9792800

fbshipit-source-id: 9af46d577911ff38647790169df66aa5d0379dd9
2018-09-20 11:39:48 -07:00
87701289a3 fix link to previous versions (#11894)
Summary:
https://github.com/pytorch/pytorch.github.io/issues/68#issuecomment-423073108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11894

Differential Revision: D9973695

Pulled By: soumith

fbshipit-source-id: 1f74b12487ec39f4e88b527dcdfca0742e689c15
2018-09-20 11:10:37 -07:00
0927386890 Workaround CUDA logging on some embedded platforms (#11851)
Summary:
Fixes #11518
Upstream PR submitted at https://gitlab.kitware.com/cmake/cmake/merge_requests/2400

On some embedded platforms, the NVIDIA driver verbosely logs unexpected output to stdout.
One example is Drive PX2, where we see something like this whenever a CUDA program is run:

```
nvrm_gpu: Bug 200215060 workaround enabled.
```

This patch does a regex on the output of the architecture detection program to only capture architecture patterns.
It's more robust than before, but not fool-proof.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11851

Differential Revision: D9968362

Pulled By: soumith

fbshipit-source-id: b7952a87132ab05c724b287b76de263f1f671a0e
2018-09-20 09:26:00 -07:00
1c77f9e543 Support torch.distributed.barrier in gloo backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11844

Reviewed By: colesbury, SsnL

Differential Revision: D9929055

Pulled By: pietern

fbshipit-source-id: 3a34a179cb80f495f18aa926c0f9513924737d8e
2018-09-20 09:25:59 -07:00
8f4601fbac renable test_scalar_fusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11378

Differential Revision: D9943578

Pulled By: zou3519

fbshipit-source-id: fb9e4303e844d5e2515acce7869bcbe11526ab56
2018-09-20 07:56:25 -07:00
23dd5b4a53 Back out "Open-source ThreadSafeActivationCleaningPredictor"
Summary:
Original commit changeset: bfe253ae5fc8

Apparently the Ads push process detected a regression that normal
canaries don't show.
https://fb.facebook.com/groups/1274424122598505/permalink/2597819483592289/

Reviewed By: highker, Prowindy

Differential Revision: D9952807

fbshipit-source-id: 1a3ea249c3b1e2618220c61f3d51468824b6ef10
2018-09-19 21:26:51 -07:00
83740eae4a Avoid using PyThreadState.frame as it is not a public member. (#11855)
Summary:
The doc of PyThreadState [1] emphasizes that interp is its only public member. Use PyEval_GetFrame() instead.

[1] https://docs.python.org/3/c-api/init.html#c.PyThreadState
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11855

Differential Revision: D9954430

Pulled By: ezyang

fbshipit-source-id: 92da6781e45e2bcb5e3a37b162fa40e49d823215
2018-09-19 20:58:37 -07:00
c64331f48f Add test for verifying combine_spatial_bn values in DPM (#11710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11710

Added a test to check that output and gradient values are correctly
calculated when combine_spatial_bn is true on a data parallel model

Reviewed By: enosair

Differential Revision: D9833660

fbshipit-source-id: 14d29fbebefa9dc303ffae06f9899ea4bde23025
2018-09-19 20:17:51 -07:00
aa8cd7319a Enable build_test on windows (#11802)
Summary:
This PR enables BUILD_TEST for Caffe2 on windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11802

Reviewed By: orionr

Differential Revision: D9951223

Pulled By: mingzhe09088

fbshipit-source-id: 7cdc1626b999daadeae482bd569eebdbd53eb6d4
2018-09-19 20:17:49 -07:00
c22dcc266f Show build output in verbose mode of C++ extensions (#11724)
Summary:
Two improvements to C++ extensions:

1. In verbose mode, show the ninja build output (the exact compile commands, very useful)
2. When raising an error, don't show the `CalledProcessError` that shows ninja failing, only show the `RuntimeError` with the captured stdout

soumith fmassa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11724
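
Usage is unchanged; with `verbose=True` you now see the exact ninja compile commands (extension name and source path here are placeholders):

```
from torch.utils.cpp_extension import load

ext = load(
    name="my_ext",           # placeholder extension name
    sources=["my_ext.cpp"],  # placeholder source file
    verbose=True,            # streams the full ninja build output
)
```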

Differential Revision: D9922459

Pulled By: goldsborough

fbshipit-source-id: 5b319bf24348eabfe5f4c55d6d8e799b9abe523a
2018-09-19 20:17:43 -07:00
1091c5e59f Throw error on indexing a 0 dim tensor (#11679)
Summary:
Following through on warning that indexing 0-dim tensor would be an
error in PyTorch 0.5 and to use `item()` instead
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11679
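
Concretely (a small sketch of the new behavior):

```
import torch

t = torch.tensor(5)   # 0-dim tensor
print(t.item())       # 5 -- the supported way to get the value out
t[0]                  # now raises IndexError instead of just warning
```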

Reviewed By: soumith

Differential Revision: D9833570

Pulled By: driazati

fbshipit-source-id: ac19f811fa7320d30b7f60cf66b596d6de684d86
2018-09-19 18:10:03 -07:00
6831d64591 Fix the symbolic for embedding_bag in ONNX_ATEN_FALLBACK (#11840)
Summary:
The ATen interface was changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11840

Reviewed By: BIT-silence

Differential Revision: D9932452

Pulled By: houseroad

fbshipit-source-id: dd2040fcaa0f6052e5856ee19823cf3064124585
2018-09-19 17:40:39 -07:00
ae1a972d78 Fix #11752: correct numerical issue with log_softmax (#11866)
Summary:
This fixes the numerical problem in log_softmax cpu code when inputs are big but their differences are small.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11866
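
A sketch of the failure mode being fixed (values are illustrative):

```
import torch
import torch.nn.functional as F

# Large inputs whose pairwise differences are small: a numerically naive
# implementation loses the difference and returns wrong log-probabilities.
x = torch.tensor([1000.0, 1000.1])
print(F.log_softmax(x, dim=0))   # expect roughly [-0.744, -0.644]
```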

Differential Revision: D9946799

Pulled By: soumith

fbshipit-source-id: 11fe8d92b91ef6b7a66f33fbce37ec2f0f0929be
2018-09-19 17:09:45 -07:00
6302e4001a Delete unnecessary include from allocator.cc/event_cpu.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11862

Reviewed By: Yangqing

Differential Revision: D9942428

fbshipit-source-id: dea03f5ba0e621a047aa50bc4aa97acc834d2a39
2018-09-19 16:45:54 -07:00
f4d25039cb Fix Array.h when compiled with C++17 (#11816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11816

The file isn't in the std:: namespace, so is_same
must be qualified.

Reviewed By: smessmer

Differential Revision: D9923774

fbshipit-source-id: 126532e27f08b5616ca46be1293d5d837920f588
2018-09-19 16:45:53 -07:00
b06e35b568 Back out "Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h"
Summary: Original commit changeset: 0d1792804d73

Reviewed By: Yangqing

Differential Revision: D9940725

fbshipit-source-id: 540a8ac7afcfe56a6b63abc6ed297c9434320998
2018-09-19 16:45:51 -07:00
cedd12d86a Explicitly qualify references to CPU. (#11819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11819

Differential Revision: D9928730

Pulled By: ezyang

fbshipit-source-id: 3140b6ef168586558f04fa8ee90f6f2169605d7d
2018-09-19 16:45:49 -07:00
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236: torch.bernoulli's out kwarg is broken
  fixed by moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917: BUG torch.bernoulli(p.expand(shape)) is broken
  fixed by moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357: torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at a time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call looks like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // elements at a time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
cf5a21e4a1 Add back proto opt disable feature that was lost during refactor (#11875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11875

Seems like the refactor to predictor_config dropped some functionality that is now blocking other teams

rFBS2b30208263c14ce7039f27c618a3b232bf11ee33 is the change that was missed

hoping to land this quickly :)

Reviewed By: jonmorton

Differential Revision: D9948324

fbshipit-source-id: 1628f7c51c06319fa7ca5dc9d59799135bb82c5f
2018-09-19 15:33:26 -07:00
c30790797f Minor data loader doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821

Differential Revision: D9948292

Pulled By: SsnL

fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820
2018-09-19 15:33:25 -07:00
ce55767091 Add the missing header (#11864)
Summary:
Otherwise, some macros don't have their definitions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11864

Reviewed By: BIT-silence

Differential Revision: D9943327

Pulled By: houseroad

fbshipit-source-id: 53e1bfc7a6b832f249f169b75a8fc15cdab63bf4
2018-09-19 14:40:19 -07:00
3b1a5a1b8a Refactor tests part 2 (#11811)
Summary:
Followup to the [first refactor](https://github.com/pytorch/pytorch/pull/11350). Increase coverage of tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11811

Reviewed By: houseroad

Differential Revision: D9923074

Pulled By: ajyu

fbshipit-source-id: 0f899bb9e9a75bf7ed939e06cc9b028daa7f6bd9
2018-09-19 10:09:28 -07:00
52472508e9 Add env:// rendezvous test (#11782)
Summary:
A missing environment variable used to raise a bare missing-key error. Now it
raises a more descriptive error about the actual problem, for example:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11782
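
A sketch of how to trigger the improved message (backend and addresses are illustrative):

```
import os
import torch.distributed as dist

os.environ["MASTER_ADDR"] = "127.0.0.1"
os.environ["MASTER_PORT"] = "29500"
os.environ["RANK"] = "0"
# WORLD_SIZE deliberately left unset:
dist.init_process_group(backend="gloo", init_method="env://")
# ValueError: Error initializing torch.distributed using env:// rendezvous:
# environment variable WORLD_SIZE expected, but not set
```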

Differential Revision: D9888962

Pulled By: pietern

fbshipit-source-id: 5947e7a7bf7aa45f13bbd7b5e997529f26cc92d6
2018-09-19 09:56:06 -07:00
fa32317780 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fix various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9930449

Pulled By: yf225

fbshipit-source-id: 7c62439b216a6badf7938a10741c358ff18a556d
2018-09-19 09:40:26 -07:00
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773
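
For context, the profiler is driven like this sketch (model and input are placeholders, not the actual benchmark):

```
import torch

model = torch.nn.Linear(128, 128)    # placeholder workload
x = torch.randn(32, 128)

with torch.autograd.profiler.profile() as prof:
    for _ in range(100):
        model(x)
print(prof.key_averages())
```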

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
b3a2665e0f Code-reorg to have TORCH_ARG in its own header (#11787)
Summary:
I noticed I was including `torch/nn/pimpl.h` in the optimizer library just to access `TORCH_ARG`, even though that file includes a lot of irrelevant code. Let's save some re-compilation time by refactoring this macro into a separate logical file. #small-wins

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11787

Differential Revision: D9924447

Pulled By: goldsborough

fbshipit-source-id: 5acd4ba559ffb2a3e97277e74bb731d7b1074dcf
2018-09-19 09:25:41 -07:00
32494c226e OperatorDef <==> NodeProto Conversion (#11621)
Summary:
Operator level proto conversion between (new) torch proto and (old) caffe2 proto.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11621

Reviewed By: BIT-silence

Differential Revision: D9892422

Pulled By: houseroad

fbshipit-source-id: 01a55ec0a09479876a27082d90fc970723f4d431
2018-09-19 08:41:33 -07:00
8601b33c07 fix half grad assignment (#11781)
Summary:
currently grad assignment for half type fails with a misleading RuntimeError
```
RuntimeError: torch.cuda.sparse.HalfTensor is not enabled.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11781
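
A sketch of the pattern that used to fail (needs a CUDA device; shapes are illustrative):

```
import torch

p = torch.zeros(4, dtype=torch.half, device="cuda", requires_grad=True)
p.grad = torch.zeros_like(p)   # previously raised the misleading sparse error
```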

Differential Revision: D9931884

Pulled By: soumith

fbshipit-source-id: 03e946c3833d1339a99585c9aa2dbb670f8bf459
2018-09-18 23:00:49 -07:00
b46f1b8ca7 Open-source ThreadSafeActivationCleaningPredictor (#11779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11731

This Predictor provides a thread-safe interface and also
cleans up activations after each run, so in a multi-model setup
the activation space doesn't explode

Reviewed By: highker

Differential Revision: D9842374

fbshipit-source-id: bfe253ae5fc813e73a347c5147ff6b58d50781ea
2018-09-18 21:56:58 -07:00
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
53b5f14f59 Remove inclusion of caffe2 pb (#11820)
Summary:
Probably not needed, but fwiw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11820

Reviewed By: orionr

Differential Revision: D9924953

Pulled By: Yangqing

fbshipit-source-id: 4d340e3d4f4dadc50fb68bed9572b8e1e54b5f6d
2018-09-18 21:16:19 -07:00
a26ad5a332 Remove unnecessary check on device option pointer (#11845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11845

The device pointer will be used by cudaPointerGetAttributes, which handles nullptr already. So this check is not necessary.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#group__CUDART__UNIFIED_1gd89830e17d399c064a2f3c3fa8bb4390

Reviewed By: salexspb

Differential Revision: D9929828

fbshipit-source-id: d862f7e5590998ffafe9bfc7754b0f83d2ae4af4
2018-09-18 21:16:18 -07:00
8aedc27a63 checking device types of input and weights at RNN (#10185)
Summary:
- fixes #9534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10185

Differential Revision: D9141222

Pulled By: weiyangfb

fbshipit-source-id: bb652e42cc15917019df080d6bce2926b18f3476
2018-09-18 20:26:02 -07:00
e80d1d2876 Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h
Differential Revision:
D9924348

Original commit changeset: 8d92b9e8b424

fbshipit-source-id: 0d1792804d7387023af3a9c29477f1da6f40044a
2018-09-18 18:27:00 -07:00
2c358eaf51 Caffe2: add plan name to logging (#11704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11704

Add plan name to the logging in RunPlan

Reviewed By: Tianshu-Bao

Differential Revision: D9802416

fbshipit-source-id: 45c359dba0a5d992e303b3cdcf34624881a631d8
2018-09-18 18:10:13 -07:00
1f34be47d9 Raise error when perf test result is NaN (#11588)
Summary:
Currently one of our GPU perf tests `test_gpu_speed_mnist` reports NaN after this commit (https://github.com/pytorch/pytorch/pull/8018), and we didn't have the logic in place to raise an error when this happens. This PR fixes the problem and will also update the baseline properly even if its previous value is NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11588

Differential Revision: D9831798

Pulled By: yf225

fbshipit-source-id: b95eee38d69b3b8273f48b8ac7b7e0e79cf756ed
2018-09-18 18:10:12 -07:00
a79f5d77ad Add pretty printer for JIT IR (#10319)
Summary:
Adds some pretty-printing capability to the IR graph to make debugging easier/more human-readable; see `torch/csrc/jit/test_jit.cpp:925` and onwards for example outputs. Results aren't perfect yet, but it's a start.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10319

Reviewed By: zdevito

Differential Revision: D9558402

Pulled By: driazati

fbshipit-source-id: 1d61c02818daa4c9bdca36d1477d1734cfc7d043
2018-09-18 17:39:44 -07:00
1c8686001f Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h (#11818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11818

To do this, I have to move the static context registry into ATen/core.
I take the opportunity to convert it into an unordered_map.

Reviewed By: Yangqing

Differential Revision: D9924348

fbshipit-source-id: 8d92b9e8b4246ce608eba24ecef7ad5f8b9b6582
2018-09-18 17:25:46 -07:00
3da8d71d7d remove protobuf inclusion in core/logging.h (#11814)
Summary:
This should not be there since logging does not depend on protobuf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11814

Reviewed By: ezyang

Differential Revision: D9923819

Pulled By: Yangqing

fbshipit-source-id: 4d4edaea1a2e317f5db6e92c35d58c85dd35c5fb
2018-09-18 17:10:02 -07:00
53cf628503 Simplify Blob move constructor/assignment (#11402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11402

- Simplify move constructor/assignment
- Make more things noexcept

Reviewed By: ezyang

Differential Revision: D9728631

fbshipit-source-id: 92562e30ea1e4d05ca857665a02b0ca66b0739e3
2018-09-18 15:09:40 -07:00
e585f2fb48 Polish CPP docs, Minor Python Docs Fixes (#11722)
Differential Revision: D9919120

Pulled By: goldsborough

fbshipit-source-id: bf14cbe4ab79524495957cb749828046af864aab
2018-09-18 14:55:57 -07:00
8ad846fda5 Don't build Detectron ops with NO_CAFFE2_OPS=1 (#11799)
Summary:
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11799

Differential Revision: D9922745

Pulled By: orionr

fbshipit-source-id: b88724b7c2919aabc00d98658e8e563233e01c85
2018-09-18 14:09:33 -07:00
d4e1fa45d0 allow no-alpha add/sub in onnx symbolic (#10972)
Summary:
The PR fixes #10873

The context is that the aten::add and aten::sub ST overloads don't have alpha, so the onnx symbolic does not match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10972

Reviewed By: jamesr66a

Differential Revision: D9724224

Pulled By: wanchaol

fbshipit-source-id: eb5d1b09fa8f1604b288f4a62b8d1f0bc66611af
2018-09-18 13:55:39 -07:00
7d25fa3c72 Emit Undefined type for value when it is Dynamic type (#11810)
Summary:
For example, outputs of control blocks often have Dynamic type, and when we try to export them to ONNX we get an invalid proto, since `elem_type` is not populated on the TypeInfoProto. This makes it so at least we can get past the checker, since having a dynamically typed output from a control block should still be semantically valid.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11810

Differential Revision: D9922754

Pulled By: jamesr66a

fbshipit-source-id: 5c66113cc302a9d9b8b9f5a8605473d3c6ad5af1
2018-09-18 13:55:36 -07:00
1d399a80a0 Handle pollution of MAX, MIN and CHECK macros. (#11805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805

Some of our headers in Caffe2 pollute the macro namespace with things like MAX,
MIN, CHECK, so I renamed these in places where this is a problem.

This patch courtesy of gchanan, extracted out of #11721

Reviewed By: Yangqing

Differential Revision: D9917757

fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c
2018-09-18 13:18:31 -07:00
9eb72889b4 Add successor/predecessor functions
Summary: More functionality to prep nomnigraph for scheduler implementations

Reviewed By: duc0

Differential Revision: D9794686

fbshipit-source-id: b460859d8ff965d0049b2a696bd8d2f5c97f3f86
2018-09-18 12:27:06 -07:00
47956ddf7e Revert D9755189: [pytorch][PR] [API CHANGE] Add empty tensor tests to test_sparse
Differential Revision:
D9755189

Original commit changeset: e9d36f437db1

fbshipit-source-id: 8b99edf626418a953a8bd786847a6e0174a3a14d
2018-09-18 11:26:10 -07:00
540ef9b1fc Add distributed get_backend (#11715)
Summary:
I have no idea how to run distributed tests locally so I'll let CI do this. Hopefully everything still works with `IntEnum`.

cc mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11715
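
A sketch of the new accessor (single-process init shown purely for illustration):

```
import torch.distributed as dist

dist.init_process_group(backend="gloo",
                        init_method="tcp://127.0.0.1:29500",
                        rank=0, world_size=1)
print(dist.get_backend())   # 'gloo'
```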

Reviewed By: pietern

Differential Revision: D9889646

Pulled By: SsnL

fbshipit-source-id: 1e2a487cb6fe0bd4cc67501c9d72a295c35693e2
2018-09-18 10:56:24 -07:00
2732c8bae1 improve aten/convolution error message (#11768)
Summary:
fixes https://github.com/pytorch/pytorch/issues/11762
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11768

Differential Revision: D9884185

Pulled By: soumith

fbshipit-source-id: 2a0c3e1f5a4fb4833ae6e9fc791abcf45f7fbea2
2018-09-18 10:56:22 -07:00
98aebed88e Refactor tests part 1 (#11350)
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...)
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now, I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
2018-09-18 10:42:10 -07:00
6073f3073e Document torch::nn::init (#11778)
Summary:
Doc fixes and documentation for `torch::nn::init`.

ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11778

Differential Revision: D9886648

Pulled By: goldsborough

fbshipit-source-id: 22eb78add1dc32b92cc32253683ab3d746505a64
2018-09-18 10:26:21 -07:00
c8fbeb3aa2 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fix various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9755189

Pulled By: yf225

fbshipit-source-id: e9d36f437db1a132c423d3a282ff405a084ae7cc
2018-09-18 10:26:18 -07:00
e00fb69b25 Use CATCH prefix to avoid name conflicts with Caffe2.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11780

Differential Revision: D9889925

Pulled By: gchanan

fbshipit-source-id: 5eca849c36ced00b8ae7482b7945b445a3e1687e
2018-09-18 08:12:45 -07:00
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks that all tensor arguments have the same dtype and device.

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.
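
A sketch of the vararg form (shapes illustrative):

```
import torch

x = torch.tensor([1, 2, 3])
y = torch.tensor([4, 5])

gx, gy = torch.meshgrid(x, y)   # varargs: no list wrapper needed
print(gx.shape, gy.shape)       # torch.Size([3, 2]) torch.Size([3, 2])

# Tensors with mismatched dtype or device now raise an error.
```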

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
e2bc95e1bd add ModuleList.insert (#11664)
Summary:
fixes #11652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11664
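
A minimal sketch of the new method:

```
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
layers.insert(1, nn.Dropout(p=0.5))   # insert before index 1, like list.insert
print([type(m).__name__ for m in layers])
# ['Linear', 'Dropout', 'ReLU']
```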

Differential Revision: D9892845

Pulled By: ezyang

fbshipit-source-id: 2c910d6bc0b28a999e25beca6e398fd0f35535c5
2018-09-18 07:41:28 -07:00
91b6458e2d Container __getitem__ slicing for subclasses (#11694)
Summary:
Simple change to allow a ModuleList subclass's `__getitem__(slice)` to return the subclass rather than ModuleList
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11694
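
A sketch (the subclass is made up for illustration):

```
import torch.nn as nn

class MyList(nn.ModuleList):
    pass

ml = MyList([nn.Linear(2, 2), nn.ReLU(), nn.Linear(2, 2)])
print(type(ml[0:2]).__name__)   # 'MyList', no longer plain 'ModuleList'
```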

Differential Revision: D9892824

Pulled By: ezyang

fbshipit-source-id: b75e9c196487f55cb93f0dab6c20d850e8e759ff
2018-09-18 01:26:18 -07:00
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624: adds maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is that the copy constructor cannot distinguish between the default value of requires_grad and requires_grad=False, so it makes a copy of the source tensor along with its grad_fn if requires_grad=True at the source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
63c811b3a6 Include some JIT things in C++ docs (#11712)
Summary:
Since we're making parts of the JIT public as part of loading script modules, they should be on the cppdocs website.

Orthogonal: We decided not to export things like `IValue` into the `torch` namespace, so `RegisterOperators` shouldn't be there either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11712

Differential Revision: D9837578

Pulled By: goldsborough

fbshipit-source-id: 4c06d2fa9dd4b4216951f27424c2ce795febab9c
2018-09-17 23:29:04 -07:00
bd43d64dd5 Add strides to Tensor (#11763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11763

baseline-std vector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.74us  148.26K
TensorShareData                                              5.89us  169.78K
TensorShareExternalPointer                                   1.01us  994.35K
TensorReallocation                                           2.46us  405.78K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                7.50us  133.27K
TensorShareData                                              7.07us  141.38K
TensorShareExternalPointer                                   1.05us  955.19K
TensorReallocation                                           2.55us  391.62K
============================================================================

```

baseline-smallvector
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.56us  152.34K
TensorShareData                                              5.84us  171.32K
TensorShareExternalPointer                                 962.49ns    1.04M
TensorReallocation                                           2.32us  431.73K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.ccrelative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.29us  159.04K
TensorShareData                                              5.73us  174.39K
TensorShareExternalPointer                                 914.90ns    1.09M
TensorReallocation                                           2.29us  435.80K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D9694097

fbshipit-source-id: c462e770a4b40e640d8c9d38e0ae7036a4e6e84a
2018-09-17 22:09:40 -07:00
a02685e109 Fix test_torch's test_potri (#11770)
Summary:
tset_potri -> test_potri, even though it has been like this for a long time

More a curiosity than grave functionality...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770

Reviewed By: ezyang

Differential Revision: D9884767

Pulled By: soumith

fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387
2018-09-17 21:58:18 -07:00
3cbec5453b Reorder statements for readability (#11764)
Summary:
I read this a couple of times before figuring out that it's also the entry point for MPI_COMM_WORLD.

Reordered statements and added comment to clarify.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11764

Differential Revision: D9882834

Pulled By: pietern

fbshipit-source-id: a9282d55368815925fd695a2541354e5aec599da
2018-09-17 21:58:15 -07:00
a7cbcb1bb9 Enable build_python on windows (#11385)
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 was removed on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385

Reviewed By: orionr

Differential Revision: D9884906

Pulled By: mingzhe09088

fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
2018-09-17 21:40:03 -07:00
63e384a381 SNNTest with Data Preproc Service (#11707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11707

Trigger SNN offline training test with data preproc service.

Reviewed By: xsh6528

Differential Revision: D9826978

fbshipit-source-id: f98405ca1e61a7662bf0d9313aaba42436025a83
2018-09-17 21:25:49 -07:00
7f0dd2487d Move AT_HOST_DEVICE macro to Macros.h (#10945)
Summary:
```
I'm using AT_HOST_DEVICE outside of Half.h in an upcoming PR. Since this
changes code without making any semantic changes, I wanted to make this
change in a separate PR.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10945

Differential Revision: D9539821

Pulled By: colesbury

fbshipit-source-id: 0daae40ea78b077a543f7bfeec06b225634540de
2018-09-17 18:25:51 -07:00
e8ecbcdf01 Move IValue to ATen/core (#11610)
Summary:
unblocks D9202320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11610

Differential Revision: D9774853

Pulled By: bwasti

fbshipit-source-id: 4798223f6de680a7152283e8cad8814da7f90209
2018-09-17 18:25:50 -07:00
d4dde0bcaf Detect number of amd gpus in ROCM CI (#11771)
Summary:
We now have CI machines with different numbers of AMD GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11771

Differential Revision: D9889837

Pulled By: bddppq

fbshipit-source-id: dacf728a282f209e3f2419da186e59528a08ca6a
2018-09-17 18:11:09 -07:00
24a8c13f36 Add barrier to fix distributed test flakiness (#11775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11775

This should fix #11582.

Reviewed By: ezyang

Differential Revision: D9885546

fbshipit-source-id: 3544f42ebe8b595cdf6941859c67484d3ea9b3f8
2018-09-17 17:31:45 -07:00
7d0657f13c Migrate test in cpp/api/ to use gtest (#11556)
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556

Differential Revision: D9888224

Pulled By: zrphercule

fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
2018-09-17 17:31:43 -07:00
3819d25418 Clean up converter and accept less-valid networks
Summary: Cleaning up converter.cc and allowing networks that have "pass through" inputs (that are also outputs but aren't actually consumed by the network)

Reviewed By: duc0

Differential Revision: D9759435

fbshipit-source-id: 1ddfcc60a1b865a06682e4022230dfecc4b89ec3
2018-09-17 17:31:41 -07:00
ca5def1b8f Expose annotations (#11649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11649

Putting annotations in python interface

Reviewed By: duc0

Differential Revision: D9784750

fbshipit-source-id: d877c886ac52559ca3f009a1fd848dd1779b7d04
2018-09-17 16:39:37 -07:00
3ce17bf8f6 Generate ATen/core to source if env GEN_TO_SOURCE is set. (#11759)
Summary:
It is currently tedious to change code generation because it takes two steps: change the codegen, then fix the file mismatch that gen.py reports. Just add an environment option to generate directly to source.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11759

Differential Revision: D9867259

Pulled By: gchanan

fbshipit-source-id: 3cf8024d9e302f382cf8b8a44cb843fb086f8597
2018-09-17 15:25:33 -07:00
7df6650e9c Fix empty embedding bag on cuda (#11740)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11739
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11740

Differential Revision: D9881392

Pulled By: SsnL

fbshipit-source-id: 2964d314f199dd9b4bb69e36592b67efdf5e0760
2018-09-17 14:40:03 -07:00
7671f4ab1c Add math to scope when using inf in tests (#11302)
Summary:
This fixes #8515, which was mostly issues in the tests themselves. As long
as `math` is imported in the scope in which the script runs, `inf` resolves
to a `prim::Constant` correctly. This PR adds this to
the `test_jit.py` tests involving `inf` and adds a test to demonstrate
`inf` in a non-generated test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11302
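
A sketch of the pattern the tests rely on (the function is illustrative):

```
import math
import torch

@torch.jit.script
def add_inf(x):
    # `math` must be imported in the scope compiling the script; `inf`
    # then resolves to a prim::Constant with value inf.
    return x + math.inf

print(add_inf(torch.randn(3)))   # tensor([inf, inf, inf])
```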

Differential Revision: D9684336

Pulled By: driazati

fbshipit-source-id: 73df2848dfdb45ab50690a7c88df8fda269a64eb
2018-09-17 14:08:32 -07:00
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For AVX-512, we need to align at a multiple of 64B, not 32B.
Regardless of AVX-512, it's in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
336323f53c return aten::gt to the list of fusable operations, add expected graphs (#11150)
Summary:
Fixes one of the #11118 issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11150

Differential Revision: D9861372

Pulled By: apaszke

fbshipit-source-id: 98b196b89e991d3936360b30568360367fd32e8b
2018-09-17 13:40:41 -07:00
3489 changed files with 162517 additions and 114901 deletions

File diff suppressed because it is too large.

@ -1,51 +1,31 @@
---
# NOTE: there must be no spaces before the '-', so put the comma first.
# NOTE there must be no spaces before the '-', so put the comma first.
Checks: '
*
,clang-analyzer-*
,modernize-*
,-cert-dcl21-cpp
,-cert-err58-cpp
,-cert-err60-cpp
,-clang-diagnostic-*
,-cppcoreguidelines-owning-memory
-*
,bugprone-*
,-bugprone-macro-parentheses
,-bugprone-forward-declaration-namespace
,cppcoreguidelines-*
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-member-init
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-bounds-pointer-arithmetic
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-cstyle-cast
,-cppcoreguidelines-pro-type-reinterpret-cast
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-fuchsia-*
,-google-build-using-namespace
,-google-default-arguments
,-google-explicit-constructor
,-google-readability-braces-around-statements
,-google-readability-namespace-comments
,-google-readability-todo
,-google-runtime-references
,-google-runtime-references
,-hicpp-braces-around-statements
,-hicpp-explicit-conversions
,-hicpp-member-init
,-hicpp-no-array-decay
,-hicpp-signed-bitwise
,-hicpp-special-member-functions
,-hicpp-vararg
,-llvm-header-guard
,-llvm-include-order
,-llvm-namespace-comment
,-misc-unused-parameters
,-modernize-make-unique
,-cppcoreguidelines-interfaces-global-init
,-cppcoreguidelines-owning-memory
,hicpp-signed-bitwise
,hicpp-exception-baseclass
,hicpp-avoid-goto
,modernize-*
,-modernize-use-default-member-init
,-performance-unnecessary-value-param
,-readability-braces-around-statements
,-readability-else-after-return
,-readability-implicit-bool-conversion
,-readability-named-parameter
,-modernize-return-braced-init-list
,-modernize-use-auto
'
WarningsAsErrors: ''
HeaderFilterRegex: 'torch/csrc/'
WarningsAsErrors: '*'
HeaderFilterRegex: 'torch/csrc/.*'
AnalyzeTemporaryDtors: false
CheckOptions:
...

.github/ISSUE_TEMPLATE/bug-report.md vendored Normal file

@ -0,0 +1,49 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->


@ -0,0 +1,9 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->


@ -0,0 +1,24 @@
---
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@ -0,0 +1,13 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)

.gitignore vendored

@ -25,9 +25,8 @@ aten/src/ATen/cuda/CUDAConfig.h
build/
dist/
docs/src/**/*
docs/cpp/xml/
docs/cpp/html/
docs/cpp/api/
docs/cpp/build
docs/cpp/source/api
test/.coverage
test/cpp/api/mnist
test/custom_operator/model.pt
@ -45,7 +44,7 @@ torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
torch/csrc/jit/fusers/Config.h
torch/csrc/jit/fuser/config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
@ -200,6 +199,14 @@ caffe2.egg-info
# Atom/Watchman required file
.watchmanconfig
# Files generated by CLion
cmake-build-debug
# Files generated by ctags
CTAGS
tags
TAGS
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
@ -207,3 +214,10 @@ caffe2.egg-info
# Visual Studio Code files
.vscode
.vs
# YouCompleteMe config file
.ycm_extra_conf.py
# Files generated when a patch is rejected
*.orig
*.rej

.gitmodules vendored

@ -1,6 +1,3 @@
[submodule "third_party/catch"]
path = third_party/catch
url = https://github.com/catchorg/Catch2.git
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
@ -13,9 +10,6 @@
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
[submodule "third_party/nervanagpu"]
path = third_party/nervanagpu
url = https://github.com/NervanaSystems/nervanagpu.git
[submodule "third_party/benchmark"]
path = third_party/benchmark
url = https://github.com/google/benchmark.git
@ -64,9 +58,6 @@
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/cereal"]
path = third_party/cereal
url = https://github.com/USCiLab/cereal
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
@ -76,3 +67,18 @@
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
[submodule "third_party/nccl/nccl"]
path = third_party/nccl/nccl
url = https://github.com/NVIDIA/nccl
[submodule "third_party/gemmlowp/gemmlowp"]
path = third_party/gemmlowp/gemmlowp
url = https://github.com/google/gemmlowp.git
[submodule "third_party/QNNPACK"]
path = third_party/QNNPACK
url = https://github.com/pytorch/QNNPACK
[submodule "third_party/neon2sse"]
path = third_party/neon2sse
url = https://github.com/intel/ARM_NEON_2_x86_SSE.git
[submodule "third_party/fbgemm"]
path = third_party/fbgemm
url = https://github.com/pytorch/fbgemm


@ -4,7 +4,6 @@ set -ex
pip install --user --no-cache-dir hypothesis==3.59.0
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
@ -26,22 +25,29 @@ if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
wrapped="cc c++ gcc g++ x86_64-linux-gnu-gcc"
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
wrapped="$wrapped nvcc"
fi
for compiler in $wrapped; do
(
echo "#!/bin/sh"
# TODO: if/when sccache gains native support for an
# SCCACHE_DISABLE flag analogous to ccache's CCACHE_DISABLE,
# this can be removed. Alternatively, this can be removed when
# https://github.com/pytorch/pytorch/issues/13362 is fixed.
#
# NOTE: carefully quoted - we want `which compiler` to be
# resolved as we execute the script, but SCCACHE_DISABLE and
# $@ to be evaluated when we execute the script
echo 'test $SCCACHE_DISABLE && exec '"$(which $compiler)"' "$@"'
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which nvcc) \"\$@\""
) > "./sccache/nvcc"
chmod +x "./sccache/nvcc"
fi
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
@ -93,7 +99,7 @@ fi
###############################################################################
# Use special scripts for Android, conda, and setup builds
# Use special scripts for Android and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
@ -103,19 +109,6 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
exit 0
elif [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
"${ROOT_DIR}/scripts/build_anaconda.sh" --skip-tests --install-locally "$@"
report_compile_cache_stats
# This build will be tested against onnx tests, which need onnx installed.
# At this point the visible protobuf installation will be in conda, since one
# of Caffe2's dependencies uses conda, so the correct protobuf include
# headers are those in conda as well
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PROTOBUF_INCDIR=/opt/conda/include pip install -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
exit 0
fi
@ -149,26 +142,19 @@ if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# TODO: This is patching the official FindHIP to properly handle
# cmake generator expressions. A PR is opened in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
# The link time of libcaffe2_hip.so takes 40 minutes, according to
# https://github.com/RadeonOpenCompute/hcc#thinlto-phase-1---implemented
# using ThinLTO could significantly improve link-time performance.
export KMTHINLTO=1
# This is needed to enable ImageInput operator in resnet50_trainer
CMAKE_ARGS+=("-USE_OPENCV=ON")
# This is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
CMAKE_ARGS+=("-USE_LMDB=ON")
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_pytorch_amd.py"
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_caffe2_amd.py"
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_amd.py"
fi
# building bundled nccl in this config triggers a bug in nvlink. For
# more, see https://github.com/pytorch/pytorch/issues/14486
if [[ "${BUILD_ENVIRONMENT}" == *-cuda8*-cudnn7* ]]; then
CMAKE_ARGS+=("-DUSE_SYSTEM_NCCL=ON")
fi
# Try to include Redis support for Linux builds
@ -236,7 +222,6 @@ else
report_compile_cache_stats
fi
###############################################################################
# Install ONNX
###############################################################################


@ -15,14 +15,6 @@ fi
# The prefix must mirror the setting from build.sh
INSTALL_PREFIX="/usr/local/caffe2"
# Anaconda builds have a special install prefix and python
if [[ "$BUILD_ENVIRONMENT" == conda* ]]; then
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PYTHON="/opt/conda/bin/python"
INSTALL_PREFIX="/opt/conda/"
fi
# Add the site-packages in the caffe2 install prefix to the PYTHONPATH
SITE_DIR=$($PYTHON -c "from distutils import sysconfig; print(sysconfig.get_python_lib(prefix=''))")
INSTALL_SITE_DIR="${INSTALL_PREFIX}/${SITE_DIR}"
@ -34,11 +26,9 @@ if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
fi
# Set PYTHONPATH and LD_LIBRARY_PATH so that python can find the installed
# Caffe2. This shouldn't be done on Anaconda, as Anaconda should handle this.
if [[ "$BUILD_ENVIRONMENT" != conda* ]]; then
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
fi
# Caffe2.
export PYTHONPATH="${PYTHONPATH}:$INSTALL_SITE_DIR"
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
cd "$ROOT_DIR"
@ -97,18 +87,8 @@ if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
fi
conda_ignore_test=()
if [[ $BUILD_ENVIRONMENT == conda* ]]; then
# These tests both assume Caffe2 was built with leveldb, which is not the case
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/dataio_test.py")
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/checkpoint_test.py")
fi
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# Currently these tests are failing on the ROCm platform for
# unknown reasons; need to debug
@ -116,31 +96,23 @@ if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/softmax_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
# Need to go through roi ops to replace max(...) with fmaxf(...)
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/roi_align_rotated_op_test.py")
# Our cuda top_k op has some asm code; the hipified version doesn't
# compile yet, so we don't have a top_k operator for now
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/top_k_test.py")
# Our AMD CI boxes have 4 GPUs each
# Remove this once we have added multi-GPU support
export HIP_VISIBLE_DEVICES=$(($BUILD_NUMBER % 4))
fi
# Python tests
# NB: Warnings are disabled because they make it harder to see what
# the actual erroring test is
echo "Running Python tests.."
pip install --user pytest-sugar
"$PYTHON" \
-m pytest \
-x \
-v \
--disable-warnings \
--junit-xml="$TEST_DIR/python/result.xml" \
--ignore "$CAFFE2_PYPATH/python/test/executor_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
${conda_ignore_test[@]} \
${rocm_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"

View File

@ -14,8 +14,18 @@ clang --version
# symbolize=1: Gives us much better errors when things go wrong
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# FIXME: Remove the hardcoded "-pthread" option.
# With an ASAN build, the CMake thread check CMAKE_HAVE_LIBC_CREATE[1] will
# succeed because "pthread_create" is in libasan.so. However, libasan doesn't
# provide the full pthread implementation; other advanced pthread functions
# don't exist in libasan.so[2]. If we need any advanced pthread functions, we
# still need to link the pthread library explicitly.
# [1] https://github.com/Kitware/CMake/blob/8cabaaf054a16ea9c8332ce8e9291bd026b38c62/Modules/FindThreads.cmake#L135
# [2] https://wiki.gentoo.org/wiki/AddressSanitizer/Problems
#
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan -pthread" \
CXX_FLAGS="-pthread" \
NO_CUDA=1 USE_MKLDNN=0 \
python setup.py install

View File

@ -1,5 +1,12 @@
#!/bin/bash
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# For distributed, four environmental configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
@ -7,15 +14,19 @@
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9*gcc7* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo apt-get -qq update
if [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
sudo apt-get -qq install openmpi-bin libopenmpi-dev
else
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
fi
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
@ -23,13 +34,6 @@ if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Python version:"
python --version
@ -40,34 +44,56 @@ echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -r requirements.txt || true
pip install -q -r requirements.txt || true
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# This is necessary in order to cross-compile (otherwise the GPU device would be missing).
export HCC_AMDGPU_TARGET=gfx900
# When hcc runs out of memory, it silently exits without stopping
# the build process, leaving undefined symbols in the shared lib
# which will cause undefined symbol errors when later running
# tests. Setting MAX_JOBS to a smaller number makes CI less flaky.
export MAX_JOBS=4
# These environment variables are not set when CI runs as the Jenkins user.
# The HIP utility scripts require them in order to run without error.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# ROCm CI uses the Caffe2 docker images, which need these wrapper
# scripts to correctly use sccache.
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
# This environment variable enables HCC optimizations that speed up the linking stage.
# https://github.com/RadeonOpenCompute/hcc#hcc-with-thinlto-linking
export KMTHINLTO=1
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
sudo apt-get install libc++1
sudo apt-get install libc++abi1
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
python tools/amd_build/build_pytorch_amd.py
python tools/amd_build/build_caffe2_amd.py
USE_ROCM=1 python setup.py install --user
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
python tools/amd_build/build_amd.py
# OpenCV is needed to enable the ImageInput operator in the caffe2 resnet50_trainer
# LMDB is needed to read datasets from https://download.caffe2.ai/databases/resnet_trainer.zip
USE_ROCM=1 USE_LMDB=1 USE_OPENCV=1 python setup.py install --user
exit 0
fi
# TODO: Don't install this here
if ! which conda; then
pip install mkl mkl-devel
pip install -q mkl mkl-devel
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc7.2* ]] || [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc4.8* ]]; then
export USE_MKLDNN=1
else
export USE_MKLDNN=0
fi
fi
# sccache will fail for CUDA builds if all cores are used for compiling
@ -102,26 +128,24 @@ fi
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
# Test C FFI plugins
# cffi install doesn't work for Python 3.7
if [[ "$BUILD_ENVIRONMENT" != *pynightly* ]]; then
# TODO: Don't run this here
pip install cffi
git clone https://github.com/pytorch/extension-ffi.git
pushd extension-ffi/script
python build.py
popd
fi
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -r requirements.txt || true
pip install -q -r requirements.txt || true
LC_ALL=C make html
popd
fi
# Test standalone c10 build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
mkdir -p c10/build
pushd c10/build
cmake ..
make -j
popd
fi
# Test no-Python build
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Building libtorch"

View File

@ -14,6 +14,8 @@ pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda10-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-trusty-py2.7.9-build
@ -40,8 +42,9 @@ pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang3.8-rocm1.7.1-ubuntu16.04-build
py2-clang3.8-rocm1.7.1-ubuntu16.04-test
py2-clang7-rocmdeb-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-test
py2-devtoolset7-rocmrpm-centos7.5-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build

View File

@ -15,7 +15,8 @@ if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja six
pip install hypothesis
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi

View File

@ -1,5 +1,6 @@
import sys
import json
import math
import numpy
import argparse
@ -35,14 +36,25 @@ else:
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
sample_mean = float(sample_stats_data['mean'])
sample_sigma = float(sample_stats_data['sigma'])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception('''Error: sample mean is NaN''')
elif math.isnan(sample_sigma):
raise Exception('''Error: sample sigma is NaN''')
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)

View File

@ -20,6 +20,9 @@ test_gpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
# Needs a warm-up run to get accurate numbers
python main.py --epochs 1 --no-log
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime

View File

@ -1,36 +1,40 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo apt-get -qq update
sudo apt-get -qq install --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get -qq install --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
# JIT C++ extensions require ninja.
git clone https://github.com/ninja-build/ninja --quiet
pushd ninja
python ./configure.py --bootstrap
export PATH="$PWD:$PATH"
popd
# --user breaks ppc64le builds and these packages are already in ppc64le docker
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
# JIT C++ extensions require ninja.
pip install -q ninja --user
# ninja is installed in /var/lib/jenkins/.local/bin
export PATH="/var/lib/jenkins/.local/bin:$PATH"
# TODO: move this to Docker
pip install -q hypothesis --user
fi
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
@ -72,6 +76,8 @@ fi
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
@ -102,7 +108,9 @@ test_aten() {
SUDO=sudo
fi
${SUDO} ln -s "$TORCH_LIB_PATH"/libc10* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libmkldnn* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
ls build/bin
@ -124,7 +132,7 @@ test_torchvision() {
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install --user .
pip install -q --user .
popd
}
@ -137,7 +145,7 @@ test_libtorch() {
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
python tools/download_mnist.py --quiet -d mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/caffe2/bin/test_api
fi
}
@ -158,19 +166,20 @@ test_custom_script_ops() {
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_torchvision
test_python_nn
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_torchvision
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_torchvision
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
fi

View File

@ -55,11 +55,11 @@ set LIB=%cd%\\mkl\\lib;%LIB
:: Install MAGMA
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z --output magma_cuda90_release_mkl_2018.2.185.7z
curl -k https://s3.amazonaws.com/ossci-windows/magma_2.4.0_cuda90_release.7z --output magma_2.4.0_cuda90_release.7z
) else (
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet
aws s3 cp s3://ossci-windows/magma_2.4.0_cuda90_release.7z magma_2.4.0_cuda90_release.7z --quiet
)
7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
7z x -aoa magma_2.4.0_cuda90_release.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
@ -80,18 +80,29 @@ if "%REBUILD%"=="" (
)
:: Install Miniconda3
if "%REBUILD%"=="" (
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\\Jenkins
)
if "%REBUILD%"=="" (
IF EXIST %CONDA_PARENT_DIR%\\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\\Miniconda3
)
call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3
if "%REBUILD%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy cffi pyyaml boto3
)
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
if "%REBUILD%"=="" ( call conda install -y -q numpy cffi pyyaml boto3 )
:: Install ninja
if "%REBUILD%"=="" ( pip install ninja )
set WORKING_DIR=%CD%
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x64
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
cd %WORKING_DIR%
git submodule update --init --recursive
@ -129,7 +140,7 @@ if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
rd /s /q %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
@ -139,9 +150,10 @@ if not "%USE_CUDA%"=="0" (
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
echo NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3\` in Command Prompt before running Git Bash.
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
mv %CD%\\build\\bin\\test_api.exe %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch\\lib
7z a %IMAGE_COMMIT_TAG%.7z %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)

View File

@ -39,15 +39,26 @@ cat >ci_scripts/setup_pytorch_env.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy mkl cffi pyyaml boto3
pip install ninja
if "%BUILD_ENVIRONMENT%"=="" (
set CONDA_PARENT_DIR=%CD%
) else (
set CONDA_PARENT_DIR=C:\\Jenkins
)
if NOT "%BUILD_ENVIRONMENT%"=="" (
IF EXIST %CONDA_PARENT_DIR%\\Miniconda3 ( rd /s /q %CONDA_PARENT_DIR%\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=%CONDA_PARENT_DIR%\\Miniconda3
)
call %CONDA_PARENT_DIR%\\Miniconda3\\Scripts\\activate.bat %CONDA_PARENT_DIR%\\Miniconda3
if NOT "%BUILD_ENVIRONMENT%"=="" (
:: We have to pin Python version to 3.6.7, until mkl supports Python 3.7
call conda install -y -q python=3.6.7 numpy mkl cffi pyyaml boto3
)
pip install ninja future hypothesis
set WORKING_DIR=%CD%
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
cd %WORKING_DIR%
set PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
@ -58,13 +69,14 @@ set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set PYTHONPATH=%CD%\\test;%PYTHONPATH%
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
if NOT "%BUILD_ENVIRONMENT%"=="" (
cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
cd ..
) else (
xcopy /s %CONDA_PARENT_DIR%\\Miniconda3\\Lib\\site-packages\\torch .\\test\\torch\\
)
EOL
@ -78,14 +90,47 @@ call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --exclude nn --verbose && cd ..
EOL
cat >ci_scripts/test_custom_script_ops.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/custom_operator
:: Build the custom operator library.
mkdir build
cd build
:: Note: Caffe2 does not support MSVC + CUDA + Debug mode (has to be Release mode)
cmake -DCMAKE_PREFIX_PATH=%CD%\\..\\..\\torch -DCMAKE_BUILD_TYPE=Release -GNinja ..
ninja -v
cd ..
:: Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module="build/model.pt"
:: Run tests C++-side and load the exported script module.
cd build
set PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt/bin/x64;%CD%\\..\\..\\torch\\lib;%PATH%
test_custom_ops.exe model.pt
EOL
cat >ci_scripts/test_libtorch.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
dir
dir %CD%\\test
dir %CD%\\test\\torch
dir %CD%\\test\\torch\\lib
cd %CD%\\test\\torch\\lib
set PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt/bin/x64;%CD%\\..\\..\\torch\\lib;%PATH%
test_api.exe --gtest_filter="-IntegrationTest.MNIST*"
EOL
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat && ci_scripts/test_custom_script_ops.bat && ci_scripts/test_libtorch.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
ci_scripts/test_python_nn.bat
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
ci_scripts/test_python_all_except_nn.bat
ci_scripts/test_python_all_except_nn.bat && ci_scripts/test_custom_script_ops.bat && ci_scripts/test_libtorch.bat
fi
fi
}

View File

@ -27,5 +27,17 @@ matrix:
install: pip install mypy mypy-extensions
script: mypy @mypy-files.txt
- env: CPP_DOC_CHECK
install: sudo apt-get install -y doxygen
script: cd docs/cpp && ./check-doxygen.sh
python: "3.6"
install:
- sudo apt-get install -y doxygen
- pip install -r requirements.txt
script: cd docs/cpp/source && ./check-doxygen.sh
- env: CLANG_TIDY
python: "3.6"
addons:
apt:
sources:
- ubuntu-toolchain-r-test
- llvm-toolchain-trusty
packages: clang-tidy
script: tools/run-clang-tidy-in-ci.sh

View File

@ -5,11 +5,14 @@ cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
set(CAFFE2_VERSION_MAJOR 0)
set(CAFFE2_VERSION_MINOR 8)
set(CAFFE2_VERSION_PATCH 2)
set(CAFFE2_VERSION
"${CAFFE2_VERSION_MAJOR}.${CAFFE2_VERSION_MINOR}.${CAFFE2_VERSION_PATCH}")
set(CMAKE_INSTALL_MESSAGE NEVER)
set(CMAKE_CXX_STANDARD 11)
if (NOT MSVC)
set(CMAKE_C_STANDARD 11)
endif()
set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
# One variable that determines whether the current cmake process is being run
# with the main Caffe2 library. This is useful for building modules - if
@ -56,11 +59,13 @@ include(CMakeDependentOption)
option(BUILD_TORCH "Build Torch" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_MOBILE "Build ATen for Android and iOS" OFF)
option(BUILD_ATEN_ONLY "Build only a subset focused on ATen only" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_C10_EXPERIMENTAL_OPS "Build c10 experimental operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
@ -75,11 +80,12 @@ cmake_dependent_option(
option(USE_ACL "Use ARM Compute Library" OFF)
option(USE_ASAN "Use Address Sanitizer" OFF)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" OFF)
option(USE_ROCM "Use ROCm" ON)
option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FBGEMM "Use FBGEMM (quantized 8-bit server operators)" OFF)
option(USE_FFMPEG "Use ffmpeg" OFF)
option(USE_GFLAGS "Use GFLAGS" ON)
option(USE_GLOG "Use GLOG" ON)
@ -91,18 +97,19 @@ option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_NERVANA_GPU "Use Nervana GPU backend" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
cmake_dependent_option(
USE_NVRTC "Use NVRTC. Only available if USE_CUDA is on." OFF
"USE_CUDA" OFF)
option(USE_NUMPY "Use NumPy" ON)
option(USE_OBSERVERS "Use observers module." OFF)
option(USE_OPENCL "Use OpenCL" OFF)
option(USE_OPENCV "Use OpenCV" ON)
option(USE_OPENMP "Use OpenMP for parallel code" OFF)
option(USE_PROF "Use profiling" OFF)
option(USE_QNNPACK "Use QNNPACK (quantized 8-bit operators)" ON)
option(USE_REDIS "Use Redis" OFF)
option(USE_ROCKSDB "Use RocksDB" OFF)
option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
@ -112,8 +119,6 @@ option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
option(USE_IDEEP "Use IDEEP interface in MKL BLAS" ON)
option(USE_MKLML "Use MKLML interface in MKL BLAS" ON)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
@ -124,7 +129,6 @@ cmake_dependent_option(
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
option(TORCH_USE_CEREAL "Build the C++ API with Cereal for serialization support" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
@ -135,6 +139,38 @@ if (ANDROID OR IOS)
set(BUILD_ATEN_MOBILE ON)
endif()
if (BUILD_ATEN_ONLY)
set(BUILD_CAFFE2_OPS OFF)
set(BUILD_PYTHON OFF)
set(USE_NUMA OFF)
set(USE_LEVELDB OFF)
set(USE_GFLAGS OFF)
set(USE_GLOG OFF)
set(USE_NCCL OFF)
set(USE_NNPACK OFF)
set(USE_NUMPY OFF)
set(USE_OPENCV OFF)
set(USE_MKLDNN OFF)
set(USE_DISTRIBUTED OFF)
set(USE_LMDB OFF)
endif()
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Version numbers for generated libraries
set(TORCH_DEFAULT_VERSION "1.0.0")
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}" CACHE STRING "Torch build version")
if (NOT TORCH_BUILD_VERSION)
# An empty string was specified so force version to the default
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}"
CACHE STRING "Torch build version" FORCE)
endif()
caffe2_parse_version_str(TORCH ${TORCH_BUILD_VERSION})
caffe2_parse_version_str(CAFFE2 ${TORCH_BUILD_VERSION})
# ---[ CMake scripts + modules
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
@ -161,11 +197,6 @@ include(cmake/MiscCheck.cmake)
# External projects
include(ExternalProject)
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Dependencies
include(cmake/Dependencies.cmake)
@ -265,6 +296,10 @@ if (APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
endif()
if (EMSCRIPTEN)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0")
endif()
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
@ -295,6 +330,7 @@ include_directories(BEFORE ${PROJECT_BINARY_DIR})
include_directories(BEFORE ${PROJECT_SOURCE_DIR}/aten/src/)
# ---[ Main build
add_subdirectory(c10)
add_subdirectory(caffe2)
# --[ Documentation
@ -387,6 +423,7 @@ if (BUILD_SHARED_LIBS)
${PROJECT_SOURCE_DIR}/cmake/public/glog.cmake
${PROJECT_SOURCE_DIR}/cmake/public/gflags.cmake
${PROJECT_SOURCE_DIR}/cmake/public/mkl.cmake
${PROJECT_SOURCE_DIR}/cmake/public/mkldnn.cmake
${PROJECT_SOURCE_DIR}/cmake/public/protobuf.cmake
${PROJECT_SOURCE_DIR}/cmake/public/threads.cmake
${PROJECT_SOURCE_DIR}/cmake/public/utils.cmake

View File

@ -1,23 +1,9 @@
# This is a comment.
# Each line is a file pattern followed by one or more owners.
/aten/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/aten/src/ATen/core/
/torch/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/docs/source @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ssnl @zou3519
/docs/cpp @goldsborough @ebetica @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/test @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/tools @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/README.md @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/setup.py @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/requirements.txt @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/csrc/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/test/cpp/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/torch/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/jit/passes/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/test/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/scripts/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/docs/cpp @goldsborough @ebetica
/torch/csrc/api/ @ebetica @goldsborough
/test/cpp/api/ @ebetica @goldsborough
/torch/lib/c10d/ @apaszke @pietern @teng-li
/torch/csrc/distributed/ @apaszke @pietern @teng-li
/torch/distributed/ @apaszke @pietern @teng-li

View File

@ -75,6 +75,84 @@ You do not need to repeatedly install after modifying python files.
In case you want to reinstall, make sure that you uninstall pytorch first by running `pip uninstall torch`
and `python setup.py clean`. Then you can install in `build develop` mode again.
## Codebase structure
* [c10](c10) - Core library files that work everywhere, both server
and mobile. We are slowly moving pieces from ATen/core here.
This library is intended only to contain essential functionality,
and appropriate to use in settings where binary size matters. (But
you'll have a lot of missing functionality if you try to use it
directly.)
* [aten](aten) - C++ tensor library for PyTorch (no autograd support)
* src
* [TH](aten/src/TH)
[THC](aten/src/THC)
[THNN](aten/src/THNN)
[THCUNN](aten/src/THCUNN) - Legacy library code from the original
Torch. Try not to add things here; we're slowly porting these to
native.
* generic - Contains actual implementations of operators,
parametrized over `scalar_t`. Files here get compiled N times
per supported scalar type in PyTorch (a minimal sketch of this
pattern appears right after this list).
* ATen
* [core](aten/src/ATen/core) - Core functionality of ATen. This
is migrating to top-level c10 folder.
* [native](aten/src/ATen/native) - Modern implementations of
operators. If you want to write a new operator, here is where
it should go. Most CPU operators go in the top level directory,
except for operators which need to be compiled specially; see
cpu below.
* [cpu](aten/src/ATen/native/cpu) - Not actually CPU
implementations of operators, but specifically implementations
which are compiled with processor-specific instructions, like
AVX. See the README for more details.
* [cuda](aten/src/ATen/native/cuda) - CUDA implementations of
operators.
* [sparse](aten/src/ATen/native/sparse) - CPU and CUDA
implementations of COO sparse tensor operations
* [mkl](aten/src/ATen/native/mkl) [mkldnn](aten/src/ATen/native/mkldnn)
[miopen](aten/src/ATen/native/miopen) [cudnn](aten/src/ATen/native/cudnn)
- implementations of operators which simply bind to some
backend library.
* [torch](torch) - The actual PyTorch library. Everything that is not
in csrc is Python modules, following the PyTorch Python frontend
module structure.
* [csrc](torch/csrc) - C++ files composing the PyTorch library. Files
in this directory tree are a mix of Python binding code, and C++
heavy lifting. Consult `setup.py` for the canonical list of Python
binding files; conventionally, they are often prefixed with
`python_`.
* [jit](torch/csrc/jit) - Compiler and frontend for TorchScript JIT
frontend.
* [autograd](torch/csrc/autograd) - Implementation of reverse-mode automatic
differentiation
* [api](torch/csrc/api) - The PyTorch C++ frontend.
* [distributed](torch/csrc/distributed) - Distributed training
support for PyTorch.
* [tools](tools) - Code generation scripts for the PyTorch library.
See README of this directory for more details.
* [test](test) - Python unit tests for PyTorch Python frontend
* [test_torch.py](test/test_torch.py) - Basic tests for PyTorch
functionality
* [test_autograd.py](test/test_autograd.py) - Tests for non-NN
automatic differentiation support
* [test_nn.py](test/test_nn.py) - Tests for NN operators and
their automatic differentiation
* [test_jit.py](test/test_jit.py) - Tests for the JIT compiler
and TorchScript
* ...
* [cpp](test/cpp) - C++ unit tests for PyTorch C++ frontend
* [expect](test/expect) - Automatically generated "expect" files
which are used to compare against expected output.
* [onnx](test/onnx) - Tests for ONNX export functionality,
using both PyTorch and Caffe2.
* [caffe2](caffe2) - The Caffe2 library.
* [core](caffe2/core) - Core files of Caffe2, e.g., tensor, workspace,
blobs, etc.
* [operators](caffe2/operators) - Operators of Caffe2
* [python](caffe2/python) - Python bindings to Caffe2
* ...
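To make the `generic` bullet above concrete, here is a minimal sketch of the
"compile the same body once per scalar type" pattern. The names `DEFINE_SUM`,
`sum_float`, and `sum_double` are made up for illustration; TH actually
re-`#include`s the generic file with different `scalar_t` definitions via its
`TH_GENERIC_FILE` machinery, but the effect is the same.

```c++
#include <cstddef>

// One generic body, stamped out once per supported scalar type; this is,
// conceptually, what happens each time a generic/ file is recompiled.
#define DEFINE_SUM(scalar_t, name)                             \
  static scalar_t name(const scalar_t* data, std::size_t n) {  \
    scalar_t total = 0;                                        \
    for (std::size_t i = 0; i < n; ++i) total += data[i];      \
    return total;                                              \
  }

DEFINE_SUM(float, sum_float)    // "compiled for float"
DEFINE_SUM(double, sum_double)  // "compiled for double"
```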
## Unit testing
PyTorch's testing is located under `test/`. Run the entire test suite with
@ -262,9 +340,9 @@ than Linux, which are worth keeping in mind when fixing these problems.
1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
`__declspec(dllimport)`. We have codified this pattern into a set of macros
which follow the convention `*_API`, e.g., `AT_API` inside ATen. (Every separate
shared library needs a unique macro name, because symbol visibility is on a per
shared library basis.)
which follow the convention `*_API`, e.g., `CAFFE2_API` inside Caffe2 and ATen.
(Every separate shared library needs a unique macro name, because symbol visibility
is on a per shared library basis. See c10/macros/Macros.h for more details.)
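As a hedged sketch of this convention (the macro name `MYLIB_API` and the
`MYLIB_BUILD_MAIN_LIB` define are hypothetical; the actual macros live in
`c10/macros/Macros.h`):

```c++
// Hypothetical per-library export macro following the *_API convention.
// When compiling the library itself, symbols are exported; when another
// binary consumes the header, the same symbols are imported.
#ifdef _WIN32
#  ifdef MYLIB_BUILD_MAIN_LIB
#    define MYLIB_API __declspec(dllexport)
#  else
#    define MYLIB_API __declspec(dllimport)
#  endif
#else
#  define MYLIB_API  // non-Windows: symbols are visible by default
#endif

// Any function that must be callable across the DLL boundary gets marked:
MYLIB_API int my_add(int a, int b);
```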
The upshot is if you see an "unresolved external" error in your Windows build, this
is probably because you forgot to mark a function with `*_API`. However, there is
@ -325,7 +403,7 @@ Here are a few well known pitfalls and workarounds:
catch all of these problems: stay vigilant to the possibility that
your crash is due to a real memory problem.
* (NVCC) `at::optional` does not work when used from device code. Don't use
* (NVCC) `c10::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
and our local issue #10329.
@ -334,7 +412,7 @@ Here are a few well known pitfalls and workarounds:
* The idiom `static_assert(f() == f())` to test if `f` is constexpr
does not work; you'll get "error C2131: expression did not evaluate
to a constant". Don't use these asserts on Windows.
(Example: `aten/src/ATen/core/intrusive_ptr.h`)
(Example: `c10/util/intrusive_ptr.h`)
* (NVCC) Code you access inside a `static_assert` will eagerly be
evaluated as if it were device code, and so you might get an error
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
are too large. Splitting such files into separate files helps.
(Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)
### Running Clang-Tidy
[Clang-Tidy](https://clang.llvm.org/extra/clang-tidy/index.html) is a C++
linter and static analysis tool based on the clang compiler. We run clang-tidy
in our CI to make sure that new C++ code is safe, sane and efficient. See our
[.travis.yml](https://github.com/pytorch/pytorch/blob/master/.travis.yml) file
for the simple commands we use for this.
To run clang-tidy locally, follow these steps:
1. Install clang-tidy. First, check if you already have clang-tidy by simply
running `clang-tidy` in your terminal. If you don't yet have clang-tidy, you
should be able to install it easily with your package manager, e.g. by running
`apt-get install clang-tidy` on Ubuntu. See https://apt.llvm.org for details on
how to install the latest version. Note that newer versions of clang-tidy will
have more checks than older versions. In our CI, we run clang-tidy-6.0.
2. Use our driver script to run clang-tidy over any changes relative to some
git revision (you may want to replace `HEAD~1` with `HEAD` to pick up
uncommitted changes). Changes are picked up based on a `git diff` with the
given revision:
```sh
$ python tools/clang_tidy.py -d build -p torch/csrc --diff 'HEAD~1'
```
Above, it is assumed you are in the PyTorch root folder. The argument to `-d`
should be the path to where you built PyTorch from source, e.g. `build` in the
PyTorch root folder if you used `setup.py build`. You can use `-c <clang-tidy-binary>`
to change the clang-tidy binary this script uses. Make sure you have PyYAML
installed; it is listed in PyTorch's `requirements.txt`.
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the

View File

@ -22,11 +22,13 @@ We are in an early-release beta. Expect some adventures and rough edges.
- [Releases and Contributing](#releases-and-contributing)
- [The Team](#the-team)
| System | 2.7 | 3.5 |
| --- | --- | --- |
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/)
| System | 2.7 | 3.5 | 3.6 |
| :---: | :---: | :---: | :--: |
| Linux CPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | <center></center> |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/) | <center></center> |
| Linux (ppc64le) CPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py2-linux-ppc64le/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/badge/icon)](https://powerci.osuosl.org/job/pytorch-master-nightly-py3-linux-ppc64le/) |
| Linux (ppc64le) GPU | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda9-cudnn7-py2-mpi-build-test-gpu/) | — | [![Build Status](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/badge/icon)](https://powerci.osuosl.org/job/pytorch-linux-cuda92-cudnn7-py3-mpi-build-test-gpu/) |
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
@ -77,7 +79,7 @@ change the way your network behaves arbitrarily with zero lag or overhead. Our i
from several research papers on this topic, as well as current and past work such as
[torch-autograd](https://github.com/twitter/torch-autograd),
[autograd](https://github.com/HIPS/autograd),
[Chainer](http://chainer.org), etc.
[Chainer](https://chainer.org), etc.
While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date.
You get the best of speed and flexibility for your crazy research.
@ -88,7 +90,7 @@ You get the best of speed and flexibility for your crazy research.
PyTorch is not a Python binding into a monolithic C++ framework.
It is built to be deeply integrated into Python.
You can use it naturally like you would use NumPy / SciPy / scikit-learn etc.
You can use it naturally like you would use [NumPy](http://www.numpy.org/) / [SciPy](https://www.scipy.org/) / [scikit-learn](http://scikit-learn.org) etc.
You can write your new neural network layers in Python itself, using your favorite libraries
and use packages such as Cython and Numba.
Our goal is to not reinvent the wheel where appropriate.
@ -104,7 +106,7 @@ We hope you never spend hours debugging your code because of bad stack traces or
### Fast and Lean
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
such as [Intel MKL](https://software.intel.com/mkl) and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are mature and have been tested for years.
@ -121,10 +123,10 @@ Writing new neural network modules, or interfacing with PyTorch's Tensor API was
and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
[or your favorite NumPy-based libraries such as SciPy](https://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
There is no wrapper code that needs to be written. You can see [a tutorial here](https://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
## Installation
@ -132,7 +134,7 @@ There is no wrapper code that needs to be written. You can see [a tutorial here]
### Binaries
Commands to install from binaries via Conda or pip wheels are on our website:
[http://pytorch.org](http://pytorch.org)
[https://pytorch.org](https://pytorch.org)
### From Source
@ -163,7 +165,7 @@ conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9
conda install -c pytorch magma-cuda92 # or [magma-cuda80 | magma-cuda91] depending on your cuda version
```
On macOS
@ -202,7 +204,7 @@ REM The following two lines are needed for Python 2.7, but the support for it is
set MSSdk=1
set FORCE_PY27_BUILD=1
REM As for CUDA 8, VS2015 Update 3 is also required to build PyTorch. Use the following line.
set "CUDA_HOST_COMPILER=%VS140COMNTOOLS%\..\..\VC\bin\amd64\cl.exe"
set "CUDAHOSTCXX=%VS140COMNTOOLS%\..\..\VC\bin\amd64\cl.exe"
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install
@ -210,7 +212,7 @@ python setup.py install
### Docker image
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass -e PYTHON_VERSION=x.y flag to specificy which python to be used by Miniconda, or leave it unset to use the default. Build as usual
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass `-e PYTHON_VERSION=x.y` flag to specify which python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
```
@ -226,7 +228,7 @@ should increase shared memory size either with `--ipc=host` or `--shm-size` comm
### Building the Documentation
To build documentation in various formats, you will need Sphinx and the
To build documentation in various formats, you will need [Sphinx](http://www.sphinx-doc.org) and the
readthedocs theme.
```
@ -239,7 +241,7 @@ You can then build the documentation by running ``make <format>`` from the
### Previous Versions
Installation instructions and binaries for previous PyTorch versions may be found
on [our website](http://pytorch.org/previous-versions/).
on [our website](https://pytorch.org/previous-versions).
## Getting Started
@ -247,13 +249,13 @@ on [our website](http://pytorch.org/previous-versions/).
Three pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy to understand pytorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](http://pytorch.org/docs/)
- [The API Reference](https://pytorch.org/docs/)
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* forums: discuss implementations, research, etc. https://discuss.pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at slack@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: https://eepurl.com/cbG0rv
## Releases and Contributing
@ -273,3 +275,7 @@ PyTorch is currently maintained by [Adam Paszke](https://apaszke.github.io/), [S
A non-exhaustive but growing list needs to mention: Trevor Killeen, Sasank Chilamkurthy, Sergey Zagoruyko, Adam Lerer, Francisco Massa, Alykhan Tejani, Luca Antiga, Alban Desmaison, Andreas Kopf, James Bradbury, Zeming Lin, Yuandong Tian, Guillaume Lample, Marat Dukhan, Natalia Gimelshein, Christian Sarofeen, Martin Raison, Edward Yang, Zachary Devito.
Note: this project is unrelated to [hughperkins/pytorch](https://github.com/hughperkins/pytorch) with the same name. Hugh is a valuable contributor in the Torch community and has helped with many things Torch and PyTorch.
## License
PyTorch is BSD-style licensed, as found in the LICENSE file.

View File

@ -1,3 +0,0 @@
[flake8]
max-line-length = 120

aten/.gitignore
View File

@ -1,3 +0,0 @@
__pycache__/
build/
*.pyc

View File

@ -1,258 +0,0 @@
# ATen: A TENsor library
ATen is a simple tensor library that exposes the Tensor operations in Torch
and PyTorch directly in C++11. The wrapper respects the semantics of operators
in PyTorch, except minor details due to differences between C++ and Python in
the way default arguments are handled. See the [documentation for tensors](http://pytorch.org/docs/tensors.html) in PyTorch for what these operations do.
ATen's API is auto-generated from the same declarations PyTorch uses so the
two APIs will track each other over time.
Tensor types are resolved dynamically, such that the API is generic and
does not include templates. That is, there is one `Tensor` type. It can hold a
CPU or CUDA Tensor, and the tensor may hold Doubles, Floats, Ints, etc. This design
makes it easy to write generic code without templating everything.
See https://pytorch.org/cppdocs for the provided API. Excerpt:
```c++
Tensor atan2(const Tensor & other) const;
Tensor & atan2_(const Tensor & other);
Tensor pow(Scalar exponent) const;
Tensor pow(const Tensor & exponent) const;
Tensor & pow_(Scalar exponent);
Tensor & pow_(const Tensor & exponent);
Tensor lerp(const Tensor & end, Scalar weight) const;
Tensor & lerp_(const Tensor & end, Scalar weight);
Tensor histc() const;
Tensor histc(int64_t bins) const;
Tensor histc(int64_t bins, Scalar min) const;
Tensor histc(int64_t bins, Scalar min, Scalar max) const;
```
Inplace operations are also provided, and always suffixed by `_` to indicate they will modify the Tensor.
### Installation
TH/THC/THNN/THCUNN are provided (as git subtrees), so the repo is standalone. You will need a C++11 compiler, cmake, and the pyyaml python package.
```
# Install pyyaml used by python code generation to read API declarations
# macOS: if you don't have pip
sudo easy_install pip
# Ubuntu: if you don't have pip
apt-get -y install python-pip
# if you don't have pyyaml
sudo pip install pyyaml
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/where/you/want # specify your dest directory
# cmake .. -DUSE_NVRTC=ON -DUSE_TENSORRT=OFF -DCMAKE_INSTALL_PREFIX=../install -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DUSE_CUDA=ON # for CUDA
# cmake .. -DUSE_CUDA=OFF # for CPU only machines
make install
```
### Example usage
Here is a simple example; again, the syntax follows Torch semantics.
```c++
using namespace at; // assumed in the following
Tensor d = CPU(kFloat).ones({3, 4});
Tensor r = CPU(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Want this running on the GPU?
```c++
using namespace at; // assumed in the following
Tensor d = CUDA(kFloat).ones({3, 4});
Tensor r = CUDA(kFloat).zeros({3,4});
for(auto i = 0; i < 100000; i++) {
r = r.add(d);
// equivalently
r = r + d;
// or
r += d;
}
```
Expressions like `CUDA(kFloat)` are first-class `at::Type` objects that represent
the type of a Tensor and are used to create Tensors when their type cannot be
inferred.
See more in [sample files](src/ATen/test).
### Creating your kernel
It is easy to create new kernels, thanks to the `dispatch<>()` templated function. Example:
```c++
// a simple sum kernel (for CPU only)
template<typename T>
struct sum_op {
// dispatch handles variable arguments for you
Tensor CPU(const Type & t, Tensor & x_)
{
Tensor x = x_.contiguous();
auto x_p = x.data<T>();
int64_t size = x.numel();
T sum = 0;
for(int64_t i = 0; i < size; i++) {
sum += x_p[i];
}
return sum;
};
Tensor CUDA(Tensor& x) {
throw std::invalid_argument("device not supported");
};
};
Tensor a = CPU(kFloat).rand({3, 7});
std::cout << a << std::endl;
std::cout << dispatch<sum_op>(a.type(),a) << " == " << a.sum() << std::endl;
```
### Efficient access to tensor elements
When using Tensor-wide operations, the relative cost of dynamic dispatch is very small.
However, there are cases, especially in your own kernels, where efficient element-wise access is needed,
and the cost of dynamic dispatch inside the element-wise loop is very high.
ATen provides _accessors_ that are created with a single dynamic check that a Tensor
has the expected type and number of dimensions. Accessors then expose an API for accessing the Tensor elements efficiently:
```c++
Tensor foo = CPU(kFloat).rand({12,12});
// assert foo is 2-dimensional and holds floats.
auto foo_a = foo.accessor<float,2>();
float trace = 0;
for(int i = 0; i < foo_a.size(0); i++) {
// use the accessor foo_a to get tensor data.
trace += foo_a[i][i];
}
```
Accessors are temporary views of a Tensor. They are only valid for the lifetime of the tensor that they
view and hence should only be used locally in a function, like iterators.
### Using externally created data
If you already have your tensor data allocated in memory (CPU or CUDA),
you can view that memory as a Tensor in ATen:
```c++
float data[] = { 1, 2, 3,
4, 5, 6};
auto f = CPU(kFloat).tensorFromBlob(data, {2,3});
cout << f << endl;
```
These tensors cannot be resized because ATen does not own the memory, but otherwise
behave as normal tensors.
### Scalars and zero-dimensional tensors
In addition to the `Tensor` objects, ATen also includes `Scalar`s that represent a single number.
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's number types.
Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions like `addmm` take numbers along with Tensors and expect these
numbers to be the same dynamic type as the tensor. They are also used in the API to indicate places where
a function will _always_ return a Scalar value, like `sum`.
```c++
Tensor addmm(Scalar beta, const Tensor & self,
Scalar alpha, const Tensor & mat1,
const Tensor & mat2);
Scalar sum(const Tensor & self);
//usage
Tensor a = ...
Tensor b = ...
Tensor c = ...
Tensor r = addmm(1.0, a, .5, b, c);
```
In addition to Scalars, ATen also allows Tensor objects to be zero-dimensional. These Tensors hold
a single value and they can be references to a single element in a larger Tensor. They can be used anywhere a Tensor is expected. They are normally created by operators like `select` which reduce the dimensions of
a Tensor.
```c++
Tensor two = CPU(kFloat).rand({10,20});
two[1][2] = 4;
//~~~~~~~ zero-dimensional Tensor
```
It is possible to convert between Scalar and zero-dim Tensors:
```c++
Tensor zero_dim = CPU(kFloat).scalarTensor(4);
Scalar from_tensor = Scalar(zero_dim); //only valid when zero_dim.dim() == 0;
```
### Avoiding unnecessary CUDA synchronization in your kernels when using Scalars
Moving a single number from the GPU to the CPU introduces a synchronization point
that can add latency to your program. In certain cases the result of a GPU operator like `sum` which
returns a Scalar may be plugged into another GPU operator as an argument. If Scalars were always copied
to the CPU, this would result in 2 copies. To avoid these synchronizations, Scalar objects can be
optionally backed by a zero-dim Tensor, and are only copied to the CPU when requested.
```c++
auto a = CUDA(kFloat).rand({3,4});
Scalar on_gpu = Scalar(a[1][1]); //backed by zero-dim Tensor
assert(on_gpu.isBackedByTensor());
double value = on_gpu.toDouble(); // copied to CPU, if it was backed by GPU Tensor.
Scalar svalue = on_gpu.local(); // force the Scalar to become local to CPU.
// get the scalar as a zero-dim tensor. If it was already backed
// by a zero-dim Tensor then this op has no synchronization.
// if the Scalar was local on CPU, it performs the copy
Tensor same_tensor = CUDA(kFloat).scalarTensor(on_gpu);
```
Operators aware of the location of Scalars can arrange to do the minimal number of copies required.
### Developer notes
ATen relies heavily on code generation to automatically generate headers
and implementations for all of the tensor methods it supports. The main
entry point for the script which does all this work is
[`src/ATen/gen.py`](src/ATen/gen.py), which ingests
[`src/ATen/Declarations.cwrap`](src/ATen/Declarations.cwrap),
[`src/ATen/nn.yaml`](src/ATen/nn.yaml),
[`src/ATen/native/native_functions.yaml`](src/ATen/native/native_functions.yaml) and the THNN/THCUNN headers and
produces all of the headers and wrapping code necessary to generate
the ATen interface.
If you need to understand how ATen understands a declaration after all
of this processing occurs, it's helpful to look at the generated file
`Declarations.yaml` (NB: not cwrap) which contains information for all
ATen methods in a uniform manner. This file is utilized by PyTorch
which further extends the ATen interface with support for automatic
differentiation.
#### Note [ATen preprocessor philosophy]
ATen is designed to be simple to use, and one of the things this implies is
that it should not be necessary to use preprocessor macros when using ATen;
we would rather provide all symbols, even for functionality that is not
available on the system ATen is running on.
This means that, whereas other libraries might simply omit source files
for, e.g., CuDNN when the CuDNN libraries are not installed, ATen always
builds these source files, compiling stub functions for anything that is
not available. ATen never uses
`AT_ENABLED_CUDA()` in header files, and all types in ATen's public API
are always available no matter your build configuration.
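A hypothetical sketch of the stub pattern this describes (the operator name is illustrative, not actual ATen code; note the guard lives in a source file, never a header):
```c++
#if AT_ENABLED_CUDA()
Tensor my_cuda_op(const Tensor& self) {
  return self; // placeholder for the real CUDA-backed implementation
}
#else
Tensor my_cuda_op(const Tensor& self) {
  // Stub: the symbol exists in every build, but fails loudly at runtime.
  AT_ERROR("my_cuda_op: ATen was not compiled with CUDA support");
}
#endif
```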

View File

@ -1,9 +1,7 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include "ATen/Allocator.h"
#include "ATen/CPUGeneral.h"
#include "ATen/CUDAGuard.h"
#include "ATen/Context.h"
#include "ATen/Device.h"
#include "ATen/DeviceGuard.h"
@ -11,16 +9,16 @@
#include "ATen/Dispatch.h"
#include "ATen/Formatting.h"
#include "ATen/Functions.h"
#include "ATen/core/Generator.h"
#include "ATen/core/Layout.h"
#include "ATen/OptionsGuard.h"
#include "ATen/core/Scalar.h"
#include "ATen/ScalarOps.h"
#include "ATen/core/Storage.h"
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/core/TensorMethods.h"
#include "ATen/TensorOperators.h"
#include "ATen/core/TensorOptions.h"
#include "ATen/Type.h"
#include "ATen/core/Error.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include <c10/core/Layout.h>
#include "ATen/core/Scalar.h"
#include <c10/core/Storage.h>
#include "ATen/core/TensorMethods.h"
#include "ATen/core/TensorOptions.h"
#include <c10/util/Exception.h>

View File

@ -6,9 +6,12 @@
// Example:
// using accscalar_t = acc_type<scalar_t, true>;
#ifdef __CUDACC__
#if defined(__CUDACC__)
#include <cuda.h>
#include <cuda_fp16.h>
#elif defined(__HIPCC__)
#include <hip/hip_runtime.h>
#include <hip/hip_fp16.h>
#endif
namespace at {
@ -16,7 +19,7 @@ namespace at {
template <typename T, bool is_cuda>
struct AccumulateType { };
#ifdef __CUDACC__
#if defined(__CUDACC__) || defined(__HIPCC__)
template <> struct AccumulateType<half, true> { using type = float; };
#endif
template <> struct AccumulateType<Half, true> { using type = float; };

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/Allocator.h>
#include <c10/core/Allocator.h>

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/ArrayRef.h>
#include <c10/util/ArrayRef.h>

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/Backend.h>
#include <c10/core/Backend.h>

View File

@ -20,8 +20,8 @@ CONFIGURE_FILE(Config.h.in "${CMAKE_CURRENT_SOURCE_DIR}/Config.h")
CONFIGURE_FILE(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
# NB: If you edit these globs, you'll have to update setup.py package_data as well
FILE(GLOB base_h "*.h" "detail/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp")
FILE(GLOB base_h "*.h" "detail/*.h" "cpu/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp" "cpu/*.cpp")
add_subdirectory(core)
FILE(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh")
FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
@ -158,6 +158,16 @@ if(NOT MSVC AND NOT EMSCRIPTEN)
set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
set(CMAKE_CXX_FLAGS)
# Bump up optimization level for sleef to -O1, since at -O0 the compiler
# excessively spills intermediate vector registers to the stack
# and makes things run impossibly slowly
set(OLD_CMAKE_C_FLAGS_DEBUG ${CMAKE_C_FLAGS_DEBUG})
IF(${CMAKE_C_FLAGS_DEBUG} MATCHES "-O0")
string(REGEX REPLACE "-O0" "-O1" CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
ELSE()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O1")
ENDIF()
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef static" FORCE)
set(BUILD_DFT OFF CACHE BOOL "Don't build sleef DFT lib" FORCE)
set(BUILD_GNUABI_LIBS OFF CACHE BOOL "Don't build sleef gnuabi libs" FORCE)
@ -168,6 +178,7 @@ if(NOT MSVC AND NOT EMSCRIPTEN)
link_directories(${CMAKE_BINARY_DIR}/sleef/lib)
list(APPEND ATen_CPU_DEPENDENCY_LIBS sleef)
set(CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
# Set these back. TODO: Use SLEEF_ to pass these instead
@ -195,6 +206,12 @@ IF(USE_CUDA AND NOT USE_ROCM)
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
elseif(${CUDA_VERSION_MAJOR} EQUAL "10")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
else()
MESSAGE(FATAL_ERROR "Unhandled major cuda version ${CUDA_VERSION_MAJOR}")
endif()
@ -274,7 +291,7 @@ else()
target_link_libraries(ATen_cpu PRIVATE ATEN_CPU_FILES_GEN_LIB)
caffe2_interface_library(ATen_cpu ATen_cpu_library)
# Set standard properties on the target
aten_set_target_props(ATen_cpu)
torch_set_target_props(ATen_cpu)
# Make sure these don't get built by parent
set(ATen_CPU_SRCS)
@ -315,7 +332,7 @@ if(USE_CUDA OR USE_ROCM)
ATen_cuda PUBLIC ATen_cpu ${ATen_PUBLIC_CUDA_DEPENDENCY_LIBS})
# Set standard properties on the target
aten_set_target_props(ATen_cuda)
torch_set_target_props(ATen_cuda)
caffe2_interface_library(ATen_cuda ATen_cuda_library)
@ -333,9 +350,9 @@ if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
endif()
if(NOT MSVC)
aten_compile_options(ATen_cpu)
torch_compile_options(ATen_cpu)
if(USE_CUDA OR USE_ROCM)
aten_compile_options(ATen_cuda)
torch_compile_options(ATen_cuda)
endif()
endif()

View File

@ -156,12 +156,14 @@ struct strided_tensor_iter_fixed {
strided_tensor_iter_fixed(Tensor& tensor, bool sort_strides = false)
: data_(tensor.data<T>()) {
std::memset(counter_, 0, sizeof(int64_t) * N);
std::memcpy(
sizes_, tensor.sizes().data(), tensor.ndimension() * sizeof(int64_t));
std::memcpy(
strides_,
tensor.strides().data(),
tensor.ndimension() * sizeof(int64_t));
if (tensor.dim() > 0) {
std::memcpy(
sizes_, tensor.sizes().data(), tensor.dim() * sizeof(int64_t));
std::memcpy(
strides_,
tensor.strides().data(),
tensor.dim() * sizeof(int64_t));
}
dim_ = std::get<1>(collapse_dims(sizes_, strides_, tensor.ndimension()));
}
};
@ -207,7 +209,7 @@ inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].sizes() << ", ";
}
oss << "and " << tensors[tensors.size() - 1]
oss << "and " << tensors[tensors.size() - 1].sizes()
<< " to have the same number of elements, but got ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].numel() << ", ";
@ -220,7 +222,7 @@ inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
inline bool _apply_preamble(ArrayRef<Tensor> tensors) {
checkBackend("CPU_tensor_apply", tensors, Backend::CPU);
if (!_all_equal_numel(tensors))
throw std::runtime_error(_all_equal_numel_error(tensors));
AT_ERROR(_all_equal_numel_error(tensors));
// An empty tensor has no elements
for (auto& t : tensors)
if (t.numel() == 0)

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/core/Error.h"
#include "TH/TH.h"
#include "c10/util/Exception.h"
// This file creates a fake allocator that just throws exceptions if
// it is actually used.

View File

@ -1,12 +1,12 @@
#pragma once
// Using AT_API is crucial as otherwise you'll see
// Using CAFFE2_API is crucial as otherwise you'll see
// linking errors using MSVC
// See https://msdn.microsoft.com/en-us/library/a90k134d.aspx
// This header adds this if using AT_API
// This header adds this if using CAFFE2_API
#include "ATen/core/ATenGeneral.h"
namespace at {
AT_API void set_num_threads(int);
AT_API int get_num_threads();
CAFFE2_API void set_num_threads(int);
CAFFE2_API int get_num_threads();
}

View File

@ -3,7 +3,7 @@
namespace at {
struct AT_API CPUTypeDefault : public TypeDefault {
struct CAFFE2_API CPUTypeDefault : public TypeDefault {
CPUTypeDefault(TensorTypeId type_id, bool is_variable, bool is_undefined)
: TypeDefault(type_id, is_variable, is_undefined) {}
Allocator* allocator() const override;

View File

@ -1,8 +1,8 @@
#pragma once
#include "ATen/core/Generator.h"
#include "ATen/Utils.h"
#include "ATen/core/Error.h"
#include "ATen/core/Generator.h"
#include "c10/util/Exception.h"
namespace at {

View File

@ -13,13 +13,10 @@
#include "ATen/CPUGenerator.h"
#include "ATen/RegisterCPU.h"
#include "ATen/Tensor.h"
#include <ATen/cpu/FlushDenormal.h>
#include "TH/TH.h" // for USE_LAPACK
#ifdef USE_SSE3
#include <pmmintrin.h>
#endif
namespace at {
static inline void errorHandler(const char * msg, void * data) {
@ -33,7 +30,9 @@ static inline void argErrorHandler(int arg, const char * msg, void * data) {
Context::Context()
: next_id(static_cast<size_t>(TypeID::NumOptions))
, thc_state(nullptr, [](THCState* p){ /* no-op */ } ) {
, thc_state(nullptr, [](THCState* p){ /* no-op */ } )
, thh_state(nullptr, [](THHState* p){ /* no-op */ } )
{
THSetDefaultErrorHandler(errorHandler,nullptr);
THSetDefaultArgErrorHandler(argErrorHandler,nullptr);
@ -94,51 +93,54 @@ bool Context::hasLAPACK() const {
}
bool Context::setFlushDenormal(bool on) {
#ifdef USE_SSE3
// Setting flush-to-zero (FTZ) flag
_MM_SET_FLUSH_ZERO_MODE(on ? _MM_FLUSH_ZERO_ON
: _MM_FLUSH_ZERO_OFF);
// Setting denormals-are-zero (DAZ) flag
_MM_SET_DENORMALS_ZERO_MODE(on ? _MM_DENORMALS_ZERO_ON
: _MM_DENORMALS_ZERO_OFF);
return true;
#else
return false;
#endif
return at::cpu::set_flush_denormal(on);
}
TypeExtendedInterface& getType(TensorOptions options) {
return globalContext().getType(
options.backend(), options.dtype(), options.is_variable());
options.backend(), typeMetaToScalarType(options.dtype()), options.is_variable());
}
TypeExtendedInterface& getType(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getType(
backend, dataTypeToScalarType(impl->dtype().id()), impl->is_variable());
backend, typeMetaToScalarType(impl->dtype()), impl->is_variable());
}
TypeExtendedInterface& getType(const Tensor& t) {
return getType(t.unsafeGetTensorImpl());
}
LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options) {
return globalContext().getLegacyTHDispatcher(
options.backend(), typeMetaToScalarType(options.dtype()));
}
LegacyTHDispatcher& getLegacyTHDispatcher(const TensorImpl* impl) {
Backend backend = tensorTypeIdToBackend(impl->type_id());
return globalContext().getLegacyTHDispatcher(
backend, typeMetaToScalarType(impl->dtype()));
}
Allocator* getCPUAllocator() {
return getTHDefaultAllocator();
}
struct LegacyTypeInit : public LegacyTypeInitInterface {
LegacyTypeInit(LegacyTypeInitArgs) {}
struct LegacyDeviceTypeInit : public LegacyDeviceTypeInitInterface {
LegacyDeviceTypeInit(LegacyDeviceTypeInitArgs) {}
void initCPU() const override {
globalContext();
}
void initCUDA() const override {
globalContext().lazyInitCUDA();
}
void initHIP() const override {
globalContext().lazyInitHIP();
}
void initComplex() const override {
globalContext().lazyInitComplex();
}
};
REGISTER_LEGACY_TYPE_INIT(LegacyTypeInit);
REGISTER_LEGACY_TYPE_INIT(LegacyDeviceTypeInit);
}

View File

@ -1,20 +1,19 @@
#pragma once
#include <ATen/CPUGeneral.h>
#include "ATen/core/ATenGeneral.h"
#include "ATen/CUDAStream.h"
#include "ATen/core/Generator.h"
#include "ATen/Type.h"
#include "ATen/TypeExtendedInterface.h"
#include "ATen/Utils.h"
#include "ATen/core/Error.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include "ATen/core/VariableHooksInterface.h"
#include "ATen/detail/ComplexHooksInterface.h"
#include "ATen/LegacyTHDispatch.h"
#include "ATen/LegacyTHDispatcher.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/core/Generator.h"
#include "ATen/core/LegacyTypeDispatch.h"
// This is temporary
#include "ATen/core/ATenCoreTest.h"
#include "ATen/core/VariableHooksInterface.h"
#include "ATen/detail/CUDAHooksInterface.h"
#include "ATen/detail/HIPHooksInterface.h"
#include "ATen/detail/ComplexHooksInterface.h"
#include "c10/util/Exception.h"
#include <memory>
#include <mutex>
@ -22,10 +21,10 @@
namespace at {
struct Tensor;
class Tensor;
class AT_API Context {
public:
class CAFFE2_API Context {
public:
Context();
TypeExtendedInterface* getNonVariableTypeRaw(Backend p, ScalarType s) {
return static_cast<TypeExtendedInterface*>(globalLegacyTypeDispatch().getNonVariableTypeRaw(p, s));
@ -42,6 +41,9 @@ public:
TypeExtendedInterface & getType(Backend p, ScalarType s, bool is_variable) {
return static_cast<TypeExtendedInterface&>(globalLegacyTypeDispatch().getType(p, s, is_variable));
}
LegacyTHDispatcher& getLegacyTHDispatcher(Backend p, ScalarType s) {
return globalLegacyTHDispatch().getLegacyTHDispatcher(p, s);
}
// The passed in Type must be delete'able
// TODO: Just make it take a unique_ptr
void registerType(Backend b, ScalarType s, Type* t) {
@ -49,8 +51,14 @@ public:
LegacyTypeDispatch::TypeUniquePtr{t, LegacyTypeDeleter([](Type* p) { delete p; }) });
}
void registerLegacyTHDispatcher(Backend b, ScalarType s, LegacyTHDispatcher* t) {
globalLegacyTHDispatch().registerDispatcher(b, s,
LegacyTHDispatch::LegacyTHDispatcherUniquePtr{t, LegacyTHDispatcherDeleter([](LegacyTHDispatcher* p) { delete p; }) });
}
Generator & defaultGenerator(DeviceType device_type) {
initCUDAIfNeeded(device_type);
initHIPIfNeeded(device_type);
auto & generator = generator_registry[static_cast<int>(device_type)];
if(!generator)
AT_ERROR(DeviceTypeName(device_type), " backend type not enabled.");
@ -64,11 +72,8 @@ public:
bool hasCUDA() const {
return detail::getCUDAHooks().hasCUDA();
}
bool hasCuDNN() const {
return detail::getCUDAHooks().hasCuDNN();
}
int64_t current_device() const {
return detail::getCUDAHooks().current_device();
bool hasHIP() const {
return detail::getHIPHooks().hasHIP();
}
// defined in header so that getNonVariableType has ability to inline
// call_once check. getNonVariableType is called fairly frequently
@ -81,6 +86,15 @@ public:
});
return thc_state.get();
}
THHState* lazyInitHIP() {
std::call_once(thh_init,[&] {
thh_state = detail::getHIPHooks().initHIP();
generator_registry[static_cast<int>(DeviceType::HIP)] =
detail::getHIPHooks().initHIPGenerator(this);
detail::getHIPHooks().registerHIPTypes(this);
});
return thh_state.get();
}
void lazyInitComplex() {
std::call_once(complex_init_, [&] {
detail::getComplexHooks().registerComplexTypes(this);
@ -91,10 +105,10 @@ public:
// AT_ASSERT(thc_state);
return thc_state.get();
}
int getNumGPUs() const {
return detail::getCUDAHooks().getNumGPUs();
THHState* getTHHState() {
return thh_state.get();
}
size_t freshTypeID() {
return next_id++;
}
@ -118,22 +132,29 @@ private:
lazyInitCUDA();
}
}
void initHIPIfNeeded(DeviceType p) {
if (p == DeviceType::HIP) {
lazyInitHIP();
}
}
void initComplexIfNeeded(ScalarType s) {
if (isComplexType(s)) {
lazyInitComplex();
}
}
std::once_flag thc_init;
std::once_flag thh_init;
std::once_flag complex_init_;
bool enabled_cudnn = true;
bool deterministic_cudnn = false;
bool benchmark_cudnn = false;
std::atomic<size_t> next_id;
std::unique_ptr<THCState, void(*)(THCState*)> thc_state;
std::unique_ptr<THHState, void(*)(THHState*)> thh_state;
friend struct Type;
};
AT_API Context & globalContext();
CAFFE2_API Context& globalContext();
static inline void init() {
globalContext();
@ -153,11 +174,11 @@ static inline TypeExtendedInterface& getNonVariableType(DeviceType p, ScalarType
return globalContext().getNonVariableType(deviceTypeToBackend(p), s);
}
AT_API TypeExtendedInterface& getType(TensorOptions options);
AT_API TypeExtendedInterface& getType(const TensorImpl*);
AT_API TypeExtendedInterface& getType(const Tensor&);
CAFFE2_API TypeExtendedInterface& getType(TensorOptions options);
CAFFE2_API TypeExtendedInterface& getType(const TensorImpl*);
CAFFE2_API TypeExtendedInterface& getType(const Tensor&);
AT_API Allocator* getCPUAllocator();
CAFFE2_API Allocator* getCPUAllocator();
static inline TypeExtendedInterface& CPU(ScalarType s) {
return getNonVariableType(Backend::CPU, s);
@ -167,12 +188,19 @@ static inline TypeExtendedInterface& CUDA(ScalarType s) {
return getNonVariableType(Backend::CUDA, s);
}
static inline TypeExtendedInterface& HIP(ScalarType s) {
return getNonVariableType(Backend::HIP, s);
}
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(TensorOptions options);
CAFFE2_API LegacyTHDispatcher& getLegacyTHDispatcher(const Tensor&);
static inline bool hasCUDA() {
return globalContext().hasCUDA();
}
static inline bool hasCuDNN() {
return globalContext().hasCuDNN();
static inline bool hasHIP() {
return globalContext().hasHIP();
}
static inline bool hasMKL() {
@ -187,8 +215,13 @@ static inline bool hasMAGMA() {
return globalContext().hasMAGMA();
}
static inline int64_t current_device() {
return globalContext().current_device();
static inline void manual_seed(uint64_t seed) {
globalContext().defaultGenerator(DeviceType::CPU).manualSeed(seed);
// NB: Sometimes we build with CUDA, but we don't have any GPUs
// available. In that case, we must not seed CUDA; it will fail!
if (hasCUDA() && detail::getCUDAHooks().getNumGPUs() > 0) {
globalContext().defaultGenerator(DeviceType::CUDA).manualSeedAll(seed);
}
}
} // namespace at

View File

@ -152,7 +152,7 @@ DLManagedTensor* toDLPack(const Tensor& src) {
atDLMTensor->tensor.deleter = &deleter;
atDLMTensor->tensor.dl_tensor.data = src.data_ptr();
int64_t device_id = 0;
if (src.type().is_cuda()) {
if (src.is_cuda()) {
device_id = src.get_device();
}
atDLMTensor->tensor.dl_tensor.ctx = getDLContext(src.type(), device_id);

View File

@ -10,8 +10,8 @@
namespace at {
AT_API ScalarType toScalarType(const DLDataType& dtype);
AT_API DLManagedTensor * toDLPack(const Tensor& src);
AT_API Tensor fromDLPack(const DLManagedTensor* src);
CAFFE2_API ScalarType toScalarType(const DLDataType& dtype);
CAFFE2_API DLManagedTensor* toDLPack(const Tensor& src);
CAFFE2_API Tensor fromDLPack(const DLManagedTensor* src);
} //namespace at

File diff suppressed because it is too large

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/Device.h>
#include <c10/Device.h>

View File

@ -1,132 +1,36 @@
#pragma once
#include <ATen/core/Device.h>
#include <ATen/core/ScalarType.h>
#include <ATen/Tensor.h>
#include <ATen/core/Error.h>
#include <ATen/core/optional.h>
#include <ATen/detail/CUDAHooksInterface.h>
#include <cstddef>
#include <c10/DeviceGuard.h>
#include <ATen/core/Tensor.h>
#include <c10/core/ScalarType.h> // TensorList whyyyyy
namespace at {
/// RAII guard that sets a certain default GPU index in its constructor, and
/// changes it back to the device that was originally active upon destruction.
///
/// The index is always reset to the one that was active at the time of
/// construction of the guard. Even if you `set_index` after construction, the
/// destructor will still reset the index to the one that was active at
/// construction time.
struct DeviceGuard {
/// Default constructor, does nothing.
DeviceGuard() = default;
/// Uses the given device's `index()` if it is a CUDA device, else does
/// nothing.
explicit DeviceGuard(Device device) {
if (device.is_cuda()) {
set_index(device.index());
}
// Are you here because you're wondering why DeviceGuard(tensor) no
// longer works? For code organization reasons, we have temporarily(?)
// removed this constructor from DeviceGuard. The new way to
// spell it is:
//
// OptionalDeviceGuard guard(device_of(tensor));
/// Return the Device of a Tensor, if the Tensor is defined.
inline optional<Device> device_of(Tensor t) {
if (t.defined()) {
return make_optional(t.device());
} else {
return nullopt;
}
}
explicit DeviceGuard(optional<Device> device_opt) {
if (device_opt.has_value() && device_opt.value().is_cuda()) {
set_index(device_opt.value().index());
}
/// Return the Device of a TensorList, if the list is non-empty and
/// the first Tensor is defined. (This function implicitly assumes
/// that all tensors in the list have the same device.)
inline optional<Device> device_of(TensorList t) {
if (!t.empty()) {
return device_of(t.front());
} else {
return nullopt;
}
}
/// Calls `set_index` with the given index.
explicit DeviceGuard(int32_t index) {
set_index(index);
}
/// Sets the device to the index on which the given tensor is located.
explicit DeviceGuard(const Tensor& tensor) {
set_index_from(tensor);
}
/// Sets the device to the index on which the first tensor in the list is
/// located. If the list is empty, does nothing.
explicit DeviceGuard(const TensorList& tensors) {
if (!tensors.empty()) {
set_index_from(tensors.front());
}
}
/// Copy is disallowed.
DeviceGuard(const DeviceGuard&) = delete;
DeviceGuard& operator=(const DeviceGuard&) = delete;
/// Move-constructs this `DeviceGuard` from another `DeviceGuard`. The
/// moved-from `DeviceGuard` is modified such that its destruction has no
/// effect (does not reset the device).
DeviceGuard(DeviceGuard&& other) noexcept {
*this = std::move(other);
}
/// Move-assigns this `DeviceGuard` from another `DeviceGuard`. The
/// moved-from `DeviceGuard` is modified such that its destruction has no
/// effect (does not reset the device).
DeviceGuard& operator=(DeviceGuard&& other) noexcept {
this->original_index_ = other.original_index_;
this->last_index_ = other.last_index_;
// Set other's original index to the unspecified/default state, so that it
// doesn't also reset the device in its constructor.
other.original_index_ = -1;
return *this;
}
/// Resets the device to the index that was active at construction of the
/// guard.
~DeviceGuard() {
// It should only not have a value if an index was never actually set.
if (original_index_ != -1) {
// Unchecked because we don't want to throw in the destructor.
detail::DynamicCUDAInterface::unchecked_set_device(original_index_);
}
}
/// Sets the device to the given one.
void set_index(int32_t index) {
if (index == -1) {
return;
}
AT_ASSERT(index >= 0);
if (original_index_ == -1) {
int32_t previous_index = -123;
detail::DynamicCUDAInterface::get_device(&previous_index);
original_index_ = previous_index;
if (index != original_index_) {
detail::DynamicCUDAInterface::set_device(index);
}
} else {
detail::DynamicCUDAInterface::set_device(index);
}
last_index_ = index;
}
/// Calls `set_index` with the `Tensor`'s current device, if it is a CUDA
/// tensor. Does nothing if the `tensor` is not defined.
void set_index_from(const Tensor& tensor) {
if (tensor.defined() && tensor.is_cuda()) {
set_index(tensor.get_device());
}
}
/// Returns the device that was set upon construction of the guard.
int32_t original_index() const noexcept {
return original_index_;
}
/// Returns the last device that was set via `set_index`, if any.
int32_t last_index() const noexcept {
return last_index_;
}
private:
/// The original device that was active at construction of this object.
int32_t original_index_ = -1;
/// The last index that was set via `set_index`.
int32_t last_index_ = -1;
};
} // namespace at

View File

@ -1,11 +1,2 @@
#pragma once
#include <ATen/core/SmallVector.h>
#include <stdint.h>
namespace at {
/// A container for sizes or strides
using DimVector = SmallVector<int64_t, 5>;
}
#include <ATen/core/DimVector.h>

View File

@ -1,8 +1,8 @@
#pragma once
#include <ATen/Type.h>
#include <ATen/core/Error.h>
#include <ATen/core/Half.h>
#include <c10/util/Exception.h>
#define AT_PRIVATE_CASE_TYPE(enum_type, type, ...) \
case enum_type: { \
@ -10,121 +10,144 @@
return __VA_ARGS__(); \
}
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_FLOATING_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_FLOATING_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_FLOATING_AND_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexHalf, std::complex<at::Half>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_INTEGRAL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_ALL_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_ALL_TYPES_AND_HALF(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_COMPLEX_TYPES(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
#define AT_DISPATCH_ALL_TYPES_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()
#define AT_DISPATCH_ALL_TYPES_AND_HALF_AND_COMPLEX(TYPE, NAME, ...) \
[&] { \
const at::Type& the_type = TYPE; \
switch (the_type.scalarType()) { \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Byte, uint8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Char, int8_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Double, double, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Float, float, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Int, int32_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Long, int64_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Short, int16_t, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE(at::ScalarType::Half, at::Half, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexFloat, std::complex<float>, __VA_ARGS__) \
AT_PRIVATE_CASE_TYPE( \
at::ScalarType::ComplexDouble, std::complex<double>, __VA_ARGS__) \
default: \
AT_ERROR(#NAME, " not implemented for '", the_type.toString(), "'"); \
} \
}()

View File

@ -1,2 +0,0 @@
#pragma once
#include <ATen/core/Error.h>

View File

@ -68,7 +68,11 @@ std::tuple<std::vector<int64_t>, std::vector<int64_t>> inferExpandGeometry(
") must match the existing size (",
size,
") at non-singleton dimension ",
i);
i,
". Target sizes: ",
sizes,
". Tensor sizes: ",
tensor_sizes);
size = targetSize;
stride = 0;
}

View File

@ -1,7 +1,7 @@
#pragma once
#include "ATen/Tensor.h"
#include "ATen/core/Error.h"
#include "c10/util/Exception.h"
#include <functional>
#include <sstream>
@ -9,9 +9,12 @@
namespace at {
AT_API std::vector<int64_t> infer_size(IntList a, IntList b);
AT_API std::tuple<std::vector<int64_t>, std::vector<int64_t> > inferExpandGeometry(
IntList tensor_sizes, IntList tensor_strides, IntList sizes);
CAFFE2_API std::vector<int64_t> infer_size(IntList a, IntList b);
CAFFE2_API std::tuple<std::vector<int64_t>, std::vector<int64_t>>
inferExpandGeometry(
IntList tensor_sizes,
IntList tensor_strides,
IntList sizes);
// avoid copy-construction of Tensor by using a reference_wrapper.
inline void check_defined(std::initializer_list<std::reference_wrapper<const Tensor>> tensors, const char *api_name) {
@ -133,20 +136,25 @@ inline std::vector<Tensor> expand_outplace(TensorList to_expand) {
// Sums `tensor` repeatedly to produce a tensor of shape `shape`.
// Precondition: is_expandable_to(shape, tensor.sizes()) must be true
static inline Tensor sum_to(Tensor tensor, IntList shape) {
static inline Tensor sum_to(Tensor tensor, const IntList shape) {
if (shape.size() == 0) {
return tensor.sum();
}
Tensor result = tensor;
while (result.dim() > (int64_t)shape.size()) {
result = result.sum(0, false);
c10::SmallVector<int64_t, 8> reduce_dims;
const at::IntList sizes = tensor.sizes();
const int64_t leading_dims = sizes.size() - shape.size();
for (int64_t i = 0; i < leading_dims; ++i) {
reduce_dims.push_back(i);
}
for (int64_t i = 0; i < result.dim(); ++i) {
if (shape[i] == 1 && result.sizes()[i] > 1) {
result = result.sum(i, true);
for (int64_t i = leading_dims; i < static_cast<int64_t>(sizes.size()); ++i) {
if (shape[i - leading_dims] == 1 && sizes[i] > 1) {
reduce_dims.push_back(i);
}
}
return result;
if (!reduce_dims.empty()) {
tensor = tensor.sum(reduce_dims, /*keepdim=*/true);
}
return leading_dims > 0 ? tensor.view(shape) : tensor;
}
// True if `shape` can be broadcasted to `desired`

View File

@ -1,24 +1 @@
#pragma once
#include <iostream>
#include "ATen/Type.h"
#include "ATen/core/Scalar.h"
namespace at {
AT_API std::ostream& operator<<(std::ostream & out, IntList list);
AT_API std::ostream& operator<<(std::ostream & out, Backend b);
AT_API std::ostream& operator<<(std::ostream & out, const Type & t);
AT_API std::ostream& print(std::ostream& stream, const Tensor & tensor, int64_t linesize);
static inline std::ostream& operator<<(std::ostream & out, const Tensor & t) {
return print(out,t,80);
}
static inline void print(const Tensor & t, int64_t linesize=80) {
print(std::cout,t,linesize);
}
static inline std::ostream& operator<<(std::ostream & out, Scalar s) {
return out << (s.isFloatingPoint() ? s.toDouble() : s.toLong());
}
}
#include <ATen/core/Formatting.h>

View File

@ -1,7 +1,7 @@
#pragma once
#include <ATen/optional.h>
#include <ATen/ScalarType.h>
#include <c10/core/ScalarType.h>
#include <c10/util/Optional.h>
#include <sstream>
#include <vector>
@ -12,7 +12,7 @@ namespace at {
static std::vector<int64_t> infer_size(IntList shape, int64_t numel) {
auto res = shape.vec();
int64_t newsize = 1;
auto infer_dim = at::optional<int64_t>();
auto infer_dim = c10::optional<int64_t>();
for (int64_t dim = 0, ndim = shape.size(); dim != ndim; dim++) {
if (shape[dim] == -1) {
if (infer_dim) {

View File

@ -0,0 +1,15 @@
#pragma once
#include <ATen/core/TensorOptions.h>
namespace at {
// Represents the initial TensorOptions, before the "defaults" are ever changed.
// This is designed to be used in library code, where the explicit devices, dtypes, etc. are known.
// NOTE: this is not a stable API.
inline TensorOptions initialTensorOptions() {
return TensorOptions(kCPU).dtype(kFloat).layout(kStrided)
.requires_grad(false).is_variable(false);
}
}

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/Layout.h>
#include <c10/core/Layout.h>

View File

@ -0,0 +1,12 @@
#include <ATen/LegacyTHDispatch.h>
namespace at {
// TODO: This could be bad juju if someone calls globalContext() in the
// destructor of an object with static lifetime.
LegacyTHDispatch & globalLegacyTHDispatch() {
static LegacyTHDispatch singleton;
return singleton;
}
}

View File

@ -0,0 +1,91 @@
#pragma once
// LegacyTHDispatcher is the legacy mechanism for dispatching directly
// to TH/THNN/THC/THCUNN functions in ATen, which is essentially a giant virtual
// dispatch table for every TH function we support dynamically dispatching over.
//
// NB: We do not actually dispatch to *operators* here, the usual pattern is for
// ATen operators to call this mechanism for their implementation, but the
// operator itself is declared separately (e.g. as a native function "wrapper").
//
// Q: Why don't we just use LegacyTypeDispatch here?
// A: Mainly separation of concerns:
// 1) Type is for implementation of operators, which requires codegen of
// Variables, JIT, etc. That is handled by the native function "wrappers";
// just calling into TH does not require that.
// 2) Type does not require scalar-specific dispatch, whereas calling into TH
// does. Thus, this separation allows us to evolve operator dispatch
// separately (i.e. to use the C10 dispatcher) from details of how to
// call TH functionality.
//
// The implementation here is very similar to the LegacyTypeDispatch design, with
// the following simplifications:
// 1) This is not required for a mobile build, so does not have to live in /core.
// 2) Because these only contain function implementations, we do not have to
// handle the Variable/Tensor split; that is handled at the native function
// "wrapper" level.
// 3) Because an operator must have been previously dispatched via the Type
// mechanism, we do not need to handle device initialization. This means it is
// WRONG to call directly into these functions without first going through
// Type dispatch (i.e. the usual operator -> Type -> LegacyTHDispatch pattern).
// 4) Because an operator must have been previously dispatched via the Type
// mechanism, we do not need to handle undefined Tensors.
//
// NB: We don't use Registry for this, because we don't want to
// pay for a hash table lookup every time we do an operation.
//
// NB: we can delete this when we don't call into any TH implementations.
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>
#include <ATen/LegacyTHDispatcher.h>
namespace at {
struct Type;
struct CAFFE2_API LegacyTHDispatcherDeleter {
using LegacyTHDispatcherDeleterFun = void(LegacyTHDispatcher*);
LegacyTHDispatcherDeleterFun *fn_ = nullptr;
LegacyTHDispatcherDeleter() {}
/* implicit */ LegacyTHDispatcherDeleter(LegacyTHDispatcherDeleterFun *fn) : fn_(fn) {}
void operator()(LegacyTHDispatcher * ptr) {
if (fn_) {
(*fn_)(ptr);
}
}
};
class CAFFE2_API LegacyTHDispatch {
public:
using LegacyTHDispatcherUniquePtr = std::unique_ptr<LegacyTHDispatcher, LegacyTHDispatcherDeleter>;
// WARNING: This function has the precondition that you have
// initialized the type you want to call. This initialization
// step is generally done by Context, or assumed because you
// have a Tensor and thus the Type of that Tensor must already
// be initialized.
void registerDispatcher(Backend b, ScalarType s, LegacyTHDispatcherUniquePtr&& t) {
dispatcher_registry[static_cast<int>(b)][static_cast<int>(s)] = std::move(t);
}
LegacyTHDispatcher* getLegacyTHDispatcherRaw(Backend p, ScalarType s) {
return dispatcher_registry[static_cast<int>(p)][static_cast<int>(s)].get();
}
LegacyTHDispatcher & getLegacyTHDispatcher(Backend p, ScalarType s) {
auto* type = getLegacyTHDispatcherRaw(p, s);
if (!type) AT_ERROR(toString(p), toString(s), "THDispatcher is not enabled.");
return *type;
}
private:
// NB: dispatcher_registry has nullptr for all CUDA backends until
// CUDA initialization has occurred
LegacyTHDispatcherUniquePtr dispatcher_registry
[static_cast<int>(Backend::NumOptions)]
[static_cast<int>(ScalarType::NumOptions)];
};
CAFFE2_API LegacyTHDispatch& globalLegacyTHDispatch();
} // namespace at

View File

@ -1,6 +1,6 @@
#pragma once
#include <ATen/Utils.h>
#include <ATen/core/ArrayRef.h>
#include <c10/util/ArrayRef.h>
#include <vector>

View File

@ -1,2 +0,0 @@
#pragma once
#include <ATen/core/OptionsGuard.h>

View File

@ -1,6 +1,8 @@
#pragma once
#include <ATen/ATen.h>
#include <atomic>
#include <cstddef>
#include <exception>
#ifdef _OPENMP
#include <omp.h>
@ -20,6 +22,30 @@ inline int64_t divup(int64_t x, int64_t y) {
return (x + y - 1) / y;
}
inline int get_max_threads() {
#ifdef _OPENMP
return omp_get_max_threads();
#else
return 1;
#endif
}
inline int get_thread_num() {
#ifdef _OPENMP
return omp_get_thread_num();
#else
return 0;
#endif
}
inline bool in_parallel_region() {
#ifdef _OPENMP
return omp_in_parallel();
#else
return false;
#endif
}
template <class F>
inline void parallel_for(
const int64_t begin,
@ -27,14 +53,26 @@ inline void parallel_for(
const int64_t grain_size,
const F& f) {
#ifdef _OPENMP
std::atomic_flag err_flag = ATOMIC_FLAG_INIT;
std::exception_ptr eptr;
#pragma omp parallel if (!omp_in_parallel() && ((end - begin) >= grain_size))
{
int64_t num_threads = omp_get_num_threads();
int64_t tid = omp_get_thread_num();
int64_t chunk_size = divup((end - begin), num_threads);
int64_t begin_tid = begin + tid * chunk_size;
if (begin_tid < end)
f(begin_tid, std::min(end, chunk_size + begin_tid));
if (begin_tid < end) {
try {
f(begin_tid, std::min(end, chunk_size + begin_tid));
} catch (...) {
if (!err_flag.test_and_set()) {
eptr = std::current_exception();
}
}
}
}
if (eptr) {
std::rethrow_exception(eptr);
}
#else
if (begin < end) {

View File

@ -1,2 +0,0 @@
#pragma once
#include <ATen/core/Registry.h>

View File

@ -1,60 +0,0 @@
#pragma once
#include <atomic>
#include "ATen/core/ATenGeneral.h"
namespace at {
// base class for refcounted things, allows for collects of generic
// refcounted objects that include tensors
struct AT_API Retainable {
Retainable(): refcount(1), weak_refcount(1) {}
void retain() {
++refcount;
}
void release() {
if(--refcount == 0) {
// If we know that this is the last reference then we can skip
// all the decrements and release_resources().
if (weak_refcount == 1) {
delete this;
} else {
release_resources();
weak_release();
}
}
}
void weak_retain() {
++weak_refcount;
}
void weak_release() {
if (--weak_refcount == 0) {
delete this;
}
}
bool weak_lock() {
for (;;) {
auto current_refcount = refcount.load();
if (current_refcount == 0) return false;
if (refcount.compare_exchange_strong(current_refcount, current_refcount + 1)) break;
}
return true;
}
uint32_t use_count() const {
return refcount.load();
}
uint32_t weak_use_count() const {
return weak_refcount.load();
}
virtual void release_resources() {};
virtual ~Retainable() {}
private:
// INVARIANT: once refcount reaches 0 it can never go up
// INVARIANT: weak_refcount = number of weak references + (refcount > 0 ? 1 : 0)
std::atomic<uint32_t> refcount;
std::atomic<uint32_t> weak_refcount;
};
}

View File

@ -1,18 +1,19 @@
#pragma once
#include "ATen/core/Scalar.h"
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
namespace at {
// This is in the c10 namespace because we use ADL to find the functions in it.
namespace c10 {
// FIXME: this should be (and was) Scalar::toTensor, but there is currently no way
// to implement this without going through Derived Types (which are not part of core).
inline Tensor scalar_to_tensor(Scalar s) {
inline at::Tensor scalar_to_tensor(Scalar s) {
if (s.isFloatingPoint()) {
return CPU(kDouble).scalarTensor(s);
return at::CPU(kDouble).scalarTensor(s);
} else {
AT_ASSERT(s.isIntegral());
return CPU(kLong).scalarTensor(s);
return at::CPU(kLong).scalarTensor(s);
}
}

View File

@ -1,4 +1,4 @@
#pragma once
#include <ATen/core/ATenGeneral.h> // for BC reasons
#include <ATen/core/Backend.h>
#include <ATen/core/ScalarType.h>
#include <c10/core/Backend.h>
#include <c10/core/ScalarType.h>

View File

@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/SmallVector.h>
#include <c10/util/SmallVector.h>

View File

@ -1,14 +1,16 @@
#include <ATen/ATen.h>
#include <ATen/SparseTensorImpl.h>
#include <ATen/InitialTensorOptions.h>
#include <ATen/core/LegacyTypeDispatch.h>
namespace at {
namespace {
Backend sparseTensorIdToDenseBackend(TensorTypeId type_id) {
DeviceType sparseTensorIdToDeviceType(TensorTypeId type_id) {
if (type_id == SparseCPUTensorId()) {
return Backend::CPU;
return kCPU;
} else if (type_id == SparseCUDATensorId()) {
return Backend::CUDA;
return kCUDA;
} else {
AT_ERROR("Cannot construct SparseTensor with non-sparse tensor type ID ", type_id);
}
@ -21,20 +23,20 @@ namespace {
// a scalar and have one element)
//
// Thus, an empty sparse tensor should be a 1-dimensional tensor of size [0].
// Furthermore, we have dim == sparseDims + denseDims; since this is a sparse
// tensor, let us say that an empty sparse tensor has sparseDims == 1 and
// denseDims == 0. (There is a degree of freedom here, but given that this
// is a sparse dimension, it seems reasonable to demand that sparseDims > 0).
// Furthermore, we have dim == sparse_dim + dense_dim; since this is a sparse
// tensor, let us say that an empty sparse tensor has sparse_dim == 1 and
// dense_dim == 0. (There is a degree of freedom here, but given that this
// is a sparse dimension, it seems reasonable to demand that sparse_dim > 0).
//
// This means that we allocate a [1,0] size indices tensor and a [0] size
// values tensor for such an empty tensor.
SparseTensorImpl::SparseTensorImpl(at::TensorTypeId type_id, const caffe2::TypeMeta& data_type)
: TensorImpl(type_id, data_type, nullptr, false)
, size_{0}
, sparseDims_(1)
, denseDims_(0)
, indices_(globalContext().getNonVariableTypeOpt(sparseTensorIdToDenseBackend(type_id), ScalarType::Long)->tensor({1, 0}))
, values_(globalContext().getNonVariableTypeOpt(sparseTensorIdToDenseBackend(type_id), dataTypeToScalarType(data_type.id()))->tensor()) {}
, sparse_dim_(1)
, dense_dim_(0)
, indices_(at::empty({1, 0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(ScalarType::Long)))
, values_(at::empty({0}, at::initialTensorOptions().device(sparseTensorIdToDeviceType(type_id)).dtype(data_type))) {}
IntList SparseTensorImpl::sizes() const {
return size_;
@ -66,7 +68,7 @@ void SparseTensorImpl::set_storage_offset(int64_t storage_offset) {
}
int64_t SparseTensorImpl::dim() const {
return sparseDims_ + denseDims_;
return sparse_dim_ + dense_dim_;
}
TensorImpl* SparseTensorImpl::maybe_zero_dim(bool condition_when_zero_dim) {
AT_CHECK(condition_when_zero_dim == (dim() == 0),
@ -82,17 +84,22 @@ int64_t SparseTensorImpl::storage_offset() const {
AT_ERROR("sparse tensors do not have storage");
}
void SparseTensorImpl::set_indices_and_values_unsafe(const Tensor& indices, const Tensor& values) {
AT_CHECK(values.type().toSparse() == type(), "values type must match sparse tensor type");
AT_ASSERT(!indices.is_variable() && !values.is_variable()); // They should be plain tensors!
AT_CHECK(!indices.is_sparse(), "expected indices to be a dense tensor, but got indices of layout ", indices.layout());
AT_CHECK(!values.is_sparse(), "expected values to be a dense tensor, but got values of layout ", values.layout());
AT_CHECK(values.type().toSparse() == legacyTensorType(*this), "values type must match sparse tensor type");
AT_CHECK(indices.type().scalarType() == kLong, "indices must be an int64 tensor");
AT_CHECK(indices.type().backend() == values.type().backend(), "backend of indices (", indices.type().backend(), ") must match backend of values (", values.type().backend(), ")");
AT_CHECK(!indices.is_cuda() || indices.get_device() == values.get_device(), "device of indices (", indices.get_device(), ") must match device of values (", values.get_device(), ")");
AT_CHECK(indices.dim() == 2, "indices must be nDim x nnz, but got: ", indices.sizes());
AT_CHECK(indices.dim() == 2, "indices must be sparse_dim x nnz, but got: ", indices.sizes());
AT_CHECK(indices.size(1) == values.size(0), "indices and values must have same nnz, but got nnz from indices: ", indices.size(1), ", nnz from values: ", values.size(0));
AT_CHECK(indices.size(0) == sparseDims_, "indices has incorrect first dimension, expected ", sparseDims_, ", got ", indices.size(0));
AT_CHECK(values.dim() == denseDims_ + 1, "values has incorrect number of dimensions, expected ", denseDims_ + 1, ", got ", values.dim());
AT_CHECK(indices.size(0) == sparse_dim_, "indices has incorrect first dimension, expected ", sparse_dim_, ", got ", indices.size(0));
AT_CHECK(values.dim() == dense_dim_ + 1, "values has incorrect number of dimensions, expected ", dense_dim_ + 1, ", got ", values.dim());
auto dense_size_original = sizes().slice(sparseDims_);
auto dense_size_original = sizes().slice(sparse_dim_);
std::vector<int64_t> expected_values_size_vec = {values.size(0)};
expected_values_size_vec.insert(expected_values_size_vec.end(), dense_size_original.begin(), dense_size_original.end());
IntList expected_values_size(expected_values_size_vec);

View File

@ -2,25 +2,25 @@
#include "ATen/Tensor.h"
#include "ATen/core/TensorImpl.h"
#include "ATen/core/Error.h"
#include "c10/util/Exception.h"
namespace at {
struct AT_API SparseTensorImpl : public TensorImpl {
struct CAFFE2_API SparseTensorImpl : public TensorImpl {
// Stored in COO format, indices + values.
// INVARIANTS:
// _sparseDims: range [0, len(shape)]; _sparseDims + _denseDims = len(shape)
// _denseDims : range [0, len(shape)]; _sparseDims + _denseDims = len(shape)
// _indices.shape: dimensionality: 2, shape: (_sparseDims, nnz)
// _values.shape: dimensionality: 1 + _denseDims. shape: (nnz, shape[_sparseDims:])
// sparse_dim: range [0, len(shape)]; sparse_dim + dense_dim = len(shape)
// dense_dim : range [0, len(shape)]; sparse_dim + dense_dim = len(shape)
// _indices.shape: dimensionality: 2, shape: (sparse_dim, nnz)
// _values.shape: dimensionality: 1 + dense_dim. shape: (nnz, shape[sparse_dim:])
// The true size of the sparse tensor (e.g., if you called to_dense()
// on it). When THTensor merges into TensorImpl, this field
// should move to the parent class.
std::vector<int64_t> size_;
int64_t sparseDims_ = 0; // number of sparse dimensions
int64_t denseDims_ = 0; // number of dense dimensions
int64_t sparse_dim_ = 0; // number of sparse dimensions
int64_t dense_dim_ = 0; // number of dense dimensions
Tensor indices_; // always a LongTensor
Tensor values_;
@ -39,8 +39,8 @@ public:
explicit SparseTensorImpl(at::TensorTypeId, const caffe2::TypeMeta&);
int64_t nnz() const { return values_.size(0); }
int64_t sparseDims() const { return sparseDims_; }
int64_t denseDims() const { return denseDims_; }
int64_t sparse_dim() const { return sparse_dim_; }
int64_t dense_dim() const { return dense_dim_; }
bool coalesced() const { return coalesced_; }
Tensor indices() const { return indices_; }
Tensor values() const { return values_; }
@ -60,16 +60,16 @@ public:
const Storage& storage() const override;
int64_t storage_offset() const override;
// WARNING: This function does NOT preserve invariants of sparseDims/denseDims with
// WARNING: This function does NOT preserve invariants of sparse_dim/dense_dim with
// respect to indices and values
void raw_resize_(int64_t sparseDims, int64_t denseDims, IntList size) {
void raw_resize_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
size_ = size.vec();
sparseDims_ = sparseDims;
denseDims_ = denseDims;
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
}
// NOTE: This function preserves invariants of sparseDims/denseDims with respect to
// NOTE: This function preserves invariants of sparse_dim/dense_dim with respect to
// indices and values.
//
// NOTE: This function supports the following cases:
@ -91,36 +91,36 @@ public:
// and for API consistency we don't support it).
// 4. When we attempt to shrink the size of any of the sparse dimensions on a non-empty sparse tensor
// (this could make some of the stored indices out-of-bound and thus unsafe).
void resize_(int64_t sparseDims, int64_t denseDims, IntList size) {
AT_CHECK(sparseDims + denseDims == size.size(), "number of dimensions must be sparseDims (", sparseDims, ") + denseDims (", denseDims, "), but got ", size.size());
void resize_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
AT_CHECK(sparse_dim + dense_dim == size.size(), "number of dimensions must be sparse_dim (", sparse_dim, ") + dense_dim (", dense_dim, "), but got ", size.size());
if (nnz() > 0) {
auto alt_options_msg = "You could try the following options:\n\
1. If you need an empty sparse tensor of this size, call `x=torch.sparse_coo_tensor(size)`.\n\
1. If you need an empty sparse tensor of this size, call `x = torch.sparse_coo_tensor(size)`.\n\
2. If you need to resize this tensor, you have the following options:\n\
1. For both sparse and dense dimensions, keep the number of them constant and the size of them non-shrinking, and then try the same call again.\n\
2. Or, create a new sparse tensor with the correct indices and values from this sparse tensor.";
AT_CHECK(sparseDims == sparseDims_,
"changing the number of sparse dimensions (from ", sparseDims_, " to ", sparseDims, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(sparse_dim == sparse_dim_,
"changing the number of sparse dimensions (from ", sparse_dim_, " to ", sparse_dim, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(denseDims == denseDims_,
"changing the number of dense dimensions (from ", denseDims_, " to ", denseDims, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(dense_dim == dense_dim_,
"changing the number of dense dimensions (from ", dense_dim_, " to ", dense_dim, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
bool shrinking_sparse_dims = false;
bool shrinking_dense_dims = false;
auto sparse_size_original = sizes().slice(0, sparseDims);
auto sparse_size_new = size.slice(0, sparseDims);
for (int i = 0; i < sparseDims; i++) {
bool shrinking_dense_dim = false;
auto sparse_size_original = sizes().slice(0, sparse_dim);
auto sparse_size_new = size.slice(0, sparse_dim);
for (int i = 0; i < sparse_dim; i++) {
if (sparse_size_new[i] < sparse_size_original[i]) {
shrinking_sparse_dims = true;
break;
}
}
auto dense_size_original = sizes().slice(sparseDims);
auto dense_size_new = size.slice(sparseDims);
for (int i = 0; i < denseDims; i++) {
auto dense_size_original = sizes().slice(sparse_dim);
auto dense_size_new = size.slice(sparse_dim);
for (int i = 0; i < dense_dim; i++) {
if (dense_size_new[i] < dense_size_original[i]) {
shrinking_dense_dims = true;
shrinking_dense_dim = true;
break;
}
}
@ -128,40 +128,38 @@ public:
AT_CHECK(!shrinking_sparse_dims,
"shrinking the size of sparse dimensions (from ", sparse_size_original, " to ", sparse_size_new, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
AT_CHECK(!shrinking_dense_dims,
AT_CHECK(!shrinking_dense_dim,
"shrinking the size of dense dimensions (from ", dense_size_original, " to ", dense_size_new, ") on a non-empty sparse tensor is not supported.\n", alt_options_msg);
}
if ((!size.equals(size_)) || (sparseDims != sparseDims_) || (denseDims != denseDims_)) {
std::vector<int64_t> values_size = {values().size(0)};
auto dense_size = size.slice(sparseDims);
if ((!size.equals(size_)) || (sparse_dim != sparse_dim_) || (dense_dim != dense_dim_)) {
auto nnz = values().size(0);
std::vector<int64_t> values_size = {nnz};
auto dense_size = size.slice(sparse_dim);
values_size.insert(values_size.end(), dense_size.begin(), dense_size.end());
values_.resize_(values_size);
std::vector<int64_t> indices_size = indices().sizes().vec();
indices_size[0] = sparseDims;
indices_.resize_(indices_size);
indices_.resize_({sparse_dim, nnz});
}
size_ = size.vec();
sparseDims_ = sparseDims;
denseDims_ = denseDims;
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
refresh_numel();
}
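// A minimal usage sketch (illustration only, not part of this header),
// assuming the ATen sparse factory functions. On a non-empty sparse tensor,
// resize_ keeps sparse_dim/dense_dim fixed and only allows sizes to grow:
//
//   auto indices = at::zeros({2, 1}, at::kLong);      // 2 sparse dims, nnz == 1
//   auto values  = at::ones({1});                     // 0 dense dims
//   auto t = at::sparse_coo_tensor(indices, values, {3, 3});
//   t.sparse_resize_({4, 5}, 2, 0);     // OK: dims unchanged, sizes grow
//   t.sparse_resize_({2, 3}, 2, 0);     // error: shrinks a sparse dimension
//   t.sparse_resize_({3, 3, 3}, 2, 1);  // error: changes dense_dim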
// NOTE: this function will resize the sparse tensor and also set `indices` and `values` to empty.
void resize_and_clear_(int64_t sparseDims, int64_t denseDims, IntList size) {
AT_CHECK(sparseDims + denseDims == size.size(), "number of dimensions must be sparseDims (", sparseDims, ") + denseDims (", denseDims, "), but got ", size.size());
void resize_and_clear_(int64_t sparse_dim, int64_t dense_dim, IntList size) {
AT_CHECK(sparse_dim + dense_dim == size.size(), "number of dimensions must be sparse_dim (", sparse_dim, ") + dense_dim (", dense_dim, "), but got ", size.size());
size_ = size.vec();
sparseDims_ = sparseDims;
denseDims_ = denseDims;
sparse_dim_ = sparse_dim;
dense_dim_ = dense_dim;
auto empty_indices = indices().type().tensor({sparseDims, 0});
auto empty_indices = at::empty({sparse_dim, 0}, indices().options());
std::vector<int64_t> values_size = {0};
auto dense_size = sizes().slice(sparseDims);
auto dense_size = sizes().slice(sparse_dim);
values_size.insert(values_size.end(), dense_size.begin(), dense_size.end());
auto empty_values = values().type().tensor(values_size);
auto empty_values = at::empty(values_size, values().options());
set_indices_and_values_unsafe(empty_indices, empty_values);
refresh_numel();
}
@@ -169,9 +167,10 @@ public:
void set_coalesced(bool coalesced) { coalesced_ = coalesced; }
// NOTE: this function is only used internally and not exposed to Python frontend
void set_nnz_and_narrow(int64_t nnz) {
indices_ = indices_.narrow(1, 0, nnz);
values_ = values_.narrow(0, 0, nnz);
void set_nnz_and_narrow(int64_t new_nnz) {
AT_ASSERT(new_nnz <= nnz());
indices_ = indices_.narrow(1, 0, new_nnz);
values_ = values_.narrow(0, 0, new_nnz);
}
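// Effect sketch (illustration only): with nnz() == 5, a 2-sparse-dim tensor
// holding indices_ of shape [2, 5] and values_ of shape [5, 4], calling
// set_nnz_and_narrow(3) keeps the first three stored elements:
//   indices_: [2, 5] -> [2, 3]
//   values_:  [5, 4] -> [3, 4]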
// Takes indices and values and directly puts them into the sparse tensor, no copy.
@@ -182,6 +181,12 @@ public:
// NB: This used to be able to avoid a refcount bump, but I was too lazy to
// make it happen
void set_indices_and_values_unsafe(const Tensor& indices, const Tensor& values);
private:
int64_t get_device_slow() const override {
return values_.get_device();
}
};
} // namespace at


@@ -0,0 +1,125 @@
#include <ATen/ATen.h>
#include <ATen/SparseTensorImpl.h>
namespace at { namespace sparse {
// Just for documentary purposes
using SparseTensor = Tensor;
using LongTensor = Tensor;
using IntTensor = Tensor;
using SparseType = Type;
// This is an internal utility function for getting at the SparseTensorImpl,
// so that we can write sparse tensor specific accessors for special fields
// in SparseTensor. You should only use this for writing low level
// setters/getters for SparseTensorImpl fields; otherwise, you should use
// the low level setters/getters that were implemented using this.
//
// This may be called repeatedly, so make sure it's pretty cheap.
inline SparseTensorImpl* get_sparse_impl(const SparseTensor& self) {
AT_ASSERTM(!self.is_variable(), "_internal_get_SparseTensorImpl: should not be a variable");
AT_ASSERTM(self.is_sparse(), "_internal_get_SparseTensorImpl: not a sparse tensor");
return static_cast<SparseTensorImpl*>(self.unsafeGetTensorImpl());
}
// Takes indices and values and directly puts them into the sparse tensor, no
// copy. This used to be called THSTensor_(_move)
inline void alias_into_sparse(const SparseTensor& self, const LongTensor& indices, const Tensor& values) {
get_sparse_impl(self)->set_indices_and_values_unsafe(indices, values);
}
// Take indices and values and makes a (data) copy of them to put into the sparse
// indices/values. This used to be called THSTensor_(_set)
inline void copy_into_sparse(const SparseTensor& self, const LongTensor& indices, const Tensor& values, bool non_blocking) {
alias_into_sparse(self, self._indices().type().copy(indices, non_blocking), self._values().type().copy(values, non_blocking));
}
// TODO: put this into the public API
inline bool is_same_tensor(const Tensor& lhs, const Tensor& rhs) {
return lhs.unsafeGetTensorImpl() == rhs.unsafeGetTensorImpl();
}
inline bool is_same_density(const SparseTensor& self, const SparseTensor& src) {
return self.sparse_dim() == src.sparse_dim() && self.dense_dim() == src.dense_dim();
}
// Give us a new values tensor, with the same dimensionality
// as 'values' but with a new number of non-zero elements.
// TODO: Expose this for real in ATen, some day?
// NB: Doesn't preserve data.
inline Tensor new_values_with_size_of(const Tensor& values, int64_t nnz) {
std::vector<int64_t> size = values.sizes().vec();
size[0] = nnz;
return at::empty(size, values.options());
}
// NOTE [ Flatten Sparse Indices ]
// This helper function flattens a sparse indices tensor (a LongTensor) into a 1D
// indices tensor. E.g.,
// input = [[2, 4, 0],
// [3, 1, 10]]
// full_size = [2, 12]
// output = [ 2 * 12 + 3, 4 * 12 + 1, 0 * 12 + 10 ] = [27, 49, 10]
//
// In other words, if each `indices[:, i]` is a valid index into a
// tensor `t` of shape `full_size`, this returns the corresponding indices into
// the flattened tensor `t.reshape( prod(full_size[:indices.size(0)]), -1 )`.
// if forceClone is true, the result will be forced to be a clone of self.
// if force_clone is true, the result will be forced to be a clone of self.
inline LongTensor flatten_indices(const Tensor& indices, IntList full_size, bool force_clone = false) {
int64_t sparse_dim = indices.size(0);
if (sparse_dim == 1) {
if (force_clone) {
return indices.squeeze(0).clone();
} else {
return indices.squeeze(0);
}
} else {
std::vector<int64_t> indices_mult_cpu_vec(sparse_dim);  // sized (not merely reserved), since operator[] below writes each element
int64_t mult = 1;
for (int64_t i = sparse_dim - 1; i >= 0; i--) {
indices_mult_cpu_vec[i] = mult;
mult *= full_size[i];
}
auto indices_mult_cpu = indices.type().cpu()
.tensorFromBlob(indices_mult_cpu_vec.data(), /*size=*/{sparse_dim, 1});
// NB: must be blocking because this blob may be freed after this closure,
// and non_blocking copy will see garbage.
auto indices_mult = indices_mult_cpu.to(indices.device(), /*non_blocking=*/false);
// Ideally we want matmul, but matmul is slow on CPU Long and not implemented
// on CUDA Long, so we use mul + sum instead.
return indices.mul(indices_mult).sum(0);
}
}
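// A self-contained sketch of the arithmetic above (illustration only):
// each column of `indices` is an n-D coordinate, flattened row-major.
//   int64_t flatten_one(at::ArrayRef<int64_t> coord, at::ArrayRef<int64_t> full_size) {
//     int64_t flat = 0;
//     for (size_t d = 0; d < coord.size(); d++) {
//       flat = flat * full_size[d] + coord[d];  // Horner-style accumulation
//     }
//     return flat;
//   }
//   // flatten_one({2, 3}, {2, 12}) == 2 * 12 + 3 == 27, matching the NOTE above.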
// Flatten a sparse tensor's indices from nD to 1D, similar to NOTE [ Flatten Sparse Indices ],
// except this one allows a partial flatten: only the specified dims are flattened. Note that
// the flattened indices might be uncoalesced if dims_to_flatten.size() < sparse_dim.
// Also, if the input indices are already coalesced, the flattened indices will be sorted.
//
// args:
// indices: sparse tensor indices
// sizes: sparse tensor sizes
// dims_to_flatten: a list of dim index to flatten
//
// Ex1:
// indices = [[2, 4, 0],
// [3, 1, 3]]
// sizes = [2, 12]
// dims_to_flatten = [0, 1]
// new_indices = [ 2 * 12 + 3, 4 * 12 + 1, 0 * 12 + 3 ] = [27, 49, 3]
//
// Ex2:
// dims_to_flatten = [1]
// new_indices = [ 3, 1, 3 ] # uncoalesced
inline LongTensor flatten_indices_by_dims(const LongTensor& indices, const IntList& sizes, const IntList& dims_to_flatten){
LongTensor new_indices = at::zeros({indices.size(1)}, indices.options());
for (auto d : dims_to_flatten) {
new_indices.mul_(sizes[d]);
new_indices.add_(indices.select(0, d));
}
return new_indices;
}
}} // namespace at::sparse


@@ -1,2 +1,2 @@
#pragma once
#include <ATen/core/Storage.h>
#include <c10/core/Storage.h>


@@ -1,2 +0,0 @@
#pragma once
#include <ATen/core/StorageImpl.h>


@@ -12,8 +12,4 @@ bool TensorGeometry::is_contiguous() const {
return at::geometry_is_contiguous(sizes_, strides_);
}
Tensor TensorGeometry::zeros_with_stride(const Type& type) const {
return type.tensor(sizes_, strides_).zero_();
}
} // namespace at


@@ -5,7 +5,7 @@
namespace at {
struct AT_API TensorGeometry {
struct CAFFE2_API TensorGeometry {
TensorGeometry() : storage_offset_(0) {}
explicit TensorGeometry(IntList sizes)
@@ -30,9 +30,6 @@ struct AT_API TensorGeometry {
// true if the tensor is contiguous
bool is_contiguous() const;
// creates a new tensor with the sizes and strides of the source
Tensor zeros_with_stride(const Type& type) const;
int64_t dim() const { return sizes_.size(); }
int64_t size(int64_t dim) const {
dim = maybe_wrap_dim(dim, this->dim());


@@ -1,2 +0,0 @@
#pragma once
#include <ATen/core/TensorImpl.h>


@@ -1,6 +1,6 @@
#pragma once
#include "ATen/core/Scalar.h"
#include <c10/core/Scalar.h>
#include "ATen/Tensor.h"
#include "ATen/Type.h"
@@ -59,7 +59,7 @@ inline Tensor Tensor::operator[](Tensor index) const {
index.dim() == 0,
"Can only index with tensors that are scalars (zero-dim)");
// The Scalar(Tensor) constructor is explicit, so we need to call it.
return this->operator[](index._local_scalar());
return this->operator[](index.item());
}
inline Tensor Tensor::operator[](int64_t index) const {
return select(0, index);
@@ -68,9 +68,9 @@ inline Tensor Tensor::operator[](int64_t index) const {
#define AT_FORALL_BINARY_OPS(_) \
_(+,x.add(y), y.add(x)) \
_(*,x.mul(y), y.mul(x)) \
_(-,x.sub(y), y.type().tensor().resize_(y.sizes()).fill_(x).sub_(y)) \
_(/,x.div(y), y.type().tensor().resize_(y.sizes()).fill_(x).div_(y)) \
_(%,x.remainder(y), y.type().tensor().resize_(y.sizes()).fill_(x).remainder_(y)) \
_(-,x.sub(y), ::at::empty(y.sizes(), y.options()).fill_(x).sub_(y)) \
_(/,x.div(y), ::at::empty(y.sizes(), y.options()).fill_(x).div_(y)) \
_(%,x.remainder(y), ::at::empty(y.sizes(), y.options()).fill_(x).remainder_(y)) \
_(<,x.lt(y), y.gt(x)) \
_(<=,x.le(y), y.ge(x)) \
_(>,x.gt(y),y.lt(x)) \


@@ -12,7 +12,7 @@ namespace at {
// make sense. These are particularly useful for native functions,
// which do NO argument checking by default.
struct AT_API TensorArg {
struct CAFFE2_API TensorArg {
Tensor tensor;
const char* name;
int pos; // 1-indexed
@@ -22,7 +22,7 @@ struct AT_API TensorArg {
const Tensor& operator*() const { return tensor; }
};
struct AT_API TensorGeometryArg {
struct CAFFE2_API TensorGeometryArg {
TensorGeometry tensor;
const char* name;
int pos; // 1-indexed
@@ -49,40 +49,80 @@ using CheckedFrom = const char*;
// not TensorGeometryArg, because the Tensor to TensorGeometry
// conversion will blow up if you have undefined tensors.
AT_API std::ostream& operator<<(std::ostream & out, TensorGeometryArg t);
AT_API void checkDim(CheckedFrom c, const TensorGeometryArg& t, int64_t dim);
CAFFE2_API std::ostream& operator<<(std::ostream& out, TensorGeometryArg t);
CAFFE2_API void checkDim(
CheckedFrom c,
const TensorGeometryArg& t,
int64_t dim);
// NB: this is an inclusive-exclusive range
AT_API void checkDimRange(CheckedFrom c, const TensorGeometryArg& t, int64_t dim_start, int64_t dim_end);
AT_API void checkSameDim(CheckedFrom c, const TensorGeometryArg& t1, const TensorGeometryArg& t2);
AT_API void checkContiguous(CheckedFrom c, const TensorGeometryArg& t);
AT_API void checkAllContiguous(CheckedFrom c, at::ArrayRef<TensorArg> ts);
AT_API void checkSize(CheckedFrom c, const TensorGeometryArg& t, IntList sizes);
AT_API void checkSize(CheckedFrom c, const TensorGeometryArg& t, int64_t dim, int64_t size);
AT_API void checkNumel(CheckedFrom c, const TensorGeometryArg& t, int64_t numel);
AT_API void checkSameNumel(CheckedFrom c, const TensorGeometryArg& t1, const TensorGeometryArg& t2);
AT_API void checkAllSameNumel(CheckedFrom c, ArrayRef<TensorArg> tensors);
AT_API void checkScalarType(CheckedFrom c, const TensorArg& t, ScalarType s);
AT_API void checkScalarTypes(CheckedFrom c, const TensorArg& t, at::ArrayRef<ScalarType> l);
AT_API void checkSameGPU(CheckedFrom c, const TensorArg& t1, const TensorArg& t2);
AT_API void checkAllSameGPU(CheckedFrom c, ArrayRef<TensorArg> tensors);
AT_API void checkSameType(CheckedFrom c, const TensorArg& t1, const TensorArg& t2);
AT_API void checkAllSameType(CheckedFrom c, ArrayRef<TensorArg> tensors);
AT_API void checkSameSize(CheckedFrom c, const TensorArg& t1, const TensorArg& t2);
AT_API void checkDefined(CheckedFrom c, const TensorArg& t);
AT_API void checkAllDefined(CheckedFrom c, at::ArrayRef<TensorArg> t);
CAFFE2_API void checkDimRange(
CheckedFrom c,
const TensorGeometryArg& t,
int64_t dim_start,
int64_t dim_end);
CAFFE2_API void checkSameDim(
CheckedFrom c,
const TensorGeometryArg& t1,
const TensorGeometryArg& t2);
CAFFE2_API void checkContiguous(CheckedFrom c, const TensorGeometryArg& t);
CAFFE2_API void checkAllContiguous(CheckedFrom c, at::ArrayRef<TensorArg> ts);
CAFFE2_API void checkSize(
CheckedFrom c,
const TensorGeometryArg& t,
IntList sizes);
CAFFE2_API void checkSize(
CheckedFrom c,
const TensorGeometryArg& t,
int64_t dim,
int64_t size);
CAFFE2_API void checkNumel(
CheckedFrom c,
const TensorGeometryArg& t,
int64_t numel);
CAFFE2_API void checkSameNumel(
CheckedFrom c,
const TensorGeometryArg& t1,
const TensorGeometryArg& t2);
CAFFE2_API void checkAllSameNumel(CheckedFrom c, ArrayRef<TensorArg> tensors);
CAFFE2_API void checkScalarType(
CheckedFrom c,
const TensorArg& t,
ScalarType s);
CAFFE2_API void checkScalarTypes(
CheckedFrom c,
const TensorArg& t,
at::ArrayRef<ScalarType> l);
CAFFE2_API void checkSameGPU(
CheckedFrom c,
const TensorArg& t1,
const TensorArg& t2);
CAFFE2_API void checkAllSameGPU(CheckedFrom c, ArrayRef<TensorArg> tensors);
CAFFE2_API void checkSameType(
CheckedFrom c,
const TensorArg& t1,
const TensorArg& t2);
CAFFE2_API void checkAllSameType(CheckedFrom c, ArrayRef<TensorArg> tensors);
CAFFE2_API void checkSameSize(
CheckedFrom c,
const TensorArg& t1,
const TensorArg& t2);
CAFFE2_API void checkDefined(CheckedFrom c, const TensorArg& t);
CAFFE2_API void checkAllDefined(CheckedFrom c, at::ArrayRef<TensorArg> t);
// FixMe: does TensorArg slow things down?
AT_API void checkBackend(CheckedFrom c, at::ArrayRef<Tensor> t, at::Backend backend);
CAFFE2_API void checkBackend(
CheckedFrom c,
at::ArrayRef<Tensor> t,
at::Backend backend);
// Methods for getting data_ptr if tensor is defined
AT_API void * maybe_data_ptr(const Tensor& tensor);
AT_API void * maybe_data_ptr(const TensorArg& tensor);
CAFFE2_API void* maybe_data_ptr(const Tensor& tensor);
CAFFE2_API void* maybe_data_ptr(const TensorArg& tensor);
// Returns whether the tensor geometry represented by `sizes` and `strides` is contiguous.
// Although we cache is_contiguous in tensor now, this is still useful because it
// allows checking if a particular geometry is contiguous without explicitly
// constructing a tensor, e.g., when you want to choose a kernel strategy based
// on whether a subgeometry is contiguous.
AT_API bool geometry_is_contiguous(IntList sizes, IntList strides);
CAFFE2_API bool geometry_is_contiguous(IntList sizes, IntList strides);
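// Sketch of the standard check (illustration; not necessarily the exact
// implementation): a geometry is contiguous when each stride equals the
// product of all later sizes, ignoring size-1 dimensions.
//   bool is_contig_sketch(at::IntList sizes, at::IntList strides) {
//     int64_t expected = 1;
//     for (int64_t d = (int64_t)sizes.size() - 1; d >= 0; d--) {
//       if (sizes[d] != 1 && strides[d] != expected) return false;
//       expected *= sizes[d];
//     }
//     return true;
//   }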
}


@@ -1,5 +1,5 @@
#include "ATen/UndefinedType.h"
#include "ATen/core/Error.h"
#include "c10/util/Exception.h"
namespace at {
@@ -70,12 +70,4 @@ Type & UndefinedType::toScalarType(ScalarType s) const {
AT_ERROR("toScalarType not implemented for UndefinedType to non-UndefinedType");
}
Tensor & UndefinedType::s_copy_(Tensor & self, const Tensor & src, bool non_blocking) const {
AT_ERROR("s_copy not defined for UndefinedType");
}
Tensor & UndefinedType::_s_copy_from(const Tensor & self, Tensor & dst, bool non_blocking) const {
AT_ERROR("_s_copy_from not defined for UndefinedType");
}
}


@@ -30,9 +30,6 @@ struct UndefinedType final : public TypeDefault {
virtual TypeID ID() const override;
virtual Storage unsafeStorageFromTH(void * th_pointer, bool retain) const override;
virtual Tensor unsafeTensorFromTH(void * th_pointer, bool retain) const override;
virtual Tensor & s_copy_(Tensor & self, const Tensor & src, bool non_blocking) const override;
virtual Tensor & _s_copy_from(const Tensor & self, Tensor & dst, bool non_blocking) const override;
};
} // namespace at


@@ -1,13 +1,13 @@
#pragma once
#include "ATen/core/ATenGeneral.h"
#include "ATen/StorageImpl.h"
#include <c10/core/StorageImpl.h>
#include "ATen/core/UndefinedTensorImpl.h"
#include <ATen/core/ScalarType.h>
#include <c10/core/ScalarType.h>
#include "ATen/Formatting.h"
#include "ATen/core/ArrayRef.h"
#include "ATen/core/Error.h"
#include <c10/util/ArrayRef.h>
#include <c10/util/Exception.h>
#include <algorithm>
#include <sstream>
@@ -24,7 +24,7 @@
namespace at {
AT_API int _crash_if_asan(int);
CAFFE2_API int _crash_if_asan(int);
static inline const Storage& checked_storage(
const Storage& expr,
@@ -113,11 +113,11 @@ std::array<int64_t, N> check_intlist(ArrayRef<int64_t> list, const char * name,
}
inline int64_t sum_intlist(ArrayRef<int64_t> list) {
return std::accumulate(list.begin(), list.end(), 0);
return std::accumulate(list.begin(), list.end(), 0ll);
}
inline int64_t prod_intlist(ArrayRef<int64_t> list) {
return std::accumulate(list.begin(), list.end(), 1, std::multiplies<int64_t>());
return std::accumulate(list.begin(), list.end(), 1ll, std::multiplies<int64_t>());
}
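// Why the 0ll / 1ll seeds above matter (illustration only): std::accumulate
// deduces its accumulator type from the initial value, so an `int` seed would
// overflow on 64-bit inputs.
//   std::vector<int64_t> big = {1ll << 40, 1ll << 40};
//   std::accumulate(big.begin(), big.end(), 0);    // int accumulator: overflows
//   std::accumulate(big.begin(), big.end(), 0ll);  // int64 accumulator: 1ll << 41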
} // at


@@ -12,7 +12,7 @@ namespace at {
constexpr size_t dim_bitset_size = 64;
static inline std::bitset<dim_bitset_size> dim_list_to_bitset(IntList dims, int64_t ndims, bool wrap_scalar=true) {
static inline std::bitset<dim_bitset_size> dim_list_to_bitset(IntList dims, int64_t ndims) {
AT_CHECK(ndims <= (int64_t) dim_bitset_size, "only tensors with up to ", dim_bitset_size, " dims are supported");
std::bitset<dim_bitset_size> seen;
for (size_t i = 0; i < dims.size(); i++) {


@@ -30,8 +30,7 @@ def set_declaration_defaults(declaration):
if 'backends' not in declaration:
declaration['backends'] = ['CPU', 'CUDA']
if 'api_name' not in declaration:
declaration['api_name'] = (declaration['python_name']
if 'python_name' in declaration else declaration['name'])
declaration['api_name'] = declaration['name']
# Simulate multiple dispatch, even if it's not necessary
if 'options' not in declaration:
declaration['options'] = [{'arguments': declaration['arguments']}]


@@ -1,251 +0,0 @@
from code_template import CodeTemplate
from function_wrapper import nested_dict
FILE = CodeTemplate("""\
// ${generated_comment}
#include "ATen/Config.h"
#include "TH/TH.h"
${cuda_includes}
#include "ATen/Utils.h"
${copy_includes}
namespace at {
${copy_functions}
}
""")
CUDA_INCLUDES = """\
#undef THNN_
#include "THC/THC.h"
"""
# NB: The copy templates static_cast both dst and src, even though
# technically we also perform a checked_cast_tensor in the prologue
# of the copy (meaning that, hypothetically, an already casted tensor
# is available). However, in s_copy, the casted tensor is dst, while
# in _s_copy_from, the casted tensor is src. So that we can reuse the logic
# in both cases, we unconditionally cast both tensors (and rely
# on the surrounding code to establish the necessary invariants).
COPY = CodeTemplate("""\
${THTensor}_copy${cuda}${src_scalar_name}(${state,}\
dst.unsafeGetTensorImpl(), \
src.unsafeGetTensorImpl());
""")
COPY_ASYNC_CPU = CodeTemplate("""\
if (non_blocking) {
${THTensor}_copyAsyncCPU(${state,}\
dst.unsafeGetTensorImpl(), \
src.unsafeGetTensorImpl());
break;
}
""")
COPY_ASYNC_CUDA = CodeTemplate("""\
if (non_blocking) {
${THTensor}_copyAsyncCuda(${state,}\
dst.unsafeGetTensorImpl(), \
src.unsafeGetTensorImpl());
break;
}
""")
CASE = CodeTemplate("""\
case ${case_id}:
${copies}
break;
""")
FUNCTION = CodeTemplate("""\
Tensor & ${Type}::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
// code generated by copy_wrapper
${checked_cast_dst}
switch (src.type().ID()) {
${copy_body}
default:
${function_fallthrough}
}
dst.unsafeGetTensorImpl()->maybe_zero_dim(src.dim() == 0);
return dst;
}
""")
FUNCTION_FALLTHROUGH_REDISPATCH = "return src.type()._s_copy_from(src, dst, non_blocking);"
FUNCTION_FALLTHROUGH_ERROR = """\
AT_ERROR("copy does not support ", src.type().toString(), " to ", toString(), " copy.");
"""
FUNCTION_FROM = CodeTemplate("""\
Tensor & ${Type}::_s_copy_from(const Tensor & src, Tensor & dst, bool non_blocking) const {
// code generated by copy_wrapper
${checked_cast_src}
switch (dst.type().ID()) {
${copy_body}
default:
AT_ERROR("copy does not support ", toString(), " to ", dst.type().toString(), " copy.");
break;
}
dst.unsafeGetTensorImpl()->maybe_zero_dim(src.dim() == 0);
return dst; // NB! dst
}
""")
# NB: Hypothetically, someone could call s_copy_from directly and get an error
# message which claims something is not supported, when it actually is. But
# the correct fix in this case was to NOT call copy_from
FUNCTION_FROM_SWAP = CodeTemplate("""\
Tensor & ${Type}::_s_copy_from(const Tensor & src, Tensor & dst, bool non_blocking) const {
AT_ERROR("copy does not support ", src.type().toString(), " to ", dst.type().toString(), " copy (s_copy_from case).");
}
""")
def create_one_copy(dst_type, all_types):
copy_body = []
for src_type in all_types:
if dst_type['Density'] == 'Sparse' or src_type['Density'] == 'Sparse':
# skip sparse copies, which are not yet implemented
continue
cuda = ''
state = []
if src_type['Backend'] == 'CUDA' or dst_type['Backend'] == 'CUDA':
state.append('globalContext().getTHCState()')
if src_type['Backend'] == 'CUDA':
if dst_type['Backend'] == 'CUDA':
cuda = 'Cuda'
else:
# don't attempt to process CPU-CUDA; this is handled in the
# redispatch
continue
body_env = nested_dict({
'src_scalar_name': src_type['ScalarName'],
'case_id': src_type['TypeID'],
'src_tensor': src_type['Tensor'],
'dst_tensor': dst_type['Tensor'],
'cuda': cuda,
'state': state,
}, dst_type)
copies = []
if dst_type['ScalarType'] == src_type['ScalarType']:
if dst_type['Backend'] == 'CUDA' and src_type['Backend'] == 'CPU':
copies.append(COPY_ASYNC_CPU.substitute(body_env))
copies.append(COPY.substitute(body_env))
copy_body.append(CASE.substitute(body_env, copies=copies))
if dst_type['Backend'] == 'CPU':
# CPU fallthrough needs to redispatch to _s_copy_from
# (Backend == CPU implies Dense)
assert dst_type['Density'] == 'Dense'
function_fallthrough = FUNCTION_FALLTHROUGH_REDISPATCH
else:
function_fallthrough = FUNCTION_FALLTHROUGH_ERROR
# Note [checked_cast_tensor is for dense only]
# ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
# checked_cast_tensor is only needed for backends which implement
# copy and thus do a cast. Sparse does not support copies, so there
# is no need to do a checked cast. (Furthermore, the code as written
# would not work, as it would try to cast to a derived Tensor type, and
# there is no derived Tensor type for sparse.)
checked_cast_dst = ''
if dst_type['Density'] == 'Dense':
checked_cast_dst = \
'checked_tensor_unwrap(dst, "dst", 0, false, Backend::{}, ScalarType::{});' \
.format(dst_type['Backend'],
dst_type['ScalarName'])
env = nested_dict({
'function_fallthrough': function_fallthrough,
'checked_cast_dst': checked_cast_dst,
}, dst_type)
return FUNCTION.substitute(env, copy_body=copy_body)
def create_one_copy_from(src_type, all_types):
if src_type['DenseBackend'] == 'CPU':
return FUNCTION_FROM_SWAP.substitute(src_type)
copy_body = []
for dst_type in all_types:
if dst_type['Density'] == 'Sparse' or src_type['Density'] == 'Sparse':
# skip sparse copies, which are not yet implemented
continue
cuda = ''
state = []
if src_type['Backend'] == 'CUDA':
cuda = 'Cuda'
if dst_type['Backend'] == 'CUDA' or src_type['Backend'] == 'CUDA':
state.append('globalContext().getTHCState()')
body_env = nested_dict({
'src_scalar_name': src_type['ScalarName'],
'case_id': dst_type['TypeID'],
'src_tensor': src_type['Tensor'],
'dst_tensor': dst_type['Tensor'],
'cuda': cuda,
'state': state,
}, dst_type)
copies = []
if dst_type['ScalarType'] == src_type['ScalarType']:
# NB: Technically, we have already short-circuited the
# src_type['Backend'] == 'CUDA' case at the beginning of this
# function
if dst_type['Backend'] == 'CPU' and src_type['Backend'] == 'CUDA':
copies.append(COPY_ASYNC_CUDA.substitute(body_env))
copies.append(COPY.substitute(body_env))
copy_body.append(CASE.substitute(body_env, copies=copies))
# See Note [checked_cast_tensor is for dense only]
checked_cast_src = ''
if src_type['Density'] != 'Sparse':
checked_cast_src = \
'checked_tensor_unwrap(src, "src", 0, false, Backend::{}, ScalarType::{});' \
.format(src_type['Backend'], src_type['ScalarName'])
return FUNCTION_FROM.substitute(src_type, copy_body=copy_body, checked_cast_src=checked_cast_src)
def create(all_types, backend):
top_env = {
'copy_includes': [],
'copy_functions': [],
'cuda_includes': [],
'generated_comment': '@' + 'generated by aten/src/ATen/copy_wrapper.py'
}
if backend == 'CUDA':
top_env['cuda_includes'].append(CUDA_INCLUDES)
# Headers to include
for the_type in all_types:
# CUDA backend requires all headers (as it also manages CPU-CUDA
# conversions), but CPU backend should only have CPU headers
if backend == 'CPU' and the_type['DenseBackend'] != 'CPU':
continue
top_env['copy_includes'].append(
'#include "ATen/{}.h"'.format(the_type['Type']))
top_env['copy_includes'].append(
'#include "ATen/core/TensorImpl.h"')
# Code generation
for the_type in all_types:
# Only generate code for the requested backend
if the_type['DenseBackend'] != backend:
continue
top_env['copy_functions'].append(create_one_copy(the_type, all_types))
top_env['copy_functions'].append(create_one_copy_from(the_type, all_types))
return FILE.substitute(top_env)


@@ -1,12 +0,0 @@
#include <ATen/core/ATenCoreTest.h>
#include <ATen/core/Tensor.h>
namespace at {
static int CoreTestGlobal = 0;
int CoreTest() {
Tensor x;
return CoreTestGlobal++;
}
} // namespace at


@@ -1,8 +0,0 @@
#pragma once
#include <ATen/core/Macros.h>
namespace at {
AT_CORE_API int CoreTest();
}


@@ -1,8 +1,3 @@
#pragma once
#include "ATen/core/Macros.h"
// TODO: Merge the *_API macros.
#define AT_API AT_CORE_API
#define AT_EXPORT AT_CORE_EXPORT
#define AT_IMPORT AT_CORE_IMPORT
#include "c10/macros/Macros.h"


@@ -1 +0,0 @@
#include <ATen/core/AlignOf.h>


@@ -1,19 +0,0 @@
#include <ATen/core/Allocator.h>
namespace at {
static void deleteInefficientStdFunctionContext(void* ptr) {
delete static_cast<InefficientStdFunctionContext*>(ptr);
}
at::DataPtr InefficientStdFunctionContext::makeDataPtr(
void* ptr,
const std::function<void(void*)>& deleter,
Device device) {
return {ptr,
new InefficientStdFunctionContext({ptr, deleter}),
&deleteInefficientStdFunctionContext,
device};
}
} // namespace at


@@ -1 +0,0 @@
#include <ATen/core/ArrayRef.h>


@@ -1,28 +1,2 @@
#pragma once
#include <cstddef>
#include <string>
#include <typeinfo>
#include <ATen/core/Macros.h>
namespace at {
/// Utility to demangle a C++ symbol name.
AT_CORE_API std::string demangle(const char* name);
/// Returns the printable name of the type.
template <typename T>
inline const char* demangle_type() {
#ifdef __GXX_RTTI
static const std::string name = demangle(typeid(T).name());
return name.c_str();
#else // __GXX_RTTI
return "(RTTI disabled, cannot show name)";
#endif // __GXX_RTTI
}
AT_CORE_API std::string get_backtrace(
size_t frames_to_skip = 0,
size_t maximum_number_of_frames = 64,
bool skip_python_frames = true);
} // namespace at
#include "c10/util/Backtrace.h"
#include "c10/util/Type.h"


@@ -1 +0,0 @@
#include <ATen/core/C++17.h>


@@ -6,6 +6,12 @@ FILE(GLOB ATen_CORE_SRCS "*.cpp")
FILE(GLOB ATen_CORE_TEST_SRCS "*_test.cpp")
EXCLUDE(ATen_CORE_SRCS "${ATen_CORE_SRCS}" ${ATen_CORE_TEST_SRCS})
# see the source file for explanation
set_source_files_properties(
${CMAKE_CURRENT_SOURCE_DIR}/register_symbols.cpp
PROPERTIES COMPILE_FLAGS -O0
)
# Pass to parent
set(ATen_CORE_HEADERS ${ATen_CORE_HEADERS} PARENT_SCOPE)
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)


@@ -0,0 +1,14 @@
#include <ATen/core/typeid.h>
#include <ATen/core/DefaultDtype.h>
namespace at {
static auto default_dtype = caffe2::TypeMeta::Make<float>();
void set_default_dtype(caffe2::TypeMeta dtype) {
default_dtype = std::move(dtype);
}
const caffe2::TypeMeta& get_default_dtype() {
return default_dtype;
}
} // namespace at


@@ -0,0 +1,12 @@
#pragma once
#include <c10/macros/Macros.h>
namespace caffe2 {
class TypeMeta;
} // namespace caffe2
namespace at {
CAFFE2_API void set_default_dtype(caffe2::TypeMeta dtype);
CAFFE2_API const caffe2::TypeMeta& get_default_dtype();
} // namespace at
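// Usage sketch (illustration only; assumes the TypeMeta factory used in the
// .cpp above):
//   at::set_default_dtype(caffe2::TypeMeta::Make<double>());
//   const caffe2::TypeMeta& dt = at::get_default_dtype();  // now double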


@@ -1,42 +0,0 @@
#include <ATen/core/DeviceType.h>
#include <ATen/core/Error.h>
namespace at {
std::string DeviceTypeName(at::DeviceType d, bool lower_case) {
switch (d) {
// I considered instead using ctype::tolower to lower-case the strings
// on the fly, but this seemed a bit much.
case DeviceType::CPU:
return lower_case ? "cpu" : "CPU";
case DeviceType::CUDA:
return lower_case ? "cuda" : "CUDA";
case DeviceType::OPENGL:
return lower_case ? "opengl" : "OPENGL";
case DeviceType::OPENCL:
return lower_case ? "opencl" : "OPENCL";
case DeviceType::MKLDNN:
return lower_case ? "mkldnn" : "MKLDNN";
case DeviceType::IDEEP:
return lower_case ? "ideep" : "IDEEP";
case DeviceType::HIP:
return lower_case ? "hip" : "HIP";
default:
AT_ERROR(
"Unknown device: ",
static_cast<int32_t>(d),
". If you have recently updated the caffe2.proto file to add a new "
"device type, did you forget to update the DeviceTypeName() "
"function to reflect such recent changes?");
// The below code won't run but is needed to suppress some compiler
// warnings.
return "";
}
}
std::ostream& operator<<(std::ostream& stream, at::DeviceType type) {
stream << at::DeviceTypeName(type, /* lower case */ true);
return stream;
}
} // namespace at


@@ -1,34 +0,0 @@
#pragma once
// This is directly synchronized with caffe2/proto/caffe2.proto, but
// doesn't require me to figure out how to get Protobuf headers into
// ATen/core (which would require a lot more build system hacking.)
// If you modify me, keep me synchronized with that file.
#include <ATen/core/Macros.h>
#include <ostream>
namespace at {
// Underlying type declared to be int32_t for consistency with protobufs.
enum class DeviceType : int32_t {
CPU = 0,
CUDA = 1, // CUDA.
MKLDNN = 2, // Reserved for explicit MKLDNN
OPENGL = 3, // OpenGL
OPENCL = 4, // OpenCL
IDEEP = 5, // IDEEP.
HIP = 6, // AMD HIP
// Change the following number if you add more devices in the code.
COMPILE_TIME_MAX_DEVICE_TYPES = 7,
ONLY_FOR_TEST = 20901701, // This device type is only for test.
};
AT_CORE_API std::string DeviceTypeName(
at::DeviceType d,
bool lower_case = false);
AT_CORE_API std::ostream& operator<<(std::ostream& stream, at::DeviceType type);
} // namespace at


@@ -0,0 +1,11 @@
#pragma once
#include <c10/util/SmallVector.h>
#include <stdint.h>
namespace at {
/// A container for sizes or strides
using DimVector = SmallVector<int64_t, 5>;
} // namespace at


@@ -1,6 +1,4 @@
#include "ATen/Formatting.h"
#include <ATen/ATen.h>
#include "ATen/core/Formatting.h"
#include <cmath>
#include <cstdint>
@@ -9,6 +7,11 @@
#include <sstream>
#include <tuple>
namespace c10 {
std::ostream& operator<<(std::ostream & out, Backend b) {
return out << toString(b);
}
}
namespace at {
// not all C++ compilers have std::defaultfloat, so we define our own here
@@ -30,22 +33,6 @@ private:
std::ios saved;
};
std::ostream& operator<<(std::ostream & out, IntList list) {
int i = 0;
out << "[";
for(auto e : list) {
if (i++ > 0)
out << ", ";
out << e;
}
out << "]";
return out;
}
std::ostream& operator<<(std::ostream & out, Backend b) {
return out << toString(b);
}
std::ostream& operator<<(std::ostream & out, const Type& t) {
return out << t.toString();
}


@@ -0,0 +1,31 @@
#pragma once
#include <c10/core/Scalar.h>
#include <ATen/core/Tensor.h>
#include <ATen/core/TensorMethods.h>
#include <ATen/core/Type.h>
#include <iostream>
namespace c10 {
CAFFE2_API std::ostream& operator<<(std::ostream& out, Backend b);
}
namespace at {
CAFFE2_API std::ostream& operator<<(std::ostream& out, const Type& t);
CAFFE2_API std::ostream& print(
std::ostream& stream,
const Tensor& tensor,
int64_t linesize);
static inline std::ostream& operator<<(std::ostream & out, const Tensor & t) {
return print(out,t,80);
}
static inline void print(const Tensor & t, int64_t linesize=80) {
print(std::cout,t,linesize);
}
static inline std::ostream& operator<<(std::ostream & out, Scalar s) {
return out << (s.isFloatingPoint() ? s.toDouble() : s.toLong());
}
}
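// Usage sketch (illustration only): the stream operator routes through
// print() with a default line width of 80 columns.
//   at::Tensor t = at::ones({2, 3});
//   std::cout << t << std::endl;     // same as at::print(std::cout, t, 80)
//   at::print(t, /*linesize=*/120);  // wider rows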


@@ -5,7 +5,7 @@
namespace at {
struct AT_API Generator {
struct CAFFE2_API Generator {
Generator() {};
Generator(const Generator& other) = delete;
Generator(Generator&& other) = delete;


@@ -1,257 +0,0 @@
#pragma once
#include <cstring>
#include <limits>
#include <ATen/core/Macros.h>
#ifdef __CUDACC__
#include <cuda_fp16.h>
#endif
#if defined(__HIP_DEVICE_COMPILE__)
#include <hip/hip_fp16.h>
#endif
namespace at {
/// Constructors
inline AT_HOSTDEVICE Half::Half(float value) {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
x = __half_as_short(__float2half(value));
#else
x = detail::float2halfbits(value);
#endif
}
/// Implicit conversions
inline AT_HOSTDEVICE Half::operator float() const {
#if defined(__CUDA_ARCH__) || defined(__HIP_DEVICE_COMPILE__)
return __half2float(*reinterpret_cast<const __half*>(&x));
#else
return detail::halfbits2float(x);
#endif
}
#ifdef __CUDACC__
inline AT_HOSTDEVICE Half::Half(const __half& value) {
x = *reinterpret_cast<const unsigned short*>(&value);
}
inline AT_HOSTDEVICE Half::operator __half() const {
return *reinterpret_cast<const __half*>(&x);
}
#endif
// CUDA intrinsics
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ >= 350)
inline __device__ Half __ldg(const Half* ptr) {
return __ldg(reinterpret_cast<const __half*>(ptr));
}
#endif
/// Arithmetic
inline AT_HOSTDEVICE Half operator+(const Half& a, const Half& b) {
return static_cast<float>(a) + static_cast<float>(b);
}
inline AT_HOSTDEVICE Half operator-(const Half& a, const Half& b) {
return static_cast<float>(a) - static_cast<float>(b);
}
inline AT_HOSTDEVICE Half operator*(const Half& a, const Half& b) {
return static_cast<float>(a) * static_cast<float>(b);
}
inline AT_HOSTDEVICE Half operator/(const Half& a, const Half& b) {
return static_cast<float>(a) / static_cast<float>(b);
}
inline AT_HOSTDEVICE Half operator-(const Half& a) {
return -static_cast<float>(a);
}
inline AT_HOSTDEVICE Half& operator+=(Half& a, const Half& b) {
a = a + b;
return a;
}
inline AT_HOSTDEVICE Half& operator-=(Half& a, const Half& b) {
a = a - b;
return a;
}
inline AT_HOSTDEVICE Half& operator*=(Half& a, const Half& b) {
a = a * b;
return a;
}
inline AT_HOSTDEVICE Half& operator/=(Half& a, const Half& b) {
a = a / b;
return a;
}
/// Arithmetic with floats
inline AT_HOSTDEVICE float operator+(Half a, float b) {
return static_cast<float>(a) + b;
}
inline AT_HOSTDEVICE float operator-(Half a, float b) {
return static_cast<float>(a) - b;
}
inline AT_HOSTDEVICE float operator*(Half a, float b) {
return static_cast<float>(a) * b;
}
inline AT_HOSTDEVICE float operator/(Half a, float b) {
return static_cast<float>(a) / b;
}
inline AT_HOSTDEVICE float operator+(float a, Half b) {
return a + static_cast<float>(b);
}
inline AT_HOSTDEVICE float operator-(float a, Half b) {
return a - static_cast<float>(b);
}
inline AT_HOSTDEVICE float operator*(float a, Half b) {
return a * static_cast<float>(b);
}
inline AT_HOSTDEVICE float operator/(float a, Half b) {
return a / static_cast<float>(b);
}
inline AT_HOSTDEVICE float& operator+=(float& a, const Half& b) {
return a += static_cast<float>(b);
}
inline AT_HOSTDEVICE float& operator-=(float& a, const Half& b) {
return a -= static_cast<float>(b);
}
inline AT_HOSTDEVICE float& operator*=(float& a, const Half& b) {
return a *= static_cast<float>(b);
}
inline AT_HOSTDEVICE float& operator/=(float& a, const Half& b) {
return a /= static_cast<float>(b);
}
/// Arithmetic with doubles
inline AT_HOSTDEVICE double operator+(Half a, double b) {
return static_cast<double>(a) + b;
}
inline AT_HOSTDEVICE double operator-(Half a, double b) {
return static_cast<double>(a) - b;
}
inline AT_HOSTDEVICE double operator*(Half a, double b) {
return static_cast<double>(a) * b;
}
inline AT_HOSTDEVICE double operator/(Half a, double b) {
return static_cast<double>(a) / b;
}
inline AT_HOSTDEVICE double operator+(double a, Half b) {
return a + static_cast<double>(b);
}
inline AT_HOSTDEVICE double operator-(double a, Half b) {
return a - static_cast<double>(b);
}
inline AT_HOSTDEVICE double operator*(double a, Half b) {
return a * static_cast<double>(b);
}
inline AT_HOSTDEVICE double operator/(double a, Half b) {
return a / static_cast<double>(b);
}
/// Arithmetic with ints
inline AT_HOSTDEVICE Half operator+(Half a, int b) {
return a + static_cast<Half>(b);
}
inline AT_HOSTDEVICE Half operator-(Half a, int b) {
return a - static_cast<Half>(b);
}
inline AT_HOSTDEVICE Half operator*(Half a, int b) {
return a * static_cast<Half>(b);
}
inline AT_HOSTDEVICE Half operator/(Half a, int b) {
return a / static_cast<Half>(b);
}
inline AT_HOSTDEVICE Half operator+(int a, Half b) {
return static_cast<Half>(a) + b;
}
inline AT_HOSTDEVICE Half operator-(int a, Half b) {
return static_cast<Half>(a) - b;
}
inline AT_HOSTDEVICE Half operator*(int a, Half b) {
return static_cast<Half>(a) * b;
}
inline AT_HOSTDEVICE Half operator/(int a, Half b) {
return static_cast<Half>(a) / b;
}
/// NOTE: we do not define comparisons directly and instead rely on the implicit
/// conversion from at::Half to float.
} // namespace at
namespace std {
template <>
class numeric_limits<at::Half> {
public:
static constexpr bool is_specialized = true;
static constexpr bool is_signed = true;
static constexpr bool is_integer = false;
static constexpr bool is_exact = false;
static constexpr bool has_infinity = true;
static constexpr bool has_quiet_NaN = true;
static constexpr bool has_signaling_NaN = true;
static constexpr auto has_denorm = numeric_limits<float>::has_denorm;
static constexpr auto has_denorm_loss =
numeric_limits<float>::has_denorm_loss;
static constexpr auto round_style = numeric_limits<float>::round_style;
static constexpr bool is_iec559 = true;
static constexpr bool is_bounded = true;
static constexpr bool is_modulo = false;
static constexpr int digits = 11;
static constexpr int digits10 = 3;
static constexpr int max_digits10 = 5;
static constexpr int radix = 2;
static constexpr int min_exponent = -13;
static constexpr int min_exponent10 = -4;
static constexpr int max_exponent = 16;
static constexpr int max_exponent10 = 4;
static constexpr auto traps = numeric_limits<float>::traps;
static constexpr auto tinyness_before =
numeric_limits<float>::tinyness_before;
static constexpr at::Half min() {
return at::Half(0x0400, at::Half::from_bits);
}
static constexpr at::Half lowest() {
return at::Half(0xFBFF, at::Half::from_bits);
}
static constexpr at::Half max() {
return at::Half(0x7BFF, at::Half::from_bits);
}
static constexpr at::Half epsilon() {
return at::Half(0x1400, at::Half::from_bits);
}
static constexpr at::Half round_error() {
return at::Half(0x3800, at::Half::from_bits);
}
static constexpr at::Half infinity() {
return at::Half(0x7C00, at::Half::from_bits);
}
static constexpr at::Half quiet_NaN() {
return at::Half(0x7E00, at::Half::from_bits);
}
static constexpr at::Half signaling_NaN() {
return at::Half(0x7D00, at::Half::from_bits);
}
static constexpr at::Half denorm_min() {
return at::Half(0x0001, at::Half::from_bits);
}
};
} // namespace std
