Compare commits

...

2861 Commits

Author SHA1 Message Date
ff608a9ff3 Back out "Revert D10123245: Back out "codemod cuda_gpu_id to device_id"" (#12232)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12232

Original commit changeset: fca91fea58b7

This adds proper modifications to the DeviceType <-> DeviceOption conversion code added in D10033396

Reviewed By: jerryzh168

Differential Revision: D10132473

fbshipit-source-id: 801ef777e2950982cb47b48051b1471a0a91e64b
2018-10-01 21:54:52 -07:00
696498d9e4 Delete stride updating logic from Caffe2, and make PyTorch error in this case. (#12236)
Summary:
Strides appear to cause a huge memory regression in some of our internal
training workflows. This diff stems the bleeding, while we figure out exactly
what happened.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12236

Reviewed By: dzhulgakov

Differential Revision: D10134319

fbshipit-source-id: 1547c89a65c05473c409c0977c19c99dcaefb89c
2018-10-01 21:25:04 -07:00
2cbcaf4544 Skip failing tests in test_sparse (#12229)
Summary:
Skip the recently introduced tests that fail on ROCm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12229

Differential Revision: D10138146

Pulled By: bddppq

fbshipit-source-id: a0f1ff97fabb71f635a468e8030dbe32d388de49
2018-10-01 18:31:45 -07:00
8af06d8114 Use DFS scheduling only within single device (#11848)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11848

Avoid crossing the boundary between devices when using DFS scheduling

Reviewed By: romain-intel

Differential Revision: D9931091

fbshipit-source-id: 1f3cf52127830048ed1db50b01677b66eeed8b32
2018-10-01 18:31:43 -07:00
ecace9eb21 Move crf in caffe2 from fb to oss (#12200)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12200

moved crf_viterbi_op, copied crf_predict and crf_viterbi_test to oss

Reviewed By: Yangqing

Differential Revision: D10118341

fbshipit-source-id: 51e30e57d280d6ca75fc0b488f743794f23b589f
2018-10-01 18:31:41 -07:00
26df16eb21 Clear previous device option when keep_device is set in load op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12240

Reviewed By: jerryzh168

Differential Revision: D10133933

fbshipit-source-id: 05935bd527177f936c1d08626888d43dedbf5ce4
2018-10-01 17:20:26 -07:00
23f86ad57f Back out "[caffe2][mpscnn] Enable multiple external output"
Summary: Original commit changeset: 0cea9469cea0

Differential Revision: D10135814

fbshipit-source-id: 9563361cc00f4ce5dc2e903c0fcb10643ee9af26
2018-10-01 16:55:32 -07:00
35becd1879 New version of PT1 model format (#12149)
Summary:
Considered four different existing formats: 1) static graph, 2) torch script, 3) pickle files, 4) PyTorch C++ serialize APIs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12149

Reviewed By: BIT-silence

Differential Revision: D10098106

Pulled By: houseroad

fbshipit-source-id: 94ec7fc57c842e50fae5286ddeda657a4967a07a
2018-10-01 15:57:02 -07:00
8fa7de35f2 Enable ROCM clang-7 build
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12223

Differential Revision: D10133697

Pulled By: bddppq

fbshipit-source-id: c1de99afccdad415ac1beb85d3b8ab44f9b58738
2018-10-01 15:11:40 -07:00
15d28e400f remove support for c extensions (#12122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12122

We are deprecating support for C extensions. Please use C++ extensions in the future.
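As a migration pointer, here is a minimal sketch of a C++ extension build script using `torch.utils.cpp_extension` (package and file names are hypothetical):

```
# setup.py -- minimal sketch of a C++ extension build (replaces the old C extension flow)
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name='my_ext',                                     # hypothetical package name
    ext_modules=[CppExtension('my_ext', ['my_ext.cpp'])],
    cmdclass={'build_ext': BuildExtension},            # injects torch include/library paths
)
```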

Reviewed By: Yangqing

Differential Revision: D10060541

fbshipit-source-id: 4f7149e06a254bd7af463fd7aa9740f65369963a
2018-10-01 13:55:28 -07:00
1b59cf8b51 Add support to use llvm 7 in CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12182

Differential Revision: D10129630

Pulled By: bddppq

fbshipit-source-id: f0217336474b807f03f84a4b8052ce92a6e3564b
2018-10-01 13:39:50 -07:00
06f535d8a0 More debug info in plan executor (#12183)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12183

Adding more debug info printed from plan executor

Reviewed By: manojkris

Differential Revision: D10113104

fbshipit-source-id: dddc9aec8012c8575ab305033388412fdaaac537
2018-10-01 12:56:32 -07:00
eba1cf2145 Unify style (#11949)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11949

Unify naming style

Reviewed By: yinghai

Differential Revision: D9931227

fbshipit-source-id: b6956bd98ed8625623e4747d616989f9f3a2ed46
2018-10-01 12:56:29 -07:00
3010dc4208 Revert D10123245: Back out "codemod cuda_gpu_id to device_id"
Differential Revision:
D10123245

Original commit changeset: d83da8e00a12

fbshipit-source-id: fca91fea58b7df208edc2e218a1d514f9821ec7b
2018-10-01 12:22:36 -07:00
ecb3835387 change \gamma to \Gamma (#12214)
Summary:
- revert `\gamma` changes at landed PR: https://github.com/pytorch/pytorch/pull/12126
- minor fix for docs of `torch.norm()`

SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12214

Differential Revision: D10127337

Pulled By: weiyangfb

fbshipit-source-id: 15eb8abda39ec9e8b2e815e2a22096cae786995a
2018-10-01 11:31:18 -07:00
7d7d336c45 Back out "codemod cuda_gpu_id to device_id"
Summary:
Original commit changeset: f5614a5d2607

D9986213 is causing Multifeed Aggregator a [huge performance difference](https://our.intern.facebook.com/intern/ads/analyze_canary/412951953278781781/) and has been blocking the aggregator push since last Friday night: https://fburl.com/feedtools/b6izvwjz
We need to land this revert ASAP to unblock the aggregator push.

Reviewed By: orionr

Differential Revision: D10123245

fbshipit-source-id: d83da8e00a1250f5d09811a0a587c127e377aab2
2018-10-01 11:31:14 -07:00
e43ffb0148 nomnigraph - easy - some code cleanup for transformations_test (#12101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12101

clean up some duplicate test code

Reviewed By: ZolotukhinM

Differential Revision: D10051914

fbshipit-source-id: 698ff144a85e8c70572116c5ddb415cd2396b4e3
2018-10-01 11:31:08 -07:00
006171fffc Back out "[pytorch][PR] Revert "Move CreateContext to global registry (#11688)"" (#12121)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12121

Pull Request resolved: https://github.com/pytorch/pytorch/pull/12055

Original commit changeset: 6ca9de65b707

Reviewed By: ezyang

Differential Revision: D10033396

fbshipit-source-id: ca9f4b2f7ef0561f619b833415d394a8b9972bf4
2018-10-01 11:10:46 -07:00
fed91f873f (Very small) allow trailing commas in assign or tuples (#11723)
Summary:
Allow trailing commas in assign statements or tuples, which also allows single element tuples.
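A minimal sketch of what now parses (hypothetical script function):

```
import torch

@torch.jit.script
def f(x):
    a, b, = x, x,   # trailing commas in an assignment now parse
    t = (x,)        # ...which also makes single-element tuples expressible
    return a + b
```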
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11723

Differential Revision: D10052162

Pulled By: eellison

fbshipit-source-id: 344d908a3ad942a23ebd9f341794bc9734226aa8
2018-10-01 10:10:13 -07:00
f3c32a4b54 dnnlowp_16 -> dnnlowp_acc16 (#12205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12205

We're more interested in testing the performance of DNNLOWP_ACC16 engine.

Reviewed By: llyfacebook

Differential Revision: D10121080

fbshipit-source-id: 7def38be838feb7636f7dd0c8ed352c2df398ec1
2018-10-01 09:40:13 -07:00
9768b4d4ff support half float for SparseLengthsIndicesInGradientWeightedSumWithMainInputGradient (#12186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12186

Specialized implementation: pre-convert embeddings to float and do everything in fp32.

Reviewed By: jspark1105

Differential Revision: D10100603

fbshipit-source-id: 3255b4addb6fda24722bd519163099f5d354d084
2018-09-30 23:56:14 -07:00
c3817e85fa Temporary fix for LibTorch download link (#12212)
Summary:
We're waiting for the libtorch links to show up on the website. I had a fake link in the docs so far which is misleading. This PR changes it to a temporary markdown file until the web people fix the site tomorrow.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12212

Differential Revision: D10121872

Pulled By: goldsborough

fbshipit-source-id: f1bd1315f7333b9168e99983f3f6b679c9b0c52a
2018-09-30 15:39:51 -07:00
572132fb17 copy_(Sparse, Sparse) for sparse tensor (#9005)
Summary:
- fix #8330
- add `torch.copy_(Sparse, Sparse)` with autograd support
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9005

Differential Revision: D8987885

Pulled By: weiyangfb

fbshipit-source-id: b317a41da22ee1eae2835622a0ed28a6771a3a06
2018-09-30 11:55:09 -07:00
93ecf4d72a Remove raise_from (#12185)
Summary:
soumith

CC alsrgv

Fixes #11995
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12185

Differential Revision: D10120103

Pulled By: goldsborough

fbshipit-source-id: ef7807ad83f9efc05d169675b7ec72986a5d17c3
2018-09-29 22:41:55 -07:00
5ffc915f26 fix docs (#12126)
Summary:
- fix https://github.com/pytorch/pytorch/issues/12120
- add `torch.argsort`, `torch.pdist`, `broadcast_tensors` to *.rst files
- add parameter dim to `torch.unique` doc
- fix table and args for `torch.norm`
- test plan: make html and check docs in browser

gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12126

Differential Revision: D10087006

Pulled By: weiyangfb

fbshipit-source-id: 25f65c43d14e02140d0da988d8742c7ade3d8cc9
2018-09-29 22:26:45 -07:00
40aa212cd6 Support fp16 mkl engine in training
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12080

Reviewed By: hyuen

Differential Revision: D10037719

fbshipit-source-id: 618ce894eccc4c87a038dc3ab836684f16843cde
2018-09-29 21:55:11 -07:00
a2ebbccc9f fix unit tests on CI
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12187

Differential Revision: D10118483

Pulled By: bddppq

fbshipit-source-id: 986c8fb48d61e00103c713548a50e74489a0e442
2018-09-28 23:11:55 -07:00
878e7740fd Turns optimizations off when checking trace (#12172)
Summary:
Currently, when tracing, optimizations are performed twice. This means that optimizing passes, like the fusion pass, are also called twice. This is unnecessary, so this PR turns off optimizations when checking the trace (since the trace is independent of optimizations). This should improve performance and debugging.

apaszke who proposed this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12172

Reviewed By: ezyang

Differential Revision: D10109250

Pulled By: apaszke

fbshipit-source-id: 8b3385eae143446820f1b61ca7576d7c07f9b248
2018-09-28 19:40:10 -07:00
22ce6060ec Add caffe2_api to exported functions (#12184)
Summary:
Broke the build, sorry.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12184

Differential Revision: D10114818

Pulled By: bwasti

fbshipit-source-id: 49844183a48d9383c5055a9ce06fe61fbf353050
2018-09-28 18:12:00 -07:00
ebc2643498 Enable multiple external output (#10957)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10957

att

Differential Revision: D9525097

fbshipit-source-id: 0cea9469cea06cbfd3828549b168483413788269
2018-09-28 18:11:58 -07:00
0a5dfa5a52 Add support for device annotations on blobs
Summary: device annotations on blobs with Declare and Export trick

Reviewed By: yyetim

Differential Revision: D9999916

fbshipit-source-id: 0bd4d15e7beed2788f47255d52ea296f8f674295
2018-09-28 14:11:54 -07:00
08e5ca1262 Add filter<T>(NNModule) and explicit Declare/Export classes (#11955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11955

Adding a `filter<T>(NNModule)` function to easily get inputs/outputs of a DAI-style NNModule.

Reviewed By: duc0

Differential Revision: D9997696

fbshipit-source-id: 818c4f2e3093e0d02b35e6632b426e8d3189c21e
2018-09-28 14:11:53 -07:00
60061a20d9 Adding Declare and Export operators (#11954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11954

Adding an alternative to external_input and external_output for use in some distributed settings

Reviewed By: aazzolini

Differential Revision: D9997121

fbshipit-source-id: 1b5cc03fd3051368a3edc69e7bc472386f5746b5
2018-09-28 14:11:51 -07:00
7b2c0a09e4 Adds support for NaN, +inf, -inf float scalars to CPU and CUDA fusers (#12070)
Summary:
In current upstream float scalars are always written into kernels with:

`out << std::scientific << v << "f";`

When the floats are special values like NaN, +inf, or -inf, this produces nonsense that causes compilation to fail. This fix updates the conversion of float scalars to device-specific special values. The appropriate macros are added to the CPU and CUDA resource strings. Note that a NAN macro was not necessary on the CPU since math.h defines NAN.

To verify this fix I updated the test_clamp_fusion test in test_jit.py. I wanted to test -inf, too, but -inf is not currently accepted by the interpreter.

Edit:

Forgot to mention, this partially addresses issue #12067.
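A minimal sketch of the emission logic described above (the helper name is hypothetical, and `POS_INFINITY`/`NEG_INFINITY` are assumed to be the macros added to the resource strings):

```
#include <cmath>
#include <sstream>
#include <string>

// Encode a float scalar for inclusion in generated kernel source.
std::string encodeFloatScalar(float v) {
  if (std::isnan(v)) return "NAN";  // math.h already defines NAN on the CPU
  if (std::isinf(v)) return v > 0 ? "POS_INFINITY" : "NEG_INFINITY";
  std::ostringstream out;
  out << std::scientific << v << "f";  // the pre-existing path for ordinary values
  return out.str();
}
```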
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12070

Reviewed By: ezyang

Differential Revision: D10044704

Pulled By: soumith

fbshipit-source-id: 8f4a930862d66a7d37d985e3f6a6fb724579e74c
2018-09-28 14:11:49 -07:00
0e779c27e1 Deduplicate canonical_axis_index_ with maybe_wrap_dim (#11891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11891

maybe_wrap_dim is a slightly more general function, which is able
to, under some circumstances, treat 0 as a "valid" dimension even
when a tensor is a scalar.  canonical_axis_index_ never accepts
this behavior, so it always passes false.
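A simplified sketch of the wrapping semantics described above (error handling reduced to exceptions):

```
#include <cstdint>
#include <stdexcept>

int64_t maybe_wrap_dim(int64_t dim, int64_t ndim, bool wrap_scalar = true) {
  if (ndim <= 0) {
    if (!wrap_scalar)
      throw std::out_of_range("dimension specified for a scalar tensor");
    ndim = 1;  // treat a scalar as 1-d so that dims 0 and -1 are accepted
  }
  if (dim < -ndim || dim >= ndim)
    throw std::out_of_range("dimension out of range");
  return dim < 0 ? dim + ndim : dim;  // wrap negative dimensions
}
```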

Reviewed By: jerryzh168

Differential Revision: D9968320

fbshipit-source-id: 13c98fff0880d7bfcd00911a76c8aa10d37bd183
2018-09-28 14:11:48 -07:00
ab9a5976a0 Disable inlinining of EnforceFailMessage (#12078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12078

The constructor is inlined multiple times

Reviewed By: salexspb

Differential Revision: D9358084

fbshipit-source-id: c8d4177a3fcccac574ee4f63336a6fa8bfb07d11
2018-09-28 11:24:35 -07:00
8009b6cdb5 Kill self_ty in TYPE_DERIVED_DEFINITION_NATIVE (#11903)
Summary:
This allows us to call the type argument with name other than `self_ty`. ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11903

Differential Revision: D10105029

Pulled By: SsnL

fbshipit-source-id: 0fbdc728123ebc1154d080628cb41a085ba3e6d7
2018-09-28 11:09:50 -07:00
e7e10e60e0 Introduce builtin script functions (#12141)
Summary:
This functionality replaces the Scalar-Tensor builtin operators
with builtin functions.

Builtin functions are used in place of operators where one operator
can be defined as a composition of others. This simplifies later
optimization passes by allowing us to have fewer operators.

In the future, builtin functions can be used for other purposes.
For example, we can define derivative functions as code rather than
building graphs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12141

Reviewed By: ezyang

Differential Revision: D10088065

Pulled By: zdevito

fbshipit-source-id: a2acb06346e649c4c8a2fe423b420871161c21cf
2018-09-28 10:55:08 -07:00
65bf181ddf Add "ai.onnx.pytorch" onnx domain (#12157)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12157

Differential Revision: D10100799

Pulled By: bddppq

fbshipit-source-id: 76fdd126e0b52c54276752b3b0174735355a7d2f
2018-09-28 09:57:06 -07:00
0aff3cc559 Fix broadcasting bug in StudentT (#12148)
Summary:
This fixes a broadcasting error with the `StudentT` distribution

- [x] added a regression test
- [x] strengthened parameter broadcasting tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12148

Differential Revision: D10099226

Pulled By: soumith

fbshipit-source-id: 0c5eb14180d158f8fff28ceb9e7cd3471c2bb803
2018-09-28 09:57:02 -07:00
b0248df72a Docs: Change cuda(async) —> cuda(non_blocking) (#12158)
Summary:
goldsborough Modify the docs to match the changes made in #4999
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12158

Differential Revision: D10103964

Pulled By: SsnL

fbshipit-source-id: 1b8692da86aca1a52e8d2e6cea76a5ad1f71e058
2018-09-28 08:39:27 -07:00
5be0baefa2 Use streams in JIT serialization, allow JIT serialization to/from buffer (#11932)
Summary:
This PR replaces the use of `std::FILE` with `istream`/`ostream` for JIT serialization.
It uses this mechanism to add the possibility to serialize to/from binary buffers, in addition to files, both in `libtorch` and from Python.

`getExportImportCopy` in `test_jit.py` has been updated so that both file and buffer codepaths are exercised during tests.
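A minimal sketch of the new buffer codepath from Python (module and inputs are hypothetical):

```
import io
import torch

class M(torch.nn.Module):
    def forward(self, x):
        return x + 1

m = torch.jit.trace(M(), torch.zeros(2))
buf = io.BytesIO()
m.save(buf)                   # serialize into an in-memory buffer
buf.seek(0)
loaded = torch.jit.load(buf)  # deserialize from a file-like object
```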
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11932

Differential Revision: D10084303

Pulled By: apaszke

fbshipit-source-id: b850801b3932922fa1dbac6fdaed5063d58bc20d
2018-09-28 07:54:27 -07:00
d291cf7de6 Ensuring positive definite matrix before constructing (#12102)
Summary:
Ensuring positive definite matrix in Multivariate Normal Distribution
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12102

Reviewed By: ezyang, Balandat

Differential Revision: D10052091

Pulled By: jeffreyksmithjr

fbshipit-source-id: 276cfc6995f6a217a5ad9eac299445ff1b67a65f
2018-09-28 07:27:20 -07:00
04c0971679 Special case BatchGather and BatchGatherGradient for block_size=1. (#11349)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11349

Special case BatchGather and BatchGatherGradient for block_size=1. This makes BatchGather 3-4X faster and BatchGatherGradient 10X for this case.

Reviewed By: jspark1105, ilia-cher

Differential Revision: D7218043

fbshipit-source-id: ea12042239a8adc92b9efcbd0b66e354fb43f4c7
2018-09-27 21:11:38 -07:00
f5a0c337ba Move TensorImpl IsType, meta, dim32, dim, ExtractDeviceOption to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12100

Reviewed By: jerryzh168

Differential Revision: D10051424

fbshipit-source-id: 5986e92ea54e60ec6bfe992015a05e09288c948c
2018-09-27 20:40:03 -07:00
bbae57d06e Move TensorImpl size_from_dim, size_to_dim, size_between_dim, canonical_axis_index to caffe2::Tensor (#12099)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12099

- Generalize the free functions to accept IntList, not just std::vector<int64_t>

Reviewed By: jerryzh168

Differential Revision: D10051365

fbshipit-source-id: e3d571bf8fead22f6f25c3ca46f0c38c2bb065d2
2018-09-27 20:40:00 -07:00
3eb5940cf5 codemod cuda_gpu_id to device_id (#12022)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12022

codemod -d . --extensions h,cc,cpp,cu,py,proto,pbtxt,pb.txt,config cuda_gpu_id device_id

codemod with 'Yes to all'

Reviewed By: orionr

Differential Revision: D9986213

fbshipit-source-id: f5614a5d26078817aee8caf79a494abfd6a95ff1
2018-09-27 20:24:53 -07:00
149403f849 Move TensorImpl ndim, size, itemsize and nbytes to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12098

Reviewed By: jerryzh168

Differential Revision: D10051298

fbshipit-source-id: a833fad74bbda38c019ec2cb97d4bb6804e09963
2018-09-27 19:56:00 -07:00
7f35e92af2 mutable lists (#10700)
Summary:
This PR implements the design that we discussed. Changes:
- Added a World token IValue and type. The IValue is basically a dummy struct for now, in the future we may extend it (say, add thread-local state).
- Effectful ops explicitly declare they are mutable by having World tokens as inputs and outputs in their schema.
- Purely functional ops that use mutable values will get "fenced" and the world token will be threaded through the fences
- AnnotateEffects pass which wires up all the world tokens together.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10700

Reviewed By: eellison

Differential Revision: D9547881

Pulled By: michaelsuo

fbshipit-source-id: ebbd786c31f15bf45e2ddb0c188438ff2f5f3c88
2018-09-27 19:25:13 -07:00
a5818047c4 Rewrite serialization to correctly handle partial reads/writes in all cases (#12143)
Summary:
Previously, doRead/doWrite were functions that could return partial reads/writes,
and we checked for this case inconsistently in the call sites of serialization.cpp.
Now, these functions do NOT return the number of bytes read/written, and instead
handle the necessary checking loop themselves.
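A minimal sketch of that pattern (a hypothetical stdio-based reader; the real code also covers writes and other backends):

```
#include <cstdio>
#include <stdexcept>

// Read exactly nbytes, looping over partial reads internally instead of
// returning a byte count for every call site to check.
static void doRead(std::FILE* fp, void* buf, std::size_t nbytes) {
  char* dst = static_cast<char*>(buf);
  while (nbytes > 0) {
    std::size_t r = std::fread(dst, 1, nbytes, fp);  // may be a partial read
    if (r == 0) {
      throw std::runtime_error(std::feof(fp) ? "unexpected EOF" : "read error");
    }
    dst += r;
    nbytes -= r;
  }
}
```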

Fixes #12042. Maybe.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12143

Differential Revision: D10097027

Pulled By: ezyang

fbshipit-source-id: fd222ab8a825bed352153648ad396acfe124a3e1
2018-09-27 19:09:53 -07:00
a86a61b004 Implement caffe2::Tensor::raw_data() in terms of data()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12097

Reviewed By: jerryzh168

Differential Revision: D10051202

fbshipit-source-id: b4b61869363a606ab465d1500558226efae30d06
2018-09-27 18:40:37 -07:00
2021b26bcb Move TensorImpl::ShareExternalPointer helper overloads to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12096

Reviewed By: jerryzh168

Differential Revision: D10051126

fbshipit-source-id: a9b95d00512a0b4e6339d4f3f0bb180dd0c79247
2018-09-27 18:40:35 -07:00
976a9e0454 Move TensorImpl::DebugString() to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12095

Reviewed By: jerryzh168

Differential Revision: D10051078

fbshipit-source-id: f56b6fc5d1cb8ae4b636e88efe607fe65cc1d7a0
2018-09-27 18:40:33 -07:00
b0e48aa197 Move TensorImpl::Reshape(vector<int>) to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12094

Reviewed By: jerryzh168

Differential Revision: D10051079

fbshipit-source-id: 87fb91f31c33ce9b64c4654e79e0131ae391cd78
2018-09-27 18:40:30 -07:00
8c533c2c90 Fix bug where Reshape() trashes strides.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12092

Reviewed By: jerryzh168

Differential Revision: D10051005

fbshipit-source-id: c36d1c8d12fb41baf8d1a1a9f38776deeff242de
2018-09-27 18:40:28 -07:00
d02478e607 Move TensorImpl::ResizeLike to caffe2::Tensor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12091

Reviewed By: jerryzh168

Differential Revision: D10051012

fbshipit-source-id: 772ecd2e377f7d4e1ae510c1f647f6c8b71e5a57
2018-09-27 18:40:25 -07:00
dd73d57643 Move TensorImpl::ShrinkTo to caffe2::Tensor (#12090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12090

This is a slight pessimization because we need to do a
full recompute of is_contiguous(), even though a modification
of dim-0 is guaranteed to preserve contiguity.

Reviewed By: jerryzh168

Differential Revision: D10050905

fbshipit-source-id: b99233e21c9f4275b0db6e76740462e5430ce152
2018-09-27 18:40:23 -07:00
00c6fb16e7 Move ExtendTo to caffe2::Tensor from TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12089

Reviewed By: jerryzh168

Differential Revision: D10050859

fbshipit-source-id: 843067aacfa2a519657220bc39a0f499582a48a4
2018-09-27 18:40:21 -07:00
6a2dbc9808 Rename TensorImpl::GetDeviceType to device_type, and properly test if is_variable
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12087

Reviewed By: jerryzh168

Differential Revision: D10050781

fbshipit-source-id: 0b6c9d7caf3b1000691f86fcc7f2ef203936a29f
2018-09-27 18:40:19 -07:00
c5fc2f1105 Merge UndefinedTensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11972

Reviewed By: gchanan, Yangqing, jerryzh168

Differential Revision: D9995633

fbshipit-source-id: 6b4645c9d4bb0bc4301cd4bcfa76cf85331b8379
2018-09-27 18:40:16 -07:00
e8cb6cb9d2 Fix some symbolics for ReduceSum, GE, LE (#12123)
Summary:
ReduceSum's negative indices are converted to positive ones, since Caffe2 does not support them. The GE/LE symbolic operand order was wrong.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12123

Reviewed By: houseroad

Differential Revision: D10095467

Pulled By: wanchaol

fbshipit-source-id: eb20248de5531c25040ee68b89bd18743498138d
2018-09-27 17:40:46 -07:00
f6abd16a9d Merge TensorImpl. (#11971)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11971

- Switched TensorImpl::data<T>() to use Storage::unsafe_data<T>() to work
  around an outstanding bug in the Storage::data<T>() implementation
  where it only works on Ts which are valid ScalarType
- Qualify a bunch of identifiers which still live in caffe2:: namespace
- strides returns an IntList now
- s/update_strides/update_to_contiguous_strides/
- Correctly compute type_id_ for the Storage only constructor from Caffe2.
  This is special cased to only work for CPU and CUDA dense tensors.
- Fix some signed-unsigned comparisons in Caffe2 code (OSS build for
  ATen/core has more restrictive warning tests.)

Reviewed By: jerryzh168

Differential Revision: D9995559

fbshipit-source-id: 9c74032e011189e1c7e9a98d20f2bd1e25ad2e5c
2018-09-27 17:40:44 -07:00
1619264ca5 Make ATen-core and caffe2 mutually recursive / merge template data<T>() (#11970)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11970

Adds an ATen-core-headers target, which caffe2_cpu_internal depends
on, and makes ATen-core depend on caffe2_headers.  If you link against
ATen-core, you must ALSO link against caffe2_cpu_internal; if you
link against caffe2_cpu_internal, you must ALSO link against ATen-core,
otherwise you'll have undefined symbols.

Then, we merge template data<T>() method with Caffe2 implementation,
demonstrating that includes to Caffe2 (core) from ATen/core are working

Reviewed By: jerryzh168

Differential Revision: D9967509

fbshipit-source-id: 3d220c38b2c3c646f8ff2884fdcc889fa9276c7a
2018-09-27 17:40:42 -07:00
c35f85a6d4 Export symbols for pybind and other libs after caffe2 rebase (#11975)
Summary:
Export symbols for pybind and other libs after caffe2 rebase
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11975

Differential Revision: D10042615

Pulled By: yinghai

fbshipit-source-id: 6de562d99403099113093716834abc51bf726e94
2018-09-27 14:40:27 -07:00
80e3081c28 Add observers for mkldnn fallback operators (#9093)
Summary:
Add observers for ideep operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9093

Reviewed By: salexspb

Differential Revision: D9952949

Pulled By: yinghai

fbshipit-source-id: 1678d1a738f8781dc75eb3cb9dfb309f7b7934fb
2018-09-27 14:11:19 -07:00
6e7e63fda3 Implementation MomentumSGD/MomentumSGDUpdate operators for mkl-dnn (#11686)
Summary:
the speed-up of a single operation is up to 6X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11686

Reviewed By: yinghai

Differential Revision: D9828129

Pulled By: wesolwsk

fbshipit-source-id: 7dbacea90609e18438f6fe1229c641937d0696c8
2018-09-27 13:39:59 -07:00
13cf39294d Remove ATen/Error.h and use ATen/core/Error.h instead. (#12132)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12132

TSIA. No code change involved.

Reviewed By: bwasti

Differential Revision: D10083237

fbshipit-source-id: bdab029015b9d0f1fa1f866c68aa5945cc68db9d
2018-09-27 10:11:17 -07:00
a72603f8f8 Fix for ppc64le jit graph difference in sigmoid backward, see #10726 (#11579)
Summary:
As reported in Issue #10726, the JIT compiler, when running on ppc64le, may produce isomorphic output yet fail a diff test against the expected output file, because the expected output file is created from a test that was run on x86_64. This change ensures that if the ppc64le test output is different, the output is instead compared to an expected output file created when the test is run on a ppc64le system.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11579

Differential Revision: D10080890

Pulled By: soumith

fbshipit-source-id: 7249bf6b5dfa7c853368a3688a982bc9ed642bc9
2018-09-27 07:09:31 -07:00
9c49bb9ddf Move registry fully to c10 (#12077)
Summary:
This does 6 things:

- add c10/util/Registry.h as the unified registry util
  - cleaned up some APIs such as export condition
- fully remove aten/core/registry.h
- fully remove caffe2/core/registry.h
- remove a bogus aten/registry.h
- unifying all macros
- set up registry testing in c10

Also, an important note that we used to mark the templated Registry class as EXPORT - this should not happen, because one should almost never export a template class. This PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12077

Reviewed By: ezyang

Differential Revision: D10050771

Pulled By: Yangqing

fbshipit-source-id: 417b249b49fed6a67956e7c6b6d22374bcee24cf
2018-09-27 03:09:54 -07:00
383d340e88 Small optimization for adam (#12107)
Summary:
Apply weight decay for Adam in-place instead of via copy.

Synced offline with soumith, who mentioned that it should be OK. This is also consistent with other optimizers, e.g. eee01731a5/torch/optim/sgd.py (L93)
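A minimal sketch of the difference (simplified; stand-in tensors rather than the optimizer's actual state):

```
import torch

weight_decay = 1e-2
param = torch.randn(4)  # stand-in for p.data
grad = torch.randn(4)   # stand-in for p.grad.data

# before: out-of-place add allocated a fresh tensor every step
decayed = grad + weight_decay * param

# after (this PR, matching SGD): apply weight decay in place on the gradient buffer
grad.add_(weight_decay * param)
```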
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12107

Reviewed By: soumith

Differential Revision: D10071787

Pulled By: jma127

fbshipit-source-id: 5fd7939c79039693b225c44c4c80450923b8d673
2018-09-26 21:43:46 -07:00
5da8a8c785 Handle undefined tensor in blob correctly. (#12125)
Summary:
You can't GetDeviceType an undefined tensor, so test for this case
first.  This allows you to safely move tensors out of blobs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12125

Reviewed By: smessmer

Differential Revision: D10080075

Pulled By: ezyang

fbshipit-source-id: bb99b089b6daa9d4db99015208f939d7ce4d4a79
2018-09-26 21:43:41 -07:00
325101263a Aten: catch2gtest (#11846)
Summary:
Migrated all tests in ATen to use gtest, except basic.cpp.
Since gtest's features differ from Catch's, some tests have been rewritten with equivalent meaning.

The basic test has a version conflict with valgrind according to CI, so that test case still uses Catch.
This will be resolved in a separate PR.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11846

Differential Revision: D10080860

Pulled By: zrphercule

fbshipit-source-id: 439d4cf33fb6ccbe79b797860342853c63e59081
2018-09-26 20:57:45 -07:00
0f81039eaf Better high level C++ documentation (#12079)
Summary:
I wrote some high level docs for the larger PyTorch C++ universe and the C++ frontend specifically. Happy for reviews, but let's please also land this ASAP so I can point users at something that looks more ready baked than the C++ docs landing page (https://pytorch.org/cppdocs) does right now.

ezyang soumith

CC ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12079

Differential Revision: D10080785

Pulled By: goldsborough

fbshipit-source-id: 3028de41373f307468eb1e3802aa27871c93b2e3
2018-09-26 20:57:43 -07:00
db5f8d42bb Remove TIndex typedef from core/common.h (#12032)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12032

See title

Reviewed By: dinhviethoa

Differential Revision: D10023757

fbshipit-source-id: dbf0a043b2afab767f052bd4c5e8de13e0f57dcc
2018-09-26 17:02:54 -07:00
478803a75f Introduce type variables to implement generic list operators (#12040)
Summary:
We generate specialized list operations for int, float, and Tensor lists so that small lists of integers like the arguments to conv do not involve tons of boxing code.

This PR adds a fallback GenericList for List types that contain any other type. It does so by adding type variables to `jit::Type`, and machinery for matching/replacing the type variables during `tryMatchSchema` and operator lookup.

It also modifies the builtin list ops to include a fallback that works on a GenericList object that simply holds IValues. This is distinguished from IValue's tuple type so that conversion to/from Python still happens losslessly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12040

Differential Revision: D10037098

Pulled By: zdevito

fbshipit-source-id: 0c5f2864d12e7d33554bf34cc29e5fb700dde150
2018-09-26 17:02:51 -07:00
75b1ae1acd Update issue templates
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12114

Reviewed By: soumith

Differential Revision: D10060349

Pulled By: JoelMarcey

fbshipit-source-id: ed88bf95f78742b089adb043e88613a5db006a10
2018-09-26 16:26:00 -07:00
1b45f68397 Use atomicAdd from cuda_fp16 header when building with CUDA 10 (#12108)
Summary:
An efficient atomicAdd for halfs has been added in `cuda_fp16.h` in CUDA 10:
```__CUDA_FP16_DECL__ __half atomicAdd(__half *address, __half val);```

Through this change, PyTorch will be able to utilize efficient atomicAdd when building with CUDA 10.
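A minimal sketch of a kernel that picks up the new overload (the version guard and fallback comment are assumptions; native half atomics need sm_70+):

```
#include <cuda.h>
#include <cuda_fp16.h>

__global__ void accumulate_half(__half* out, const __half* in, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) {
#if CUDA_VERSION >= 10000
    atomicAdd(out, in[i]);  // native half atomicAdd from cuda_fp16.h
#else
    // pre-CUDA-10: emulate via atomicCAS on the containing 32-bit word
#endif
  }
}
```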
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12108

Differential Revision: D10053385

Pulled By: soumith

fbshipit-source-id: 946c90691a8f6bdcf6d6e367a507ac3c9970b750
2018-09-26 15:28:17 -07:00
6ff568df4d Add full namespace resolution in CAFFE_DURATION (#12065)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12065

Had compilation issues using CAFFE_DURATION in some contexts, specifically due to namespace resolution. Since this is a macro, it should fully qualify the names it uses.

Reviewed By: heslami

Differential Revision: D10036132

fbshipit-source-id: b8d55dfe5e991ca702ce5b7483f0ffc699882c85
2018-09-26 13:29:18 -07:00
d9c27f4d8d T33898723: Simple put operators for caffe2 stats (#12057)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12057

Add simple put operators for various types of stats

Reviewed By: mlappelbaum

Differential Revision: D9925268

fbshipit-source-id: cec02b0027d2d0ef3d35741be4b02c429d492810
2018-09-26 12:39:37 -07:00
c2f8f5076c add narrow() support for sparse tensors re: #8853 (#11342)
Summary:
Couple questions:

1) I used the log1p implementation in #8969 as a guide, especially for testing. I'm not sure what the `skipIfROCM` annotation is for, so I'm unsure whether I need it for my test.

2) I implemented the branching logic in the narrow function itself; is this the right place to do so? I noticed that there are a number of places where sparse-specific logic is handled with just an if statement in this file. Or should I implement a separate dispatch in native_functions.yml, as for log1p?

And of course, I'm happy to make any other updates/changes that I may have missed as well. This is my first PR to the project.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11342

Differential Revision: D9978430

Pulled By: weiyangfb

fbshipit-source-id: e73dc20302ab58925afb19e609e31f4a38c634ad
2018-09-26 12:24:54 -07:00
78fe149ab9 Fix ONNX bug, add symbolic for full
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12052

Differential Revision: D10044910

Pulled By: apaszke

fbshipit-source-id: 015ef372966d7594e1b450e348d457429f6ef20d
2018-09-26 11:45:25 -07:00
18f9c07b18 Enable tracing of tensor factories with an out argument
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/12051

Differential Revision: D10044890

Pulled By: apaszke

fbshipit-source-id: 2d794bf408875600bc71f354f0b4961d6b715094
2018-09-26 09:40:34 -07:00
b535aecd7c Fix warnings emitted when testing distributions (#12038)
Summary:
The earlier tests emitted around 80 warnings; now there are 6, and those are due to the JIT.

The changes remove the wrapping of a Tensor by a Tensor constructor, which emits warnings due to the changes in https://github.com/pytorch/pytorch/pull/11061.
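A minimal sketch of the substitution applied throughout the tests (per the warning introduced in #11061):

```
import torch

t = torch.randn(3)
bad = torch.tensor(t)       # wrapping a Tensor in a Tensor constructor warns
good = t.clone().detach()   # the recommended, warning-free replacement
```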
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12038

Differential Revision: D10033392

Pulled By: apaszke

fbshipit-source-id: b1faf368e650d062d7983f9932511bee4702a893
2018-09-26 09:24:54 -07:00
02d7c88fa4 Unify versions across setup.py, libtorch, and libcaffe2 (#12053)
Summary:
This unifies our versions across setup.py, libtorch, and libcaffe2. CMake has a default version (bumped to 1.0.0) that can be overridden by setup.py. The versions are also printed as a part of cmake/Summary.cmake to make sure they are correct.

cc Yangqing ezyang soumith goldsborough pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12053

Differential Revision: D10041878

Pulled By: orionr

fbshipit-source-id: a98a01771f6c008d1016ab63ab785c3a88c3ddb0
2018-09-26 08:55:06 -07:00
c8a0b11b7f add autodiff expressions for common operations (#11832)
Summary:
This PR does a few things:

Previously test_jit.py only tested autograd on backward graphs.
This is because we borrow from test_autograd and construct graphs with a small
number of nodes. Because the number of nodes is small (typically 1-2), those graphs
do not end up containing autodiff subgraphs, so autodiff never gets tested.

This PR enables autodiff testing by doing the following:
- added disableDebugAutodiffSubgraphInlining fn to graph_executor to disable
  autodiff subgraph inlining.
- (implementation) added autodiffSubgraphNodeThreshold and autodiffSubgraphInlineThreshold.
  These are set to their default values (2, 5) but disableDebugAutodiffSubgraphInlining()
  sets both to 1, disabling subgraph inlining and allowing 1-node autodiff subgraphs.
- The relevant backward jit tests disable autodiff subgraph inlining so they
  will test the autodiff versions of the operators instead of autograd whenever
  an autodiff variant exists.
- We don't run the tests that do inline autodiff subgraphs anymore.
  This has no impact on testing correctness because the assumption is
  that autograd functions are correct and are tested in test_autograd.py

This allows the graph fuser to work better because a lot of these ops were previously not autodiff-compatible but fusible. On a more concrete example, lstm backward contains a lot of tensor-scalar operations; these autodiff formulas help its double backward pass.

Included:
- arithmetic overloads
- abs, acos, asin, atan, ceil, cos, cosh, exp, expm1, floor, fmod, frac, log, log10, log1p, log2 reciprocal, remainder, round, sin, sinh, tan, trunc, rsqrt

TestJitGenerated tests autodiff for all of the added operations.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11832

Differential Revision: D10031256

Pulled By: zou3519

fbshipit-source-id: 9daf9900a5ad187743609cd0fbbd10b15411ad93
2018-09-26 08:10:04 -07:00
21ed7e51b6 Blob doesn't allow access to destroyCall anymore (#11548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11548

This removes getting/setting the DestroyCall of a Blob,
paving the way to removing DestroyCall from Blob entirely and using the destructor stored in TypeMeta instead.

Use sites have been fixed in diffs stacked below this.

Reviewed By: dzhulgakov

Differential Revision: D9775191

fbshipit-source-id: 97d72d0c62843849057f295c27f391e63c99c521
2018-09-26 01:45:28 -07:00
65cbb8226b IValue can store Blob (#11414)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11414

caffe2::Blob can be stored in an IValue. This is a precondition for caffe2 to switch from Blob to IValue.

Reviewed By: ezyang

Differential Revision: D9731326

fbshipit-source-id: 462a39d2d9ab6f85b99b1670848c6976a3de417c
2018-09-26 01:12:31 -07:00
b7ebc00979 Move Blob to ATen/core (#11924)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11924

Previous diffs removed Blob -> caffe2 dependencies, now we can move it to ATen/core.
This is pre-work for allowing storing Blob in IValue.

Reviewed By: ezyang

Differential Revision: D9980641

fbshipit-source-id: 32082a673ec94c42c20b2298adced8bb7ca94d07
2018-09-25 23:27:52 -07:00
8ff435c8f6 Use tempfile during serialized test comparison (#12021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12021

TestPilot runs stress tests in parallel. These fail for serialized tests because the extraction (and subsequent deletion) of binary data during the process isn't threadsafe. Extracting zips into a tempfile avoids this problem.

Also remove some accidentally checked in zips of a test that we didn't end up including for now.

Reviewed By: houseroad

Differential Revision: D10013682

fbshipit-source-id: 6e13b850b38dee4106d3c10a9372747d17b67c5a
2018-09-25 20:55:45 -07:00
807de9a1e3 fix segfault when grad to a hook fn is None (#12028)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/11751 by checking if a grad is a Python None object before getting cdata from it
- behaviors:

pre-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
...:    def hook(grad):
...:        print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print('a_list[0]', a_list[0].grad, a.grad)
('a_list[0]', None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward() # segfault
```

post-fix
```
>>> a = torch.randn(5, requires_grad=True)
>>> a_list = a.unbind()

>>> a0 = a_list[0]
>>> @a0.register_hook
...:    def hook(grad):
...:        print(grad)

>>> a_list[0].backward()
tensor(1.)

>>> print(a_list[0].grad, a.grad)
(None, tensor([1., 0., 0., 0., 0.]))

>>> a_list[1].backward()
None

>>> print(a_list[1].grad, a.grad)
(None, tensor([1., 1., 0., 0., 0.]))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12028

Differential Revision: D10034094

Pulled By: weiyangfb

fbshipit-source-id: 3f2135325fa7d338b920f57752057e4f6a6c0b1d
2018-09-25 19:10:25 -07:00
db2f7de5c3 Fallback CreateMutex/AtomicIter operators for mkl-dnn
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11685

Reviewed By: pjh5

Differential Revision: D9928058

Pulled By: wesolwsk

fbshipit-source-id: 734e19c35a684481d9a4d4f0c596e4dceae51ad4
2018-09-25 17:41:08 -07:00
28dba2f928 Unify all *_EXPORT and *_IMPORT macros across c++ backend (#12019)
Summary:
TSIA. Right now we should basically use C10_EXPORT and C10_IMPORT for explicitly marking dllexport and dllimport, as a continued effort of the C10 unification.

This is a codemod by mechanically doing the following change:

CAFFE2_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
AT_CORE_{EXPORT,IMPORT} -> C10_{EXPORT,IMPORT}
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12019

Reviewed By: ezyang, teng-li

Differential Revision: D10016276

Pulled By: Yangqing

fbshipit-source-id: a420d62c43d1110105fc88f9e9076e28a3203164
2018-09-25 17:41:05 -07:00
90bcf41291 Add safety asserts for methods on TensorImpl which don't work on Variable. (#12058)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12058

Methods on TensorImpl have to be written very carefully, because
when you have a VariableImpl subclass of TensorImpl, usually the
local fields on the TensorImpl are not valid; instead, you have to
forward to the "wrapped" tensor.  Functions which are virtualized
are probably handled correctly by Variable, but functions which
are NOT cannot be handled correctly and shouldn't be called if you
have a Variable.  This diff add checks to determine if this is
the case or not.

Reviewed By: jerryzh168

Differential Revision: D10034589

fbshipit-source-id: 650b2036ca9a044c0ab4abdf6f825521a64e1fc2
2018-09-25 17:25:47 -07:00
658386a63f Make USE_IDEEP work again (#12026)
Summary:
This PR establishes a baseline so that we can build IDEEP ops in the new workflow. From this baseline, we need to:
- Merge the MKLDNN CMake files from Caffe2 and PyTorch
- Get rid of `USE_MKL=ON`.

Build command from now on:
```
EXTRA_CAFFE2_CMAKE_FLAGS="-DUSE_MKL=ON -DINTEL_COMPILER_DIR=/opt/IntelComposerXE/2017.0.098"  python setup.py build_deps
```

gujinghui
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12026

Differential Revision: D10041199

Pulled By: yinghai

fbshipit-source-id: b7310bd84a494ac899d8e25da368b63feed4eeaf
2018-09-25 16:56:29 -07:00
b7b9e3c7e8 Fix "identifier following the 'template' keyword does not refer to a template" (#12037)
Summary:
LLVM trunk emits an error diagnostic when attempting to compile caffe2. The
identifiers following the `template` keywords are not templates, so the use of
the keyword does not make sense in this context.
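A minimal standalone illustration of the diagnostic (names hypothetical):

```
struct Widget {
  int get() const { return 0; }  // an ordinary member function, not a template
};

template <typename T>
int call(const T& t) {
  // return t.template get();  // error: identifier following the 'template'
  //                           // keyword does not refer to a template
  return t.get();              // fix: drop the spurious 'template' keyword
}

int main() { return call(Widget{}); }
```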
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12037

Reviewed By: ezyang

Differential Revision: D10024531

Pulled By: modocache

fbshipit-source-id: da4b9ba405d9f7fd633ab8c1a61c77da9c1a1f89
2018-09-25 16:40:42 -07:00
1e28294487 Delete some unused variables. (#12059)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12059

Differential Revision: D10034632

Pulled By: ezyang

fbshipit-source-id: ff33da0d93734856b8e8bcfe744cefe127fffb91
2018-09-25 14:25:21 -07:00
e53e8df20b Support TypeIdentifier::name() (#12036)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12036

Sometimes you have a TypeIdentifier, and no way to get to
the TypeMeta.  Still nice to be able to read out the name.

This should be obsoleted by smessmer's patches.

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024554

fbshipit-source-id: 42cdceefd5c59be0441254665f66f5edc829f422
2018-09-25 14:25:19 -07:00
aa1adde80b Refactor fastGet/fastSet for clarity, removing a null pointer check. (#11902)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11902

Previously, they were going through THTensor_getStoragePtr which
incurred a null pointer check on storage.  Now they use unsafe_data
method which doesn't do this check.

I don't know if this actually makes things go faster, but I get
an added bonus of reducing code duplication, so we should take
this change anyway :)

Reviewed By: SsnL

Differential Revision: D9977654

fbshipit-source-id: f45c74828213a0439480755ad0b2d7f8858cb327
2018-09-25 13:55:53 -07:00
ceadde2a7f Add some more locations to search for nccl. (#12063)
Summary:
Users generally expect ./configure to find libraries
installed in /usr/local and /usr, so search for nccl
there too.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12063

Differential Revision: D10036248

Pulled By: ezyang

fbshipit-source-id: d331ddd2ccc8ac9846fb54222db284b1ec371659
2018-09-25 13:27:54 -07:00
b263078bc3 Fix CUDA division by a scalar on large arrays. (#12023)
Summary:
The gpu_unary_kernel function was not handling arrays that
cannot use 32-bit indexing. This functions was only called directly
by CUDA division by a scalar. Other arithmetic operations go through
gpu_binary_kernel, which already properly handled large arrays.

This bug sometimes manifested as a crash and sometimes as an incorrect
answer.

Fixes #11788
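A rough sketch of the failure mode (the exact element count that overflows 32-bit indexing is an assumption, and this allocation needs a large GPU):

```
import torch

# More elements than a 32-bit index can address (~8 GiB of float32).
x = torch.zeros(2**31 + 1, device='cuda')
y = x / 2.0  # scalar division went through gpu_unary_kernel and mis-indexed
```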
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12023

Differential Revision: D10034017

Pulled By: colesbury

fbshipit-source-id: b17300f327de54035746bf02f576766007c9b144
2018-09-25 13:10:25 -07:00
a106388187 Free MAGMA queues after use (#11882)
Summary:
This PR is a minor change: it just adds a `magma_queue_destroy` call to the implementation of `Gesv`.

Also, I have replaced calls for obtaining handles with those already written in ATen.
```
THCState_getCurrentSparseHandle(at::globalContext().getTHCState()) --> getCurrentCUDASparseHandle()
THCState_getCurrentBlasHandle(at::globalContext().getTHCState()) --> getCurrentCUDABlasHandle()
```

Differential Revision: D10032204

Pulled By: soumith

fbshipit-source-id: ccd11989ecdc357313f0b661a2468f75d3aecb0e
2018-09-25 12:56:57 -07:00
8f0db9bbbb Removing some dependency edges from Blob to other caffe2 (#12043)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12043

Re-trying D9979976, this time with all call sites fixed.

D9979976 got reverted because there was a call site that wasn't covered by sandcastle it seems.
I fixed it and used 'grep' to ensure there aren't any more call sites in fbsource.

Reviewed By: ezyang

Differential Revision: D10026392

fbshipit-source-id: cd341514a8e53a40147ea0ee3e52f63bb6444157
2018-09-25 11:40:24 -07:00
94c513cc7f Improve pybind11 message (#11640)
Summary:
Improving the message based on https://github.com/pytorch/pytorch/issues/11570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11640

Differential Revision: D10033383

Pulled By: orionr

fbshipit-source-id: 0cdcdbe0582d896283a12970aebe771efa390dd2
2018-09-25 11:26:05 -07:00
364ae10bb8 nomnigraph - easy - add some python test helper methods (#12020)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12020

- make it less verbose to create random blobs in python unit test by adding some test helper methods
- move str_compare test helper method to test_util.py

Reviewed By: ZolotukhinM

Differential Revision: D10003637

fbshipit-source-id: cb79d2ad508341f750a1bb8f564e87d055c65652
2018-09-25 10:55:19 -07:00
7122f8b3bb Disable more flaky tests on CircleCI (#11399)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11362.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11399

Differential Revision: D9736673

Pulled By: yf225

fbshipit-source-id: cad8c0e86a70a01b047e648975ca5b9926e4acb3
2018-09-25 10:25:30 -07:00
d7e11e3aae Revert "Move CreateContext to global registry (#11688)" (#12049)
Summary:
This reverts commit 3ae6ee4ebded136da30aa53fd3873d84acfbc9f0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12049

Differential Revision: D10030954

Pulled By: ezyang

fbshipit-source-id: 6ca9de65b707c5b4c68280fc6f1b8e5ad7251efc
2018-09-25 10:13:43 -07:00
3deb4791c3 Replace 'struct Tensor' with 'class Tensor'. (#12034)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12034

We need ATen and Caffe2 to line up, and the rule is
that if you have any private/protected members, you
should declare it as a class.  Class we go.

(There are some other obvious candidates for this treatment,
but I've kept this patch just to Tensor)

Reviewed By: gchanan, mingzhe09088

Differential Revision: D10024467

fbshipit-source-id: 17cfe2741ba9c3f56cb87d6f5d1afd3c61a8e4fe
2018-09-25 09:54:35 -07:00
fcb3ccf23f Don't record Git version automatically via cmake (#12046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12046

This /sounds/ like a good idea in theory, but a feature
like this must be implemented very carefully, because if
you just plop the Git version in a header (that is included
by every file in your project, as macros.h is), then every
time you do a 'git pull', you will do a FULL rebuild, because
macros.h is going to regenerate to a new version and of course
you have to rebuild a source file if a header file changes.

I don't have time to implement it correctly, so I'm axing
the feature instead. If you want git versions in, e.g.,
nightly builds, please explicitly specify that when you feed
in the version.

Reviewed By: pjh5

Differential Revision: D10030556

fbshipit-source-id: 499d001c7b8ccd4ef15ce10dd6591c300c7df27d
2018-09-25 09:40:19 -07:00
0947712e5d Move Factory functions from Type to TypeExtendedInterface. (#12025)
Summary:
This makes a few changes wrt Type, with the ultimate goal of removing Type from the public Methods/Functions.  In particular:
1) Removes factory functions from Type, into TypeExtendedInterface.
2) sparse_coo_tensor is now a first class at:: namespace function, with TensorOptions overloads.
3) We move from Type-based sparse_coo_tensor dispatch to function-based.

Note we still require a number of changes to get rid of Type in the public interface; in particular, TensorOptions needs to support CUDA vs non-CUDA dispatch.  That is coming in a future patch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12025

Reviewed By: ezyang

Differential Revision: D10017205

Pulled By: gchanan

fbshipit-source-id: 00807a37b09ed33f0656aaa165bb925abb026320
2018-09-25 09:40:17 -07:00
d4ce41c4de Rename tensor_impl_ to impl_ in Tensor (#12035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12035

This brings it in line with Caffe2's naming

Reviewed By: mingzhe09088

Differential Revision: D10024485

fbshipit-source-id: a6feef82a56b5eb3043b0821ea802ba746e542a0
2018-09-25 09:11:39 -07:00
71b99f28be Give default values to members of TensorImpl. (#12033)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12033

These are sensible default values.  One key
pick is -1 for numel: this is because in Caffe2, a tensor
may be in an "un-allocated" state with no storage; this is
historically represented in Caffe2 with numel_ == -1
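A minimal sketch of the convention (field names abbreviated from TensorImpl):

```
#include <cstdint>

struct TensorImplSketch {
  // -1 encodes Caffe2's historical "un-allocated, no storage yet" state.
  int64_t numel_ = -1;
  bool is_contiguous_ = true;
  int64_t storage_offset_ = 0;
};
```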

Reviewed By: mingzhe09088

Differential Revision: D10024439

fbshipit-source-id: a167d727a7665daac7e7a1e98c0c89d8f1da6fa6
2018-09-25 09:11:37 -07:00
2cdf98a74d Back out "Removing some dependency edges from Blob to other caffe2"
Summary: The controller you requested could not be found. Original commit changeset: 2ea17724e223

Differential Revision:
D10026321
Ninja: stable broken

fbshipit-source-id: faf87cb7cc0f78c2c10d4aa6fceea279cd27acd6
2018-09-25 01:11:14 -07:00
3417a1e7e4 Prepend a "const" to a for loop in printPyObject. (#11857)
Summary:
As pytuple should be a constant type (since obj is constant), potential errors would occur without
this const qualifier, e.g., when compiling against PyPy. Although PyPy is not supported yet, it
is still useful to remove this compilation issue (one of very few such issues) to allow hackers
to play with it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11857

Differential Revision: D10024149

Pulled By: soumith

fbshipit-source-id: aa7e08e58f6369233a11477113351dccd3854ba8
2018-09-24 23:12:57 -07:00
17a65bf9b6 Removing some dependency edges from Blob to other caffe2 (#11923)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11923

This is pre-work to allow moving Blob to ATen/core, which cannot depend on caffe2 anymore.
(1) Removing the Blob -> Tensor dependency allows us to move Blob to ATen/core and use it inside IValue without having to wait for the Tensor merge to be complete.
(2) In the final Blob design, we want it to be a very small class that doesn't have any special treatment for Tensor (or to be more correct, doesn't allow storing Tensor anymore), so this is anyhow the direction we want to go.

This changes call sites that will have to be moved to IValue later, but they cannot be moved to IValue directly, because for that, IValue first needs to be able to store Blob, which in turn first needs this diff and some other changes coming up in future diffs.

Codemods:
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.IsTensorType\\(" "BlobIsTensorType(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->IsTensorType\\(" "BlobIsTensorType(*\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)\\.GetMutableTensor\\(" "BlobGetMutableTensor(\\1, "
$ codemod --extensions h,hpp,c,cpp,cc "([a-zA-Z0-9_]+)->GetMutableTensor\\(" "BlobGetMutableTensor(*\\1, "

It is, however, not only these codemods because regex based refactoring was only able to match a small amount of the call sites. To catch more, I wouldn've needed a AST aware tool like clangr, which I didn't figure out how to use.

Reviewed By: ezyang

Differential Revision: D9979976

fbshipit-source-id: 2ea17724e223b5b73b44f99362727759ca689e61
2018-09-24 22:57:05 -07:00
dfa03e94eb Fix mispelling of AVAILABLE. (#12016)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12016

Reviewed By: pietern

Differential Revision: D10010808

Pulled By: ezyang

fbshipit-source-id: ff6394ae9a53f7fdad2cadb4e019e09ac63bba96
2018-09-24 20:46:41 -07:00
86e025fca2 magma-cuda should reference updated versions (#12000)
Summary:
Source build doc section **LAPACK GPU** only lists magma-cuda80.

The magma-cuda version should reflect the installed version of CUDA.

- Verified on Ubuntu with magma-cuda92 (build and test)
- Verified magma-cuda91 is available
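A sketch of what the updated doc section might list (package names as published on the pytorch conda channel):

```
# match the magma package to the installed CUDA version, e.g.:
conda install -c pytorch magma-cuda80   # CUDA 8.0
conda install -c pytorch magma-cuda91   # CUDA 9.1
conda install -c pytorch magma-cuda92   # CUDA 9.2
```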
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12000

Differential Revision: D10024158

Pulled By: soumith

fbshipit-source-id: a34c85a5e87b52657f1e6f7b21d235306ab7b2aa
2018-09-24 20:26:26 -07:00
5d4624a1d9 Fix return temporary as reference in MPI backend (#11947)
Summary:
The MPI async work class returned a temporary as reference, which is
invalid (hat tip to colesbury for noticing it). This change fixes that and
uses a std::exception_ptr to hold on to the exception if applicable, and
then returns the reference by throwing it and returning it, like the
existing code path.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11947

Differential Revision: D10019928

Pulled By: pietern

fbshipit-source-id: 5a8ed0e894615a09224ca5e48c8b3104275a3019
2018-09-24 20:17:38 -07:00
9068a46dba Fix deprecated function warning in ONNX model test. (#11827)
Summary:
When running /test/onnx/test_models.py, we see deprecation warnings in the test points for `super_resolution` and `squeezenet` models. This change updates those models to use the recommended methods, instead of the deprecated ones.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11827

Reviewed By: houseroad

Differential Revision: D10023998

Pulled By: ezyang

fbshipit-source-id: ee4e14304678c532ebd574e7bd143e3b311995ab
2018-09-24 19:59:02 -07:00
a830964007 Eliminate no-op adds and muls in peephole pass (#11801)
Summary:
Because we emit a lot of them in our symbolic AD. This brings down the backward time of an LSTM I'm testing from 14.2ms to 12.5ms (a 15% improvement).
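For illustration only (my own sketch, not from the PR), the kind of no-op arithmetic the peephole pass can now fold away:

```python
import torch

@torch.jit.script
def f(x):
    # symbolic AD tends to emit trivial arithmetic like this;
    # the pass rewrites x * 1 -> x and y + 0 -> y
    return x * 1 + 0
```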
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11801

Differential Revision: D9916815

Pulled By: apaszke

fbshipit-source-id: 2d9cb886c424ccd43b9f996aad89950d3bddf494
2018-09-24 17:48:48 -07:00
3ae6ee4ebd Move CreateContext to global registry (#11688)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11688

As a first step to remove the static context (and merge it with the allocator), we'll create a
global registry for context constructors, and remove the CreateContext function from tensor.

Reviewed By: ezyang, dzhulgakov

Differential Revision: D9779821

fbshipit-source-id: 8b239ea50af7a0556fde2382f58f79194f0e3dc1
2018-09-24 17:07:50 -07:00
b7c302da1a Make gen_jit_dispatch runnable (#12018)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12018

Tried to use the file and ran into a small bug; this fixes it.

Differential Revision: D10013231

fbshipit-source-id: 4cf8c29cf9e2cedd7a28fa0cc0196e5144a54bf2
2018-09-24 16:09:48 -07:00
70e4b3ef59 Revert D10006069: Remove TIndex typedef from core/common.h
Differential Revision:
D10006069

Original commit changeset: 5e2aac993968

fbshipit-source-id: fbd8d3860635211e641ca14eaff7a64882e0d6bd
2018-09-24 15:30:25 -07:00
e05d689c49 Unify C++ API with C++ extensions (#11510)
Summary:
Currently the C++ API and C++ extensions are effectively two different, entirely orthogonal code paths. This PR unifies the C++ API with the C++ extension API by adding an element of Python binding support to the C++ API. This means the `torch/torch.h` included by C++ extensions, which currently routes to `torch/csrc/torch.h`, can now be rerouted to `torch/csrc/api/include/torch/torch.h` -- i.e. the main C++ API header. This header then includes Python binding support conditioned on a define (`TORCH_WITH_PYTHON_BINDINGS`), *which is only passed when building a C++ extension*.

Currently stacked on top of https://github.com/pytorch/pytorch/pull/11498

Why is this useful?

1. One less codepath. In particular, there has been trouble again and again due to the two `torch/torch.h` header files and ambiguity when both ended up in the include path. This is now fixed.
2. I have found that it is quite common to want to bind a C++ API module back into Python. This could be for simple experimentation, or to have your training loop in Python but your models in C++. This PR makes this easier by adding pybind11 support to the C++ API.
3. The C++ extension API simply becomes richer by gaining access to the C++ API headers.

soumith ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11510

Reviewed By: ezyang

Differential Revision: D9998835

Pulled By: goldsborough

fbshipit-source-id: 7a94b44a9d7e0377b7f1cfc99ba2060874d51535
2018-09-24 14:44:21 -07:00
1c09bfde1b Make promoteType(half, integer) -> half (#11941)
Summary:
Changes the result type of half type and any integer type to return half
type (instead of float or double).

This is based on top of #11808. The first new commit is "Make promoteType(half, integer) -> half". I'll rebase on top of master once that PR lands.
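A minimal sketch of the new rule (my own example, assuming a CUDA device since CPU half math was limited at the time):

```python
import torch

x = torch.ones(3, dtype=torch.half, device='cuda')
y = x + 1          # an integer scalar no longer upcasts the half tensor
print(y.dtype)     # torch.float16
```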
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11941

Differential Revision: D10014122

Pulled By: colesbury

fbshipit-source-id: 16a5eb3406a5712069201d872d8736d0599e9411
2018-09-24 13:55:42 -07:00
51414822f5 Stop moving constants into DifferentiableSubgraphs (#11809)
Summary:
Or even taking them as inputs. This prevents optimizations to happen
either inside the differentiable subgraphs, or in the surrounding graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11809

Differential Revision: D10009680

Pulled By: apaszke

fbshipit-source-id: face638566228e470a6deec48dc2aa3a1cce26d4
2018-09-24 13:24:53 -07:00
ffbac7d0bb Miscellaneous updates for CUDA 10 (#12017)
Summary:
This PR has some updates related to CUDA 10.

- c2195e9864 ensures that the repo successfully builts on CUDA 10. Addresses https://github.com/pytorch/pytorch/issues/11888
- 423d8d3524 follows up on the cufft max plan number bug: https://github.com/pytorch/pytorch/issues/11089, which has been fixed in CUDA 10.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12017

Differential Revision: D10013405

Pulled By: soumith

fbshipit-source-id: 5bc6d7f71d5133f7821b407b1ac6c51bef0f6fa8
2018-09-24 11:58:32 -07:00
a6f1ae7f20 set up c10 scaffolding. Move macros proper first.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11939

Reviewed By: orionr, dzhulgakov

Differential Revision: D10004629

Pulled By: Yangqing

fbshipit-source-id: ba50a96820d35c7922d81c78c4cbe849c85c251c
2018-09-24 11:09:59 -07:00
1a1d79e761 Remove TIndex typedef from core/common.h (#11993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11993

See title

Reviewed By: ezyang

Differential Revision: D10006069

fbshipit-source-id: 5e2aac993968307c850e431c00052cb1a339ced2
2018-09-24 10:55:55 -07:00
a9e6a673ae Remove caffe2::Tensor::capacity_nbytes, at::Tensor::to##name##Data, (#11876)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11876

Modern C++ API instead of macros; item() is aligned with the Python frontend. caffe2::Tensor::capacity_nbytes is effectively unused and confusing w.r.t. caffe2::Tensor::nbytes().

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d caffe2           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCByte   "item<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCLong   "item<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCInt    "item<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCDouble "item<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toByteData   "data<uint8_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toLongData   "data<int64_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toIntData    "data<int32_t>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toDoubleData "data<double>"
codemod -d hphp           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toFloatData  "data<float>"

codemod -d caffe2 --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCComplexDouble "item<std::complex<double>>"

codemod -d tc           --extensions cc,cpp,cu,cuh,h,py,hpp,mm toCFloat  "item<float>"

Reviewed By: ezyang

Differential Revision: D9948572

fbshipit-source-id: 70c9f5390d92b82c85fdd5f8a5aebca338ab413c
2018-09-24 10:40:10 -07:00
1178851280 Get rid of most usages of Type.tensor. (#12002)
Summary:
1) Most usages are replaced by at::empty.
2) native_tensor has its namespace function removed
3) Type.tensor(sizes, strides) becomes at::empty_strided(sizes, strides).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/12002

Differential Revision: D10007201

Pulled By: gchanan

fbshipit-source-id: 5e5647c050ed2ecb87a33e0b5ce4928fa3186c34
2018-09-24 10:16:18 -07:00
76ab26cc3e Remove unused THNN functions due to removal of torch/legacy (#11946)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11946

Differential Revision: D9994625

Pulled By: cpuhrsch

fbshipit-source-id: fca3d48ecbdab06ce53249db2402fc4613da4d21
2018-09-22 21:54:55 -07:00
a6630e25af Remove many caffe2::TIndex and replace them with int64_t (#11943)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11943

See title

Reviewed By: ezyang

Differential Revision: D9992645

fbshipit-source-id: e8f80d6ea762971513e5e8072975ceea53e1f11a
2018-09-22 18:11:04 -07:00
5d0f1c3c8f Add #include to satisfy Android NDK unified headers
Summary:
Old per-API+arch headers reside in
  /opt/android_ndk/r*/platforms/android-*/arch-*/usr/include/
New Unified headers reside in
  /opt/android_ndk/r*/sysroot/usr/include/

Unified headers are not exactly drop-in replacements for the old ones. Old headers had some nested includes that are absent in the unified versions, so we need to explicitly include them.

Reviewed By: mzlee

Differential Revision: D9952200

fbshipit-source-id: 6515e1d1ab576069db499c3fb23a69d507279c8c
2018-09-22 15:39:56 -07:00
7517e53468 Update onnx submodule to onnx/onnx@c4734c6 (#11958)
Summary:
c4734c6200
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11958

Differential Revision: D10002779

Pulled By: bddppq

fbshipit-source-id: 8bd7dfc8fdaf0b699a61f5b228f7102a16b92258
2018-09-22 01:40:31 -07:00
f15474ade8 Export caffe2::Caffe2Annotation symbols (#11965)
Summary:
Some of these symbols are used by device_test.cc.

d0db23e95a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11965

Reviewed By: bwasti

Differential Revision: D10002439

Pulled By: bddppq

fbshipit-source-id: 4ae95b9c888b3c7685d0ffdbcbfa3441bcf90091
2018-09-21 22:43:48 -07:00
1c282ab99a Move GetExceptionString to Error.h (#11501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11501

This doesn't really belong to TypeMeta, moving it to the error handling header

Reviewed By: ezyang

Differential Revision: D9763424

fbshipit-source-id: 127a8246171ab3a4475f2767d2dc1cc13c486a2e
2018-09-21 21:54:33 -07:00
825181ea9d Rewrite C++ API tests in gtest (#11953)
Summary:
This PR is a large codemod to rewrite all C++ API tests with GoogleTest (gtest) instead of Catch.

You can largely trust me to have correctly code-modded the tests, so it's not required to review every one of the 2000+ changed lines. However, additional things I changed were:

1. Moved the cmake parts for these tests into their own `CMakeLists.txt` under `test/cpp/api` and calling `add_subdirectory` from `torch/CMakeLists.txt`
2. Fixed DataParallel tests, which weren't being compiled because `USE_CUDA` wasn't being set correctly.
3. Updated README

ezyang ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11953

Differential Revision: D9998883

Pulled By: goldsborough

fbshipit-source-id: affe3f320b0ca63e7e0019926a59076bb943db80
2018-09-21 21:28:16 -07:00
d0db23e95a Add distributed annotations
Summary: Annotations for DAI

Reviewed By: duc0

Differential Revision: D9805867

fbshipit-source-id: 9ce2d9f3984817510ec8362a281f39878aad55e7
2018-09-21 19:09:59 -07:00
de11fe0c83 migrate PReLU to ATen (#11758)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10723
- migrate PReLU to ATen and deprecate legacy PReLU
- performance:

CPU with weight.numel() = 1
```
>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
100 loops, best of 100: 9.43 ms per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
10 loops, best of 100: 24.4 ms per loop

>>> m = nn.PReLU()
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 695 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.47 ms per loop
```

CPU with weight.numel() = channels
```
>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 603 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 13.3 ms per loop

>>> m = nn.PReLU(100)
>>> x = torch.randn(100, 100, 100, requires_grad=True)
>>> %timeit -r 100 y = m(x)
1000 loops, best of 100: 655 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 y.backward(retain_graph=True)
100 loops, best of 100: 2.45 ms per loop
```

CUDA with weight.numel() = 1
```
>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 187 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.01 ms per loop

>>> m = nn.PReLU().cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 195 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.28 ms per loop
```

CUDA with weight.numel() = channel
```
>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
1000 loops, best of 100: 174 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.27 ms per loop

>>> m = nn.PReLU(100).cuda()
>>> x = torch.randn(100, 100, 100, requires_grad=True).cuda()
>>> %timeit -r 100 torch.cuda.synchronize(); y = m(x); torch.cuda.synchronize();
10000 loops, best of 100: 181 µs per loop

>>> y = m(x).sum()
>>> %timeit -r 100 torch.cuda.synchronize(); y.backward(retain_graph=True); torch.cuda.synchronize();
100 loops, best of 100: 2.26 ms per loop
```

The huge performance regression on CPU when weight.numel() = 1 is addressed by replacing at::CPU_tensor_apply* with parallelized kernels.

ezyang SsnL zou3519  soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11758

Differential Revision: D9995799

Pulled By: weiyangfb

fbshipit-source-id: d289937c78075f46a54dafbde92fab0cc4b5b86e
2018-09-21 16:26:04 -07:00
89d56ae435 Move function deletion from the stack to the heap. (#11611)
Summary:
This eliminates the need for any heuristics regarding stack size limits.

This is a re-do of #11534 with a fix to properly handle cases where
multiple edges exist between a pair of functions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11611

Differential Revision: D9991198

Pulled By: resistor

fbshipit-source-id: fecd2c5cac7e78f82a0f20cf33268bb1617bb4a0
2018-09-21 16:11:03 -07:00
b5f60af94c Shape prop view/reshape/as_strided through prim::ListConstructs (#11877)
Summary:
Previously, aten::view returned a Dynamic type when attr::size is a prim::ListConstruct.
See [this for a repro](https://gist.github.com/zou3519/cbd610472ba3369f556fa612a7d93b28).
This prevented a pre-multiplied LSTM input graph from being fusible (aten::view is necessary
to do premultiplication).

If aten::view is passed an output of a prim::ListConstruct node, then shape prop should
be able to figure out its TensorType because we statically know the number of inputs to
prim::ListConstruct. This PR implements that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11877

Differential Revision: D9972356

Pulled By: zou3519

fbshipit-source-id: cb87786f6e7f222d4b8f07d8f2a9de34859cb6a5
2018-09-21 14:20:01 -07:00
7efbf3a827 Specialize ArgumentSpecs on tuple elements too (#11863)
Summary:
This is pretty important because a common situation of passing LSTM hidden states as a tuple completely trashes performance of a network.

Cleans up all our propagation/undef specialization passes, at a cost of increased complexity of `ArgumentSpec` and `GraphExecutor`. An alternative would be to simply flatten all tuple inputs to a graph ahead of time, but that might just end up being confusing in the future (you never know if you're working with a graph that can have tuples or not).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11863

Differential Revision: D9992814

Pulled By: apaszke

fbshipit-source-id: 0a565a3b23e32f8fa72c0534e07c1ce6187739fc
2018-09-21 14:19:58 -07:00
1cf5b0c7c1 Fix casting logic for 0d CPU tensors in CUDA ops (#11808)
Summary:
Previously, we didn't cast any 0-dim tensors used in CUDA operations. We
can only avoid the casts for 0-dim CPU tensors used in CUDA operations.

Fixes #11795
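For illustration (my example, not from the PR), the 0-dim CPU tensor case this touches looks like:

```python
import torch

gpu = torch.randn(4, device='cuda')
scalar = torch.tensor(2.0)   # 0-dim CPU tensor
out = gpu * scalar           # allowed: only 0-dim CPU tensors may mix with
                             # CUDA ops, and after this fix they are cast correctly
print(out.device)            # cuda:0
```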
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11808

Differential Revision: D9922406

Pulled By: colesbury

fbshipit-source-id: 940b8a8534770aa5cd70d5d09b96be0f0f8146ff
2018-09-21 14:19:56 -07:00
1ad7e0c5ec Minor JIT improvements (#11654)
Summary:
- Disable addmm fusion. The reason for this is explained in the comment.
- Tiny change in `stack.h` that lets us avoid constructing an unnecessary temporary `IValue` on the (C++) stack (it will only get created on the interpreter stack directly).
- Fixed a correctness issue in requires grad propagation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11654

Reviewed By: colesbury

Differential Revision: D9813739

Pulled By: apaszke

fbshipit-source-id: 23e83bc8605802f39bfecf447efad9239b9421c3
2018-09-21 14:19:54 -07:00
4e65fbfee5 Remove tests from EXCLUDE_SCRIPT that pass (#11916)
Summary:
Spruriously added in #11261

I had a PR to catch these automatically (#11279), but it had some issues
passing on some CI environments but not others (e.g. for
`test_nn_group_norm`); any ideas?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11916

Differential Revision: D9992065

Pulled By: driazati

fbshipit-source-id: 05cfa8ed9af939e8ffd5827847ee7bfe0be799b2
2018-09-21 14:19:50 -07:00
00fe2c5606 Use -O1 for sleef build in Debug mode (#11942)
Summary:
`-O0` is problematic for compiling sleef kernels since they consist of a bunch of vector intrinsics. In `-O0`, the compiler spills *every* intermediate value to the stack. In one example (TestEndToEndHybridFrontendModels.test_snli in test_jit.py) the function `Sleef_tanhf8_u10avx2` would spill 30kB of AVX registers onto the stack and run two orders of magnitude slower than in opt mode, causing the test to take minutes rather than seconds. I've verified that this behavior is not present with `-O1`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11942

Differential Revision: D9994658

Pulled By: jamesr66a

fbshipit-source-id: cdd9474c6ae3aa9898d5715ac19a900f5f90468a
2018-09-21 13:24:59 -07:00
775358e4c2 Add non-legacy test of bilinear (#11935)
Summary:
Fixes: #11905
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11935

Differential Revision: D9991120

Pulled By: soumith

fbshipit-source-id: b00ad4f405440664ae5228b229a2ba0a5d3d92f6
2018-09-21 12:43:35 -07:00
23f5b2abbe Fixes an error with canonical url. (#11938)
Summary:
Deleted this section by mistake in the last PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11938

Reviewed By: SsnL

Differential Revision: D9993258

Pulled By: brianjo

fbshipit-source-id: 2552178cebd005a1105a22930c4d128c67247378
2018-09-21 12:21:42 -07:00
c2a2110d71 Stop tracing _out overloads (#11910)
Summary:
They aren't recognized anywhere in the JIT
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11910

Differential Revision: D9979968

Pulled By: apaszke

fbshipit-source-id: bb2505a14e3b1e54d5c243f99c80a4f4d918b204
2018-09-21 11:44:10 -07:00
c6a14b1edd Revert D9985212: [pytorch][PR] [minor] remove a remaining todo line deletion in THD cmake
Differential Revision:
D9985212

Original commit changeset: 5f8e7ac94101

fbshipit-source-id: 1783cbfc91008ab3db36bad7c1bf51e16da7fb2d
2018-09-21 11:25:53 -07:00
817e83fc01 fix PR #11061 (#11815)
Summary:
- fixes PR https://github.com/pytorch/pytorch/pull/11061 by moving `detach_()` and `set_requires_grad()` to `torch.tensor_ctor()` and `tensor.new_tensor`, and also removes warnings and `args_requires_grad` from `internal_new_from_data`
- with this patch, the returned tensor from `tensor_ctor()` and `new_tensor` will be detached from source tensor, and set requires_grad based on the input args
- `torch.as_tensor` retains its behavior as documented
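A short sketch of the resulting behavior (my own example):

```python
import torch

src = torch.ones(3, requires_grad=True)
t = torch.tensor(src)        # detached copy: requires_grad defaults to False
print(t.requires_grad)       # False
a = torch.as_tensor(src)     # as_tensor keeps its documented behavior
print(a.requires_grad)       # True
```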

gchanan apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11815

Differential Revision: D9932713

Pulled By: weiyangfb

fbshipit-source-id: 4290cbc57bd449954faadc597c24169a7b2d8259
2018-09-21 11:04:19 -07:00
6834dcab1c Align cuda multinomial without replacement to CPU behaviour (#11933)
Summary:
We do this by being more NaN tolerant.

Fixes: #9062
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11933

Differential Revision: D9991129

Pulled By: soumith

fbshipit-source-id: c99b04462c1bee90d00eeabb0c111de12f855f4d
2018-09-21 11:04:17 -07:00
784d345828 Fix docstring of torch.jit.createResolutionCallback (#11921)
Summary:
The sample code in the docstring of `torch.jit.createResolutionCallback` is not working:

`createResolutionCallback()` gets the frame of `bar`. In order to get the frame of `baz`, one needs to use `createResolutionCallback(1)`.
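A minimal sketch of the corrected usage (function names follow the fixed docstring; the local variable is hypothetical):

```python
import torch.jit

def bar():
    # frames_up=1 resolves names in the caller's (baz's) frame,
    # not in bar's own frame
    cb = torch.jit.createResolutionCallback(1)
    print(cb("some_local"))

def baz():
    some_local = 42
    bar()   # prints 42

baz()
```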
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11921

Differential Revision: D9989123

Pulled By: soumith

fbshipit-source-id: a7166defdccbbf6979f7df4c871298e6b9a2b415
2018-09-21 09:41:57 -07:00
e655f16c35 Pop stashed IntList in resize_, warn about its usage when tracing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11909

Differential Revision: D9979595

fbshipit-source-id: 07b1027bd6bd1605a31afd4f57bcd58e307fa41e
2018-09-21 08:40:20 -07:00
4fb7e72fe5 Fix _thnn_fused_lstm_cell backward (#11872)
Summary:
There are two parts:
- Optional tensors cannot be dispatch tensors because dispatch
  tensors cannot be optional.
- While the kernel dealt with undefined grad_outs, the logistics
  around it did not fully accommodate grad_hy being undefined.

Fixes: #11800

Thank you, mttk for the reproduction!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11872

Differential Revision: D9978527

Pulled By: apaszke

fbshipit-source-id: e622c288d2eac93bd8388e141fb773f2588e2b8f
2018-09-21 08:25:00 -07:00
48c8adfe1b Turn storage on UndefinedTensorImpl into nullptr. (#11738)
Summary:
I also fix a bug that crept in while we had incorrect semantics where UndefinedTensorImpl was a CPU tensor, and thus some moves which shouldn't have been legal didn't crash. Moving out the Tensor* also moved out the Tensor* in the blob, and it's not supported to store an undefined tensor in a blob.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11738

Reviewed By: gchanan

Differential Revision: D9847859

fbshipit-source-id: db6be0f76a8e6526a89fd0e87b6a23b9cc820c8d
2018-09-21 08:24:57 -07:00
11bd2f2509 Retainable is no more (#11900)
Summary:
Stack:
      **#11900 Retainable is no more**  [💛](https://our.intern.facebook.com/intern/diff/D9977505/)
      #11902 Refactor fastGet/fastSet for clarity, removing a null pointer check.  [💛](https://our.intern.facebook.com/intern/diff/D9977654/)

Kill it with fire
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11900

Differential Revision: D9979779

Pulled By: ezyang

fbshipit-source-id: 0a437e7a0baadb6440e7dc39a01b4a406171faa7
2018-09-21 06:58:18 -07:00
a7afd133f5 Sync FindCUDA.cmake with upstream cmake repo (#11880)
Summary:
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391/diffs
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11880

Differential Revision: D9989119

Pulled By: soumith

fbshipit-source-id: 66e87367127975a5f1619fe447f74e76f101b503
2018-09-21 06:58:17 -07:00
58d28a5f12 Fix saving loaded module (#11915)
Summary:
This PR fixes #11913.

In order to test for this, the model is serialized twice in `getExportImportCopy`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11915

Differential Revision: D9984697

Pulled By: soumith

fbshipit-source-id: ae0250c179000c03db1522b99410f6ecb9681297
2018-09-21 06:58:16 -07:00
0d9be2135f remove a remaining todo line deletion in THD cmake (#11920)
Summary:
TSIA
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11920

Differential Revision: D9985212

Pulled By: Yangqing

fbshipit-source-id: 5f8e7ac94101177740e791f44eaa8c8ec55a908c
2018-09-21 00:40:20 -07:00
b2b05b7c20 Move blob serialization to free functions (#11817)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11817

Blob::Serialize() and Blob::Deserialize() are now the free functions SerializeBlob() and DeserializeBlob().
This takes away access to Blob internals from them and makes future refactorings easier.

Reviewed By: ezyang

Differential Revision: D9882726

fbshipit-source-id: 3251ebd4b53fc12f5e6924a6e4a8db3846ab3729
2018-09-20 23:27:34 -07:00
17cd426c72 Updated docs styles (#11835)
Summary:
Updated requirements.txt and conf.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11835

Reviewed By: SsnL

Differential Revision: D9941160

Pulled By: brianjo

fbshipit-source-id: fbac91214558e6d17beff74261d990c7dc762038
2018-09-20 21:11:12 -07:00
d712a71741 Protobuf serialization (#11619)
Summary:
This PR serves two purposes:

1. Design an abstraction over a serialization scheme for C++ modules, optimizers and tensors in general,
2. Add serialization to the ONNX/PyTorch proto format.

This is currently a rough prototype I coded up today, to get quick feedback.

For this I propose the following serialization interface within the C++ API:

```cpp
namespace torch { namespace serialize {
class Reader {
 public:
  virtual ~Reader() = default;
  virtual void read(const std::string& key, Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};

class Writer {
 public:
  virtual ~Writer() = default;
  virtual void write(const std::string& key, const Tensor& tensor, bool is_buffer = false) = 0;
  virtual void finish() { }
};
}} // namespace torch::serialize
```

There are then subclasses of these two for (1) Cereal and (2) Protobuf (called the "DefaultWriter" and "DefaultReader" to hide the implementation details). See `torch/serialize/cereal.h` and `torch/serialize/default.h`. This abstraction and subclassing for these two allows us to:

1. Provide a cereal-less serialization forward that we can ship and iterate on going forward,
2. Provide no-friction backwards compatibility with existing C++ API uses, mainly StarCraft.

The user-facing API is (conceptually):

```cpp
void torch::save(const Module& module, Writer& writer);
void torch::save(const Optimizer& optimizer, Writer& writer);
void torch::read(Module& module, Reader& reader);
void torch::read(Optimizer& optimizer, Reader& reader);
```

with implementations for both optimizers and modules that write into the `Writer` and read from the `Reader`

ebetica ezyang zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11619

Differential Revision: D9984664

Pulled By: goldsborough

fbshipit-source-id: e03afaa646221546e7f93bb8dfe3558e384a5847
2018-09-20 20:39:34 -07:00
30521a37ad codemod: caffe::float16 -> at::Half (#11785)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11785

Replace each instance of float16 with Half.

Reviewed By: Yangqing

Differential Revision: D9892158

fbshipit-source-id: b9225ca7bd5c84fd1c04a9d24b026c8b6cbff120
2018-09-20 18:55:19 -07:00
a9459bf7b5 Replace float16 with at::Half in caffe2 (#11676)
Summary:
- Finishes unifying Half type in pytorch and caffe2
- As a side effect, aten_op works for fp16 now
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11676

Reviewed By: weiyangfb

Differential Revision: D9829019

Pulled By: li-roy

fbshipit-source-id: b8c9663873c10fe64c90ef180dc81af2e866674e
2018-09-20 18:55:17 -07:00
9c44c60794 Bump up the frontend version (#11873)
Summary:
To update the onnx model zoo.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11873

Reviewed By: BIT-silence

Differential Revision: D9953369

Pulled By: houseroad

fbshipit-source-id: 5e96a982b8029dceeb08e3bea4094bae053e1865
2018-09-20 16:20:48 -07:00
9f0d9db6e4 Improve GRU/LSTM documentation for multiple layers (#11896)
Summary:
Prompted by Alex Falcon's input on the forums. Thank you!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11896

Differential Revision: D9976831

Pulled By: SsnL

fbshipit-source-id: 460af51049c289ed4ce529b7b6ae6314e2bdaae4
2018-09-20 15:42:48 -07:00
c7751f4df0 MIOpen bug fixes and performance enhancements (#11766)
Summary:
This PR contains changes for:
1. Performance enhancements for group conv using MIOpen
2. Performance enhancements by removing unnecessary computations while running pooling through MIOpen
3. Added a check for bwdData computation while running the MIOpen convGradient operator
4. Fix in MIOpen poolingGradient operator to compute window size for global pooling case
5. Minor code cleanup in MIOpen spatial batch norm operator

Differential Revision: D9979050

Pulled By: bddppq

fbshipit-source-id: fabc7a44a2f9ca0307d99564d1ce8fe1de9a6fbb
2018-09-20 15:31:46 -07:00
b91b15d86e Implementing Matrix Norm for torch.norm (#11261)
Summary:
Currently, the norm function only supports vector norms. This PR extends it to also support matrix norms.
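For example (my own sketch; the exact set of accepted `p` values is per the PR's matrix-norm support):

```python
import torch

A = torch.randn(3, 4)
print(torch.norm(A))          # Frobenius norm by default for a matrix
print(torch.norm(A, 'fro'))   # explicit Frobenius norm
print(torch.norm(A, 'nuc'))   # nuclear norm (sum of singular values)
```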
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11261

Reviewed By: li-roy

Differential Revision: D9652379

Pulled By: yya007

fbshipit-source-id: 519b3fb80b563c17c56a24675c7b0e46bf5a3a1c
2018-09-20 14:43:13 -07:00
6100c0ea14 Introduce ExtensionVersioner for C++ extensions (#11725)
Summary:
Python never closes shared libraries it `dlopen`s. This means that calling `load` or `load_inline` (i.e. building a JIT C++ extension) with the same C++ extension name twice in the same Python process will never re-load the library, even if the compiled source code and the underlying shared library have changed. The only way to circumvent this is to create a new library and load it under a new module name.

I fix this, of course, by introducing a layer of indirection. Loading a JIT C++ extension now goes through an `ExtensionVersioner`, which hashes the contents of the source files as well as the build flags, and bumps an internal version stored for each module name if this hash changed. A bump in the version results in the ninja file being edited and a new shared library, effectively a new C++ extension, being compiled. For this, `_v<version>` is appended to the extension name for all versions greater than zero.
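A sketch of the effect (module name and sources are hypothetical):

```python
from torch.utils.cpp_extension import load_inline

ext = load_inline(name="demo", cpp_sources="int answer() { return 42; }",
                  functions=["answer"], verbose=True)

# Changing the source bumps the internal version, so a fresh library
# ("demo_v1") is compiled and loaded instead of the stale one.
ext = load_inline(name="demo", cpp_sources="int answer() { return 43; }",
                  functions=["answer"], verbose=True)
print(ext.answer())   # 43
```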

One caveat is that if you were to update your code many times and always re-load it in the same process, you may end up with quite a lot of shared library objects in your extension's folder under `/tmp`. I imagine this isn't too bad, since extensions are typically small and there isn't really a good way for us to garbage collect old libraries, since we don't know what still has handles to them.

Fixes https://github.com/pytorch/pytorch/issues/11398

ezyang gchanan soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11725

Differential Revision: D9948244

Pulled By: goldsborough

fbshipit-source-id: 695bbdc1f1597c5e4306a45cd8ba46f15c941383
2018-09-20 14:43:12 -07:00
068eac255b Jit fuse clamp (#11574)
Summary:
This patch adds fused forward and backward for clamp to the jit.
This is one item of #11118 . If it's OK, I'd be happy to also add some more of #11118 .

The patch depends on #11150 , which I merged into master as a base. I'll rebase it when that or #10981 is merged.

This is my first serious JIT patch; thank you, ngimel and the others, for your guidance. All errors are my own.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11574

Differential Revision: D9943090

Pulled By: apaszke

fbshipit-source-id: c40954b8c28c374baab8d3bd89acc9250580dc67
2018-09-20 14:43:10 -07:00
d8f6be686d Remove torch/legacy (#11823)
Summary:
Largely unused and hinders current development
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11823

Differential Revision: D9925094

Pulled By: cpuhrsch

fbshipit-source-id: c797f62180e2128f9a567b0c57c8347957470ea5
2018-09-20 14:00:54 -07:00
24ec813967 Defer lazyInitCUDA() until needed (#11893)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11893

This is needed to run binaries compiled with CUDA support on CPU-only machines.

Reviewed By: teng-li

Differential Revision: D9972872

fbshipit-source-id: 7e4107925b3cd4d2fcf84ae532e800ab65f4b563
2018-09-20 12:12:42 -07:00
9cd0ae5e2d Remove deprecated factory functions from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11583

Reviewed By: SsnL

Differential Revision: D9792800

fbshipit-source-id: 9af46d577911ff38647790169df66aa5d0379dd9
2018-09-20 11:39:48 -07:00
87701289a3 fix link to previous versions (#11894)
Summary:
https://github.com/pytorch/pytorch.github.io/issues/68#issuecomment-423073108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11894

Differential Revision: D9973695

Pulled By: soumith

fbshipit-source-id: 1f74b12487ec39f4e88b527dcdfca0742e689c15
2018-09-20 11:10:37 -07:00
0927386890 Workaround CUDA logging on some embedded platforms (#11851)
Summary:
Fixes #11518
Upstream PR submitted at https://gitlab.kitware.com/cmake/cmake/merge_requests/2400

On some embedded platforms, the NVIDIA driver verbosely logs unexpected output to stdout.
One example is Drive PX2, where we see something like this whenever a CUDA program is run:

```
nvrm_gpu: Bug 200215060 workaround enabled.
```

This patch applies a regex to the output of the architecture detection program so that only architecture-like patterns are captured.
It's more robust than before, but not fool-proof.
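The idea, sketched in Python rather than CMake (the regex and sample output are illustrative only):

```python
import re

detect_output = "nvrm_gpu: Bug 200215060 workaround enabled.\n6.2"
# keep only "major.minor" architecture tokens, dropping driver chatter
archs = re.findall(r"\d+\.\d+", detect_output)
print(archs)   # ['6.2']
```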
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11851

Differential Revision: D9968362

Pulled By: soumith

fbshipit-source-id: b7952a87132ab05c724b287b76de263f1f671a0e
2018-09-20 09:26:00 -07:00
1c77f9e543 Support torch.distributed.barrier in gloo backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11844

Reviewed By: colesbury, SsnL

Differential Revision: D9929055

Pulled By: pietern

fbshipit-source-id: 3a34a179cb80f495f18aa926c0f9513924737d8e
2018-09-20 09:25:59 -07:00
8f4601fbac Re-enable test_scalar_fusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11378

Differential Revision: D9943578

Pulled By: zou3519

fbshipit-source-id: fb9e4303e844d5e2515acce7869bcbe11526ab56
2018-09-20 07:56:25 -07:00
23dd5b4a53 Back out "Open-source ThreadSafeActivationCleaningPredictor"
Summary:
Original commit changeset: bfe253ae5fc8

Apparently the Ads push process detected a regression that normal
canaries don't show.
https://fb.facebook.com/groups/1274424122598505/permalink/2597819483592289/

Reviewed By: highker, Prowindy

Differential Revision: D9952807

fbshipit-source-id: 1a3ea249c3b1e2618220c61f3d51468824b6ef10
2018-09-19 21:26:51 -07:00
83740eae4a Avoid using PyThreadState.frame as it is not a public member. (#11855)
Summary:
The doc of PyThreadState [1] emphasizes that interp is its only public member. Use PyEval_GetFrame() instead.

[1] https://docs.python.org/3/c-api/init.html#c.PyThreadState
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11855

Differential Revision: D9954430

Pulled By: ezyang

fbshipit-source-id: 92da6781e45e2bcb5e3a37b162fa40e49d823215
2018-09-19 20:58:37 -07:00
c64331f48f Add test for verifying combine_spatial_bn values in DPM (#11710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11710

Added a test to check that output and gradient values are correctly
calculated when combine_spatial_bn is true on the data parallel model.

Reviewed By: enosair

Differential Revision: D9833660

fbshipit-source-id: 14d29fbebefa9dc303ffae06f9899ea4bde23025
2018-09-19 20:17:51 -07:00
aa8cd7319a Enable build_test on windows (#11802)
Summary:
This PR enables BUILD_TEST for Caffe2 on windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11802

Reviewed By: orionr

Differential Revision: D9951223

Pulled By: mingzhe09088

fbshipit-source-id: 7cdc1626b999daadeae482bd569eebdbd53eb6d4
2018-09-19 20:17:49 -07:00
c22dcc266f Show build output in verbose mode of C++ extensions (#11724)
Summary:
Two improvements to C++ extensions:

1. In verbose mode, show the ninja build output (the exact compile commands, very useful)
2. When raising an error, don't show the `CalledProcessError` that shows ninja failing, only show the `RuntimeError` with the captured stdout

soumith fmassa ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11724

Differential Revision: D9922459

Pulled By: goldsborough

fbshipit-source-id: 5b319bf24348eabfe5f4c55d6d8e799b9abe523a
2018-09-19 20:17:43 -07:00
1091c5e59f Throw error on indexing a 0 dim tensor (#11679)
Summary:
Following through on the warning that indexing a 0-dim tensor would be an
error in PyTorch 0.5, pointing users to `item()` instead.
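Concretely (my own example):

```python
import torch

t = torch.tensor(5)   # 0-dim tensor
print(t.item())       # 5
t[0]                  # now raises an error instead of a warning
```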
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11679

Reviewed By: soumith

Differential Revision: D9833570

Pulled By: driazati

fbshipit-source-id: ac19f811fa7320d30b7f60cf66b596d6de684d86
2018-09-19 18:10:03 -07:00
6831d64591 Fix the symbolic for embedding_bag in ONNX_ATEN_FALLBACK (#11840)
Summary:
The ATen interface was changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11840

Reviewed By: BIT-silence

Differential Revision: D9932452

Pulled By: houseroad

fbshipit-source-id: dd2040fcaa0f6052e5856ee19823cf3064124585
2018-09-19 17:40:39 -07:00
ae1a972d78 Fix #11752: correct numerical issue with log_softmax (#11866)
Summary:
This fixes the numerical problem in the log_softmax CPU code when inputs are big but their differences are small.
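A sketch of the failure mode being fixed (values illustrative):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([1000.0, 1000.0])
# with large inputs whose differences are small, the CPU path
# previously lost precision; the expected result here is log(0.5)
print(F.log_softmax(x, dim=0))
```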
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11866

Differential Revision: D9946799

Pulled By: soumith

fbshipit-source-id: 11fe8d92b91ef6b7a66f33fbce37ec2f0f0929be
2018-09-19 17:09:45 -07:00
6302e4001a Delete unnecessary include from allocator.cc/event_cpu.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11862

Reviewed By: Yangqing

Differential Revision: D9942428

fbshipit-source-id: dea03f5ba0e621a047aa50bc4aa97acc834d2a39
2018-09-19 16:45:54 -07:00
f4d25039cb Fix Array.h when compiled with C++17 (#11816)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11816

This code isn't in the std:: namespace, so is_same
must be qualified.

Reviewed By: smessmer

Differential Revision: D9923774

fbshipit-source-id: 126532e27f08b5616ca46be1293d5d837920f588
2018-09-19 16:45:53 -07:00
b06e35b568 Back out "Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h"
Summary: Original commit changeset: 0d1792804d73

Reviewed By: Yangqing

Differential Revision: D9940725

fbshipit-source-id: 540a8ac7afcfe56a6b63abc6ed297c9434320998
2018-09-19 16:45:51 -07:00
cedd12d86a Explicitly qualify references to CPU. (#11819)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11819

Differential Revision: D9928730

Pulled By: ezyang

fbshipit-source-id: 3140b6ef168586558f04fa8ee90f6f2169605d7d
2018-09-19 16:45:49 -07:00
24e958a0a7 Move bernoulli into ATen (#10273)
Summary:
+ https://github.com/pytorch/pytorch/issues/10236 : torch.bernoulli's out kwarg is broken
  fixed in moving `bernoulli_out` to ATen
+ https://github.com/pytorch/pytorch/issues/9917 : BUG torch.bernoulli(p.expand(shape)) is broken
  fixed in moving all `bernoulli` ops in ATen to use the modern apply utils methods
+ https://github.com/pytorch/pytorch/issues/10357 : torch.bernoulli inconsistent gpu/cpu results
  fixed by adding CUDA asserts

In order to use `curand_uniform4`, I made some changes to `CUDAApplyUtils.cuh`. Specifically, I introduced an optional template parameter `int step` to the `CUDA_tensor_applyN` methods, representing that we want to process `step` values at each time for each of the `N` tensors.

The calling convention for `step = 1` (default) isn't changed. But if `step > 1`, the given lambda `op` must take in `int n` as its first argument, representing the number of valid values, because there may not be full `step` values at the boundary. E.g., here is what the `bernoulli(self, p_tensor)` call look like:
```cpp

  // The template argument `4` below indicates that we want to operate on four
  // element at each time. See NOTE [ CUDA_tensor_applyN helpers ] for details.
  at::cuda::CUDA_tensor_apply2<scalar_t, prob_t, 4>(
      ret, p,
      [seeds] __device__(
          int n, scalar_t& v1, scalar_t& v2, scalar_t& v3, scalar_t& v4,
          const prob_t& p1, const prob_t& p2, const prob_t& p3, const prob_t& p4) {
        curandStatePhilox4_32_10_t state;
        curand_init(
            seeds.first,
            blockIdx.x * blockDim.x + threadIdx.x,
            seeds.second,
            &state);
        float4 rand = curand_uniform4(&state);
        switch (n) {
          case 4: {
            assert(0 <= p4 && p4 <= 1);
            v4 = static_cast<scalar_t>(rand.w <= p4);
          }
          case 3: {
            assert(0 <= p3 && p3 <= 1);
            v3 = static_cast<scalar_t>(rand.z <= p3);
          }
          case 2: {
            assert(0 <= p2 && p2 <= 1);
            v2 = static_cast<scalar_t>(rand.y <= p2);
          }
          case 1: {
            assert(0 <= p1 && p1 <= 1);
            v1 = static_cast<scalar_t>(rand.x <= p1);
          }
        }
      }
    );
```

Benchmarking on `torch.rand(200, 300, 400)` 20 times, each time with 20 loops:

post patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
6.841588497161865 +- 0.05413117632269859
torch.bernoulli(xc)
0.05963418632745743 +- 0.0008014909108169377
x.bernoulli_()
0.4024486541748047 +- 0.0021550932433456182
xc.bernoulli_()
0.02167394384741783 +- 2.3818030967959203e-05

```

pre-patch
```
➜  ~ numactl --cpunodebind 1 --membind 1 -- taskset -c 12,13,14,15,16,17,18,19,20,21,22,23 env CUDA_LAUNCH_BLOCKING=1 python bern.py
torch.bernoulli(x)
12.394511222839355 +- 0.0966421514749527
torch.bernoulli(xc)
0.08970972150564194 +- 0.0038722590543329716
x.bernoulli_()
1.654480218887329 +- 0.02364428900182247
xc.bernoulli_()
0.058352887630462646 +- 0.003094920190051198

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10273

Differential Revision: D9831294

Pulled By: SsnL

fbshipit-source-id: 65e0655a36b90d5278b675d35cb5327751604088
2018-09-19 16:45:47 -07:00
cf5a21e4a1 Add back proto opt disable feature that was lost during refactor (#11875)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11875

Seems like the refactor to predictor_config dropped some functionality that is now blocking other teams

rFBS2b30208263c14ce7039f27c618a3b232bf11ee33 is the change that was missed

hoping to land this quickly :)

Reviewed By: jonmorton

Differential Revision: D9948324

fbshipit-source-id: 1628f7c51c06319fa7ca5dc9d59799135bb82c5f
2018-09-19 15:33:26 -07:00
c30790797f Minor data loader doc improvements
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11821

Differential Revision: D9948292

Pulled By: SsnL

fbshipit-source-id: 01c21c129423c0f7844b403e665a8fe021a9c820
2018-09-19 15:33:25 -07:00
ce55767091 Add the missing header (#11864)
Summary:
Otherwise, some macros don't have their definitions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11864

Reviewed By: BIT-silence

Differential Revision: D9943327

Pulled By: houseroad

fbshipit-source-id: 53e1bfc7a6b832f249f169b75a8fc15cdab63bf4
2018-09-19 14:40:19 -07:00
3b1a5a1b8a Refactor tests part 2 (#11811)
Summary:
Followup to the [first refactor](https://github.com/pytorch/pytorch/pull/11350). Increase coverage of tests
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11811

Reviewed By: houseroad

Differential Revision: D9923074

Pulled By: ajyu

fbshipit-source-id: 0f899bb9e9a75bf7ed939e06cc9b028daa7f6bd9
2018-09-19 10:09:28 -07:00
52472508e9 Add env:// rendezvous test (#11782)
Summary:
A missing environment variable raised a missing key error. Now it
raises a more descriptive error of the actual problem, for example:

ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable WORLD_SIZE expected, but not set
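For context, a minimal env:// initialization looks like this (my sketch; each required variable must be set in the environment):

```python
import torch.distributed as dist

# requires MASTER_ADDR, MASTER_PORT, WORLD_SIZE and RANK to be set;
# a missing one now yields the descriptive ValueError above
dist.init_process_group(backend="gloo", init_method="env://")
```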

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11782

Differential Revision: D9888962

Pulled By: pietern

fbshipit-source-id: 5947e7a7bf7aa45f13bbd7b5e997529f26cc92d6
2018-09-19 09:56:06 -07:00
fa32317780 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fixes various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
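To illustrate the API change (my own example):

```python
import torch

i = torch.tensor([[0, 0], [1, 1]])   # duplicate coordinate (0, 1)
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 2))

c = s.coalesce()          # returns a new, coalesced tensor
print(s.is_coalesced())   # False: the source tensor is never modified in place
```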
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9930449

Pulled By: yf225

fbshipit-source-id: 7c62439b216a6badf7938a10741c358ff18a556d
2018-09-19 09:40:26 -07:00
8c3a94eaf2 Improve autograd profiler performance (#11773)
Summary:
To illustrate the benefits of this commit, I'll use the time/iter I got from one of the JIT benchmarks on my machine.

| Run                                          | Time                    |
|----------------------------------------------|-------------------------|
| No profiler                                  | 45ms                    |
| With profiler                                | 56ms                    |
| Use `clock_gettime` instead of `std::chrono` | 48ms                    |
| Touch all pages on block allocation          | 48ms (less jitter)      |
| Use `const char*` instead of `std::string`   | 47ms (even less jitter) |
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11773

Differential Revision: D9886858

Pulled By: apaszke

fbshipit-source-id: 58f926f09e95df0b11ec687763a72b06b66991d0
2018-09-19 09:25:43 -07:00
b3a2665e0f Code-reorg to have TORCH_ARG in its own header (#11787)
Summary:
I noticed I was including `torch/nn/pimpl.h` in the optimizer library just to access `TORCH_ARG`, even though that file includes a lot of irrelevant code. Let's save some re-compilation time by refactoring this macro into a separate logical file. #small-wins

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11787

Differential Revision: D9924447

Pulled By: goldsborough

fbshipit-source-id: 5acd4ba559ffb2a3e97277e74bb731d7b1074dcf
2018-09-19 09:25:41 -07:00
32494c226e OperatorDef <==> NodeProto Conversion (#11621)
Summary:
Operator level proto conversion between (new) torch proto and (old) caffe2 proto.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11621

Reviewed By: BIT-silence

Differential Revision: D9892422

Pulled By: houseroad

fbshipit-source-id: 01a55ec0a09479876a27082d90fc970723f4d431
2018-09-19 08:41:33 -07:00
8601b33c07 fix half grad assignment (#11781)
Summary:
Currently, grad assignment for the half type fails with a misleading RuntimeError:
```
RuntimeError: torch.cuda.sparse.HalfTensor is not enabled.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11781

Differential Revision: D9931884

Pulled By: soumith

fbshipit-source-id: 03e946c3833d1339a99585c9aa2dbb670f8bf459
2018-09-18 23:00:49 -07:00
b46f1b8ca7 Open-source ThreadSafeActivationCleaningPredictor (#11779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11779

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11731

This predictor provides a thread-safe interface and also
cleans up activations after each run, so in a multi-model setup
the activation space doesn't explode.

Reviewed By: highker

Differential Revision: D9842374

fbshipit-source-id: bfe253ae5fc813e73a347c5147ff6b58d50781ea
2018-09-18 21:56:58 -07:00
77af40c025 prioritize Accelerate over OpenBLAS (#11812)
Summary:
might fix some binary build issues
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11812

Reviewed By: ezyang

Differential Revision: D9927309

Pulled By: soumith

fbshipit-source-id: 9ed6c2c6fedc2a1cffbf52bc0a795135d4239800
2018-09-18 21:56:57 -07:00
53b5f14f59 Remove inclusion of caffe2 pb (#11820)
Summary:
Probably not needed, but fwiw.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11820

Reviewed By: orionr

Differential Revision: D9924953

Pulled By: Yangqing

fbshipit-source-id: 4d340e3d4f4dadc50fb68bed9572b8e1e54b5f6d
2018-09-18 21:16:19 -07:00
a26ad5a332 Remove unnecessary check on device option pointer (#11845)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11845

The device pointer will be used by cudaPointerGetAttributes, which already handles nullptr, so this check is not necessary.

https://docs.nvidia.com/cuda/cuda-runtime-api/group__CUDART__UNIFIED.html#group__CUDART__UNIFIED_1gd89830e17d399c064a2f3c3fa8bb4390

Reviewed By: salexspb

Differential Revision: D9929828

fbshipit-source-id: d862f7e5590998ffafe9bfc7754b0f83d2ae4af4
2018-09-18 21:16:18 -07:00
8aedc27a63 checking device types of input and weights at RNN (#10185)
Summary:
- fixes #9534
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10185

Differential Revision: D9141222

Pulled By: weiyangfb

fbshipit-source-id: bb652e42cc15917019df080d6bce2926b18f3476
2018-09-18 20:26:02 -07:00
e80d1d2876 Revert D9924348: Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h
Differential Revision:
D9924348

Original commit changeset: 8d92b9e8b424

fbshipit-source-id: 0d1792804d7387023af3a9c29477f1da6f40044a
2018-09-18 18:27:00 -07:00
2c358eaf51 Caffe2: add plan name to logging (#11704)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11704

Add plan name to the logging in RunPlan

Reviewed By: Tianshu-Bao

Differential Revision: D9802416

fbshipit-source-id: 45c359dba0a5d992e303b3cdcf34624881a631d8
2018-09-18 18:10:13 -07:00
1f34be47d9 Raise error when perf test result is NaN (#11588)
Summary:
Currently one of our GPU perf tests, `test_gpu_speed_mnist`, reports NaN after this commit (https://github.com/pytorch/pytorch/pull/8018), and we didn't have the logic in place to raise an error when this happens. This PR fixes the problem and will also update the baseline properly even if its previous value is NaN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11588

Differential Revision: D9831798

Pulled By: yf225

fbshipit-source-id: b95eee38d69b3b8273f48b8ac7b7e0e79cf756ed
2018-09-18 18:10:12 -07:00
a79f5d77ad Add pretty printer for JIT IR (#10319)
Summary:
Adds some pretty-printing capability to the IR graph to make debugging easier and more human-readable; see `torch/csrc/jit/test_jit.cpp:925` and onwards for example outputs. Results aren't perfect yet, but it's a start.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10319

Reviewed By: zdevito

Differential Revision: D9558402

Pulled By: driazati

fbshipit-source-id: 1d61c02818daa4c9bdca36d1477d1734cfc7d043
2018-09-18 17:39:44 -07:00
1c8686001f Expunge (transitive) caffe2_pb2 dependency from tensor_impl.h from context_base.h (#11818)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11818

To do this, I have to move the static context registry into ATen/core.
I take the opportunity to convert it into an unordered_map.

Reviewed By: Yangqing

Differential Revision: D9924348

fbshipit-source-id: 8d92b9e8b4246ce608eba24ecef7ad5f8b9b6582
2018-09-18 17:25:46 -07:00
3da8d71d7d remove protobuf inclusion in core/logging.h (#11814)
Summary:
This should not be there since logging does not depend on protobuf.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11814

Reviewed By: ezyang

Differential Revision: D9923819

Pulled By: Yangqing

fbshipit-source-id: 4d4edaea1a2e317f5db6e92c35d58c85dd35c5fb
2018-09-18 17:10:02 -07:00
53cf628503 Simplify Blob move constructor/assignment (#11402)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11402

- Simplify move constructor/assignment
- Make more things noexcept

Reviewed By: ezyang

Differential Revision: D9728631

fbshipit-source-id: 92562e30ea1e4d05ca857665a02b0ca66b0739e3
2018-09-18 15:09:40 -07:00
e585f2fb48 Polish CPP docs, Minor Python Docs Fixes (#11722)
Differential Revision: D9919120

Pulled By: goldsborough

fbshipit-source-id: bf14cbe4ab79524495957cb749828046af864aab
2018-09-18 14:55:57 -07:00
8ad846fda5 Don't build Detectron ops with NO_CAFFE2_OPS=1 (#11799)
Summary:
cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11799

Differential Revision: D9922745

Pulled By: orionr

fbshipit-source-id: b88724b7c2919aabc00d98658e8e563233e01c85
2018-09-18 14:09:33 -07:00
d4e1fa45d0 allow no-alpha add/sub in onnx symbolic (#10972)
Summary:
The PR fixes #10873

The context is that the aten::add and aten::sub ST overloads don't have alpha, so the ONNX symbolic does not match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10972

Reviewed By: jamesr66a

Differential Revision: D9724224

Pulled By: wanchaol

fbshipit-source-id: eb5d1b09fa8f1604b288f4a62b8d1f0bc66611af
2018-09-18 13:55:39 -07:00
7d25fa3c72 Emit Undefined type for value when it is Dynamic type (#11810)
Summary:
For example, outputs of control blocks often have Dynamic type, and when we try to export them to ONNX we get an invalid proto, since `elem_type` is not populated on the TypeInfoProto. This makes it so at least we can get past the checker, since having a dynamically typed output from a control block should still be semantically valid.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11810

Differential Revision: D9922754

Pulled By: jamesr66a

fbshipit-source-id: 5c66113cc302a9d9b8b9f5a8605473d3c6ad5af1
2018-09-18 13:55:36 -07:00
1d399a80a0 Handle pollution of MAX, MIN and CHECK macros. (#11805)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11805

Some of our headers in Caffe2 pollute the macro namespace with things like MAX,
MIN, CHECK, so I renamed these in places where this is a problem.

This patch courtesy of gchanan, extracted out of #11721

Reviewed By: Yangqing

Differential Revision: D9917757

fbshipit-source-id: 17fc692ca04b208dcb8ae00731ed60e393284f7c
2018-09-18 13:18:31 -07:00
9eb72889b4 Add successor/predecessor functions
Summary: More functionality to prep nomnigraph for scheduler implementations

Reviewed By: duc0

Differential Revision: D9794686

fbshipit-source-id: b460859d8ff965d0049b2a696bd8d2f5c97f3f86
2018-09-18 12:27:06 -07:00
47956ddf7e Revert D9755189: [pytorch][PR] [API CHANGE] Add empty tensor tests to test_sparse
Differential Revision:
D9755189

Original commit changeset: e9d36f437db1

fbshipit-source-id: 8b99edf626418a953a8bd786847a6e0174a3a14d
2018-09-18 11:26:10 -07:00
540ef9b1fc Add distributed get_backend (#11715)
Summary:
I have no idea how to run distributed tests locally so I'll let CI do this. Hopefully everything still works with `IntEnum`.

cc mcarilli
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11715

Reviewed By: pietern

Differential Revision: D9889646

Pulled By: SsnL

fbshipit-source-id: 1e2a487cb6fe0bd4cc67501c9d72a295c35693e2
2018-09-18 10:56:24 -07:00
2732c8bae1 improve aten/convolution error message (#11768)
Summary:
fixes https://github.com/pytorch/pytorch/issues/11762
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11768

Differential Revision: D9884185

Pulled By: soumith

fbshipit-source-id: 2a0c3e1f5a4fb4833ae6e9fc791abcf45f7fbea2
2018-09-18 10:56:22 -07:00
98aebed88e Refactor tests part 1 (#11350)
Summary:
Followup to [the serialized test framework](https://github.com/pytorch/pytorch/pull/10594)

Round 1 for refactoring tests, starting alphabetically. I added some functionality, so I wanted to send out some of these initial changes sooner.

I'm skipping all tests that don't explicitly call assertReferenceChecks. Some tests directly call np.allclose, and others are simply TestCase (rather than HypothesisTestCase).

1. Start alphabetically producing serialized outputs for test functions, annotating those we want to include with `serialized_test_util.given`. So far I've only added one test per operator, but this already does seem to add quite a few tests.
2. Add functionality to allow us to generate outputs using pytest by adding pytest argument options. This allows us to skip adding a `__main__` function to quite a few tests.
3. Catch any exceptions generating the gradient operator and skip serializing/reading it, since certain operators don't have gradients.
4. Add functionality to better handle jagged array inputs, which numpy doesn't handle very well. We simply explicitly do the conversion to dtype=object.
5. Make only one file per test function, rather than 4, to reduce the number of files in the github repo.

I also noticed that there is some hypothesis handling that makes `serialized_test_util.given` not compatible with adding more hypothesis decorators on top. For example, there are tests that do
```
@settings(...)
@given(...)
def test_my_stuff(...):
    ...
```
But there is a hypothesis handler that explicitly checks that `given` is called below `settings`, so we cannot refactor this to `serialized_test_util.given`. I've just avoided decorating these kinds of tests for now, I hope that's alright.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11350

Reviewed By: houseroad

Differential Revision: D9693857

Pulled By: ajyu

fbshipit-source-id: a9b4279afbe51c90cf2025c5ac6b2db2111f4af7
2018-09-18 10:42:10 -07:00
6073f3073e Document torch::nn::init (#11778)
Summary:
Doc fixes and documentation for `torch::nn::init`.

ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11778

Differential Revision: D9886648

Pulled By: goldsborough

fbshipit-source-id: 22eb78add1dc32b92cc32253683ab3d746505a64
2018-09-18 10:26:21 -07:00
c8fbeb3aa2 Add empty tensor tests to test_sparse (#11228)
Summary:
This PR adds empty sparse tensor tests to `test_sparse.py`, and also fix various places in internal code to make the tests pass.

**[NOTE] API CHANGE:**
- `coalesce` on sparse tensor will always be performed out-of-place now (meaning the original tensor will never be affected)
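A small sketch of the new out-of-place behavior (illustrative values, not from the PR):
```python
import torch

# two entries at the same coordinate (0, 1)
i = torch.tensor([[0, 0], [1, 1]])
v = torch.tensor([3., 4.])
s = torch.sparse_coo_tensor(i, v, (2, 2))

c = s.coalesce()           # returns a new, coalesced tensor
print(s.is_coalesced())    # False -- the original is never affected
print(c.is_coalesced())    # True
print(c._values())         # tensor([7.]) -- duplicates summed
```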
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11228

Differential Revision: D9755189

Pulled By: yf225

fbshipit-source-id: e9d36f437db1a132c423d3a282ff405a084ae7cc
2018-09-18 10:26:18 -07:00
e00fb69b25 Use CATCH prefix to avoid name conflicts with Caffe2.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11780

Differential Revision: D9889925

Pulled By: gchanan

fbshipit-source-id: 5eca849c36ced00b8ae7482b7945b445a3e1687e
2018-09-18 08:12:45 -07:00
4ee0a78ee6 varargs for meshgrid (#11600)
Summary:
Adds vararg support for meshgrid and adds checks for all the tensor arguments to have the same dtype and device.

Fixes: [#10823](https://github.com/pytorch/pytorch/issues/10823), #11446

The earlier pull request was closed without any changes because I had some rebasing issues, so I made another pull request to close out #10823. Sorry for the inconvenience.
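A quick sketch of the vararg form (shapes illustrative):
```python
import torch

x = torch.arange(3)
y = torch.arange(2)
# tensors can now be passed directly instead of packed into a list;
# all arguments must share the same dtype and device
gx, gy = torch.meshgrid(x, y)
print(gx.shape, gy.shape)  # torch.Size([3, 2]) torch.Size([3, 2])
```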

Differential Revision: D9892876

Pulled By: ezyang

fbshipit-source-id: 93d96cafc876102ccbad3ca2cc3d81cb4c9bf556
2018-09-18 07:41:31 -07:00
e2bc95e1bd add ModuleList.insert (#11664)
Summary:
fixes #11652
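A minimal sketch of the new method (layers chosen arbitrarily):
```python
import torch.nn as nn

layers = nn.ModuleList([nn.Linear(4, 4), nn.ReLU()])
layers.insert(1, nn.Dropout(p=0.5))  # list.insert-style semantics
print([type(m).__name__ for m in layers])
# ['Linear', 'Dropout', 'ReLU']
```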
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11664

Differential Revision: D9892845

Pulled By: ezyang

fbshipit-source-id: 2c910d6bc0b28a999e25beca6e398fd0f35535c5
2018-09-18 07:41:28 -07:00
91b6458e2d Container __getitem__ slicing for subclasses (#11694)
Summary:
Simple change to allow ModuleList subclasses's `__getitem__(slice)` to return class of subclass rather than ModuleList
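A sketch of the behavior change (the `Block` subclass is hypothetical):
```python
import torch.nn as nn

class Block(nn.ModuleList):
    pass

blocks = Block([nn.Linear(2, 2) for _ in range(4)])
head = blocks[:2]
# slicing now preserves the subclass instead of decaying to ModuleList
print(type(head).__name__)  # 'Block'
```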
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11694

Differential Revision: D9892824

Pulled By: ezyang

fbshipit-source-id: b75e9c196487f55cb93f0dab6c20d850e8e759ff
2018-09-18 01:26:18 -07:00
e734c94fa2 Quick update to embedding_bag doc (#11784)
Summary:
Related to #11624 adding maxes to the function def of embedding_bag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11784

Differential Revision: D9892598

Pulled By: ezyang

fbshipit-source-id: e6372ccf631826ddf1e1885b2f8f75f354a36c0b
2018-09-17 23:56:05 -07:00
407a9fee0c make copy constructed tensor a leaf variable when using torch.tensor(sourceTensor) (#11061)
Summary:
- fix https://github.com/pytorch/pytorch/issues/10876
- the cause of the bug is that the copy constructor cannot distinguish the default value of `requires_grad` from an explicit `requires_grad=False`, so it copies the source tensor along with its `grad_fn` whenever `requires_grad=True` on the source
- with this fix, the behavior becomes
```
>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=True)
>>> print(copy)
tensor([[-1.2001,  1.9869],
        [-1.0134,  1.3096]], grad_fn=<CopyBackwards>)

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source, requires_grad=False)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])

>>> source = torch.randn(2, 2, requires_grad=True)
>>> copy = torch.tensor(source)
>>> print(copy)
tensor([[-0.7402,  0.0467],
        [ 0.4344, -0.0420]])
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11061

Differential Revision: D9569714

Pulled By: weiyangfb

fbshipit-source-id: ea368688bdc0f1ce5997870e164e42835b64b4a1
2018-09-17 23:29:09 -07:00
63c811b3a6 Include some JIT things in C++ docs (#11712)
Summary:
Since we're making parts of the JIT public as part of loading script modules, they should be on the cppdocs website.

Orthogonal: We decided not to export things like `IValue` into the `torch` namespace, so `RegisterOperators` shouldn't be there either.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11712

Differential Revision: D9837578

Pulled By: goldsborough

fbshipit-source-id: 4c06d2fa9dd4b4216951f27424c2ce795febab9c
2018-09-17 23:29:04 -07:00
bd43d64dd5 Add strides to Tensor (#11763)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11763

baseline (std::vector)
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc  relative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.74us  148.26K
TensorShareData                                              5.89us  169.78K
TensorShareExternalPointer                                   1.01us  994.35K
TensorReallocation                                           2.46us  405.78K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc  relative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                7.50us  133.27K
TensorShareData                                              7.07us  141.38K
TensorShareExternalPointer                                   1.05us  955.19K
TensorReallocation                                           2.55us  391.62K
============================================================================

```

baseline (smallvector)
```
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc  relative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.56us  152.34K
TensorShareData                                              5.84us  171.32K
TensorShareExternalPointer                                 962.49ns    1.04M
TensorReallocation                                           2.32us  431.73K
============================================================================
============================================================================
caffe2/caffe2/fb/benchmarks/core_overhead_benchmark.cc  relative  time/iter  iters/s
============================================================================
TensorConstructionDestruction                                6.29us  159.04K
TensorShareData                                              5.73us  174.39K
TensorShareExternalPointer                                 914.90ns    1.09M
TensorReallocation                                           2.29us  435.80K
============================================================================
```

Reviewed By: ezyang

Differential Revision: D9694097

fbshipit-source-id: c462e770a4b40e640d8c9d38e0ae7036a4e6e84a
2018-09-17 22:09:40 -07:00
a02685e109 Fix test_torch's test_potri (#11770)
Summary:
tset_potri -> test_potri, even though it has been like this for a long time

More a curiosity than grave functionality...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11770

Reviewed By: ezyang

Differential Revision: D9884767

Pulled By: soumith

fbshipit-source-id: 9bedde2e94ade281ab1ecc2293ca3cb1a0107387
2018-09-17 21:58:18 -07:00
3cbec5453b Reorder statements for readability (#11764)
Summary:
I read this a couple of times before figuring out that it's also the entry point for MPI_COMM_WORLD.

Reordered statements and added comment to clarify.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11764

Differential Revision: D9882834

Pulled By: pietern

fbshipit-source-id: a9282d55368815925fd695a2541354e5aec599da
2018-09-17 21:58:15 -07:00
a7cbcb1bb9 Enable build_python on windows (#11385)
Summary:
The PR aims to resolve issues related to BUILD_PYTHON and BUILD_TEST after FULL_CAFFE2 is removed on Windows.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11385

Reviewed By: orionr

Differential Revision: D9884906

Pulled By: mingzhe09088

fbshipit-source-id: fc114c0cbff6223f1ec261161e4caecc1fef5dd6
2018-09-17 21:40:03 -07:00
63e384a381 SNNTest with Data Preproc Service (#11707)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11707

Trigger SNN offline training test with data preproc service.

Reviewed By: xsh6528

Differential Revision: D9826978

fbshipit-source-id: f98405ca1e61a7662bf0d9313aaba42436025a83
2018-09-17 21:25:49 -07:00
7f0dd2487d Move AT_HOST_DEVICE macro to Macros.h (#10945)
Summary:
I'm using AT_HOST_DEVICE outside of Half.h in an upcoming PR. Since this changes code without any semantic effect, I wanted to make it a separate PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10945

Differential Revision: D9539821

Pulled By: colesbury

fbshipit-source-id: 0daae40ea78b077a543f7bfeec06b225634540de
2018-09-17 18:25:51 -07:00
e8ecbcdf01 Move IValue to ATen/core (#11610)
Summary:
unblocks D9202320
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11610

Differential Revision: D9774853

Pulled By: bwasti

fbshipit-source-id: 4798223f6de680a7152283e8cad8814da7f90209
2018-09-17 18:25:50 -07:00
d4dde0bcaf Detect number of amd gpus in ROCM CI (#11771)
Summary:
We now have CI machines with different numbers of AMD GPUs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11771

Differential Revision: D9889837

Pulled By: bddppq

fbshipit-source-id: dacf728a282f209e3f2419da186e59528a08ca6a
2018-09-17 18:11:09 -07:00
24a8c13f36 Add barrier to fix distributed test flakiness (#11775)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11775

This should fix #11582.

Reviewed By: ezyang

Differential Revision: D9885546

fbshipit-source-id: 3544f42ebe8b595cdf6941859c67484d3ea9b3f8
2018-09-17 17:31:45 -07:00
7d0657f13c Migrate test in cpp/api/ to use gtest (#11556)
Summary:
The second part of T32009899
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11556

Differential Revision: D9888224

Pulled By: zrphercule

fbshipit-source-id: cb0d0ba5d9c7ad601ee3bce0d932ce9cbbc40908
2018-09-17 17:31:43 -07:00
3819d25418 Clean up converter and accept less-valid networks
Summary: Cleaning up converter.cc and allowing networks that have "pass-through" inputs (inputs that are also outputs but aren't actually consumed by the network)

Reviewed By: duc0

Differential Revision: D9759435

fbshipit-source-id: 1ddfcc60a1b865a06682e4022230dfecc4b89ec3
2018-09-17 17:31:41 -07:00
ca5def1b8f Expose annotations (#11649)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11649

Putting annotations in the Python interface.

Reviewed By: duc0

Differential Revision: D9784750

fbshipit-source-id: d877c886ac52559ca3f009a1fd848dd1779b7d04
2018-09-17 16:39:37 -07:00
3ce17bf8f6 Generate ATen/core to source if env GEN_TO_SOURCE is set. (#11759)
Summary:
It is currently tedious to change code generation because it takes two steps: change the codegen, then deal with gen.py failing because of the file mismatch. This just adds an environment option to generate directly to the source tree.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11759

Differential Revision: D9867259

Pulled By: gchanan

fbshipit-source-id: 3cf8024d9e302f382cf8b8a44cb843fb086f8597
2018-09-17 15:25:33 -07:00
7df6650e9c Fix empty embedding bag on cuda (#11740)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11739
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11740

Differential Revision: D9881392

Pulled By: SsnL

fbshipit-source-id: 2964d314f199dd9b4bb69e36592b67efdf5e0760
2018-09-17 14:40:03 -07:00
7671f4ab1c Add math to scope when using inf in tests (#11302)
Summary:
This fixes #8515, which was mostly an issue in the tests themselves. As long
as `math` is imported in the scope in which the script runs, `inf` resolves
correctly to a `prim::Constant`. This PR adds the import to the
`test_jit.py` tests involving `inf` and adds a test to demonstrate
`inf` in a non-generated test, as sketched below.
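An illustrative sketch of the pattern (the function body is made up, assuming `math.inf` resolves as described):
```python
import math
import torch

# with `math` imported in the enclosing scope, `math.inf` resolves to a
# prim::Constant inside the compiled graph rather than failing to compile
@torch.jit.script
def is_pos_inf(x):
    return x == math.inf
```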
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11302

Differential Revision: D9684336

Pulled By: driazati

fbshipit-source-id: 73df2848dfdb45ab50690a7c88df8fda269a64eb
2018-09-17 14:08:32 -07:00
29610621ec 64B align for avx512 (#11748)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11748

For avx512, we need to align at a multiple of 64B, not 32B.
Regardless of avx512, it's in general a good idea to be cache-line aligned.

Reviewed By: ilia-cher

Differential Revision: D9845056

fbshipit-source-id: b1d3ed67749c0c1a64acd5cc230a1279e8023512
2018-09-17 14:08:31 -07:00
336323f53c return aten::gt to the list of fusable operations, add expected graphs (#11150)
Summary:
Fixes one of #11118 issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11150

Differential Revision: D9861372

Pulled By: apaszke

fbshipit-source-id: 98b196b89e991d3936360b30568360367fd32e8b
2018-09-17 13:40:41 -07:00
73738ec570 bump version to 1.0 (#11717)
Summary:
I'm just doing the honors and bumping the version to 1.0.0.

1.0 preview and RC releases will have the 1.0.0.dev{date} tag
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11717

Reviewed By: SsnL

Differential Revision: D9840857

Pulled By: soumith

fbshipit-source-id: 4c9c2e01dccb3c521dab26c49e1569d970a87ace
2018-09-17 12:13:48 -07:00
47d65ed34f Fix issue 10492 (#11634)
Summary:
- pass infos vector by reference
- checkErrors takes infos vector by reference
- modified gesv tests to not cause infs or nans sporadically
- also clean up error messages

Reviewed By: ezyang

Differential Revision: D9818550

Pulled By: soumith

fbshipit-source-id: 00215205ff88767d6a5e921322394c5fd915d6d8
2018-09-17 12:13:45 -07:00
39520ffec1 remove Type/Tensor/TensorMethods include order dependencies. (#11720)
Summary:
Previously, it was necessary to include TensorMethods.h after Tensor.h in order to get the tensor method definitions.
We abstracted this away from users by making sure ATen.h did this correctly, but we don't have any equivalent for ATen/core.

In order to solve this dependency issue, we now forward declare Tensor in the Type declaration, which breaks the dependency cycle.
Type.h now includes Tensor.h (for backwards compatibility) and Tensor.h now includes TensorMethods.h, so there are no longer any include-order restrictions.

We could get rid of TensorMethods.h completely now, but that would involve coordinating a code generation change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11720

Reviewed By: ezyang

Differential Revision: D9841488

Pulled By: gchanan

fbshipit-source-id: 1668199095e096c1790e646b5dc9f61ec1b33c0a
2018-09-17 11:10:32 -07:00
e125e61824 Fix flake8
Summary: Fix flake8

Reviewed By: ezyang

Differential Revision: D9873872

fbshipit-source-id: 26e81238f22caaeccd2c8b4f39cedb6cfb5520dd
2018-09-17 11:10:29 -07:00
cdefc27795 Support lr adaption for SparseAdam and RowWiseSparseAdam (#11162)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11162

as title, fix pr test failure

Reviewed By: chocjy

Differential Revision: D9619308

fbshipit-source-id: 0a2228841ed8fadb15f07e94d3575aa701b10146
2018-09-17 10:29:03 -07:00
7949250295 Fixes for Torch Script C++ API (#11682)
Summary:
A couple fixes I deem necessary to the TorchScript C++ API after writing the tutorial:

1. When I was creating the custom op API, I created `torch/op.h` as the one-stop header for creating custom ops. I now notice that there is no good header for the TorchScript C++ story altogether, i.e. when you just want to load a script module in C++ without any custom ops necessarily. The `torch/op.h` header suits that purpose just as well of course, but I think we should rename it to `torch/script.h`, which seems like a great name for this feature.

2. The CMake API we provided defined a bunch of variables like `TORCH_LIBRARY_DIRS` and `TORCH_INCLUDES` and expected users to add those variables to their targets. We also had a CMake function that did that for you automatically. I now realize a much smarter way of doing this is to create an `IMPORTED` target for the libtorch library in CMake, and then add all this stuff to the link interface of that target. Then all downstream users have to do is `target_link_libraries(my_target torch)` and they get all the proper includes, libraries and compiler flags added to their target. This means we can get rid of the CMake function and all that stuff. orionr AFAIK this is a much, much better way of doing all of this, no?

3. Since we distribute libtorch with `-D_GLIBCXX_USE_CXX11_ABI=0`, dependent libraries must set this flag too. I now add this to the interface compile options of this imported target.

4. Fixes to JIT docs.

These could likely be 4 different PRs but given the release I wouldn't mind landing them all asap.

zdevito dzhulgakov soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11682

Differential Revision: D9839431

Pulled By: goldsborough

fbshipit-source-id: fdc47b95f83f22d53e1995aa683e09613b4bfe65
2018-09-17 09:54:50 -07:00
a7e3cd09e0 Fix ctc gradient handling (#11753)
Summary:
Fixes: #11750

Also fix cuda ctc with double to enable gradient check.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11753

Differential Revision: D9861318

Pulled By: ezyang

fbshipit-source-id: 2e7afea2b60dbbd891bb5d0bda61ee75fe01d933
2018-09-17 09:54:47 -07:00
07fd4450ab Revert D9831398: [pytorch][PR] Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0)
Differential Revision:
D9831398

Original commit changeset: db119d3f9c26

fbshipit-source-id: 4f183c9c178c159473bdaaa6299d4d5eb8afe549
2018-09-17 09:39:23 -07:00
f6a6d7fae1 Switch at::TensorImpl to store TypeMeta rather than ScalarType
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11702

Reviewed By: cpuhrsch

Differential Revision: D9831384

fbshipit-source-id: 1b1233a70ed70b47a3dab4a5797b6cfcb7a2c265
2018-09-17 09:09:35 -07:00
6660a128a5 Cache and use TypeMeta in TensorImpl (#11706)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11706

This is necessary to handle use cases where Storage is not set (because the
tensor in question doesn't have a notion of storage).

Reviewed By: orionr

Differential Revision: D9833361

fbshipit-source-id: e90a384019f44f57682b687d129b54e85b6fabb9
2018-09-17 08:58:13 -07:00
2baba7f835 Add storage_offset to Caffe2 (#11701)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11701

There's one extra multiply from TypeMeta::itemsize() which needs
to be characterized.  For all existing Caffe2 uses, storage_offset
is zero.

Reviewed By: li-roy

Differential Revision: D9831230

fbshipit-source-id: 353678edf76d2ccc297a73475a34f6ab2a20d1e1
2018-09-17 08:58:11 -07:00
35518b3dc7 Back out "Back out "Refactor Tensor/TensorImpl constructors."" E2: Confirm problem with old patch (#11744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11744

Original commit changeset: 093e4c47d557

Restores D9813742

Reviewed By: dzhulgakov

Differential Revision: D9847835

fbshipit-source-id: f3f467891e01c923dd9d3352d892cf59e10402f1
2018-09-17 08:58:09 -07:00
0d345cfa18 Remove Type method defaults in ATen. (#11675)
Summary:
This will allow us to break the dependency cycle between Tensor and Type, because currently Type has defaulted Tensor (reference)  arguments.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11675

Reviewed By: ezyang

Differential Revision: D9819720

Pulled By: gchanan

fbshipit-source-id: a9577ac34a358120075129ab0654e7862d1dace6
2018-09-17 08:58:07 -07:00
5bfd8f583c Moving copy of Caffe2 protos back to build_pytorch_libs.sh (#11726)
Summary:
This way it shows up in all current and future setup.py commands, as otherwise we'd have to override every one to have them all call copy_protos. This is needed because the nightly packages still do not include caffe2_pb2, because setup.py bdist does not go through setup.py install or setup.py develop.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11726

Reviewed By: orionr

Differential Revision: D9844075

Pulled By: pjh5

fbshipit-source-id: 57b469e48010aacd0c08c214ba8a7e5d757feefa
2018-09-17 08:58:05 -07:00
a8b1755de6 Check device argument makes sense for legacy tensor constructors. (#11669)
Summary:
Fixes: https://github.com/pytorch/pytorch/issues/11427.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11669

Differential Revision: D9817881

Pulled By: gchanan

fbshipit-source-id: 77dc5b0e6bc9884d2616210b96c07e4734058bb6
2018-09-17 08:24:25 -07:00
d63bb72d89 Remove symbol export annotations in THC/generic/*.cu (#11367)
Summary:
We use these annotations during function declarations, not definitions. See the description of compiler error [C2491](https://msdn.microsoft.com/en-us/library/62688esh.aspx) for more details.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11367

Reviewed By: ezyang

Differential Revision: D9697923

Pulled By: orionr

fbshipit-source-id: 1e539c02957851386f887e6d0510ce83117a1695
2018-09-17 08:24:23 -07:00
f5bc2aef07 Update OpenMP cmake setting for xcode 9 compiler(AppleClang 9.0) (#11563)
Summary:
Fix the OpenMP link error for the AppleClang 9.0 compiler.

Built with the following command:
python setup.py build develop

The error message:

```
Undefined symbols for architecture x86_64:
  "___kmpc_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_critical", referenced from:
      _THFloatTensor_addmm in THTensorMath.cpp.o
      _THDoubleTensor_addmm in THTensorMath.cpp.o
      _THByteTensor_addmm in THTensorMath.cpp.o
      _THCharTensor_addmm in THTensorMath.cpp.o
      _THShortTensor_addmm in THTensorMath.cpp.o
      _THIntTensor_addmm in THTensorMath.cpp.o
      _THLongTensor_addmm in THTensorMath.cpp.o
      ...
  "___kmpc_end_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_end_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_for_static_fini", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_4", referenced from:
      _.omp_outlined. in init.cpp.o
      _.omp_outlined..35 in init.cpp.o
      _.omp_outlined..36 in init.cpp.o
      _.omp_outlined..37 in init.cpp.o
      _.omp_outlined..49 in init.cpp.o
      _.omp_outlined..52 in init.cpp.o
      _.omp_outlined..220 in init.cpp.o
      ...
  "___kmpc_for_static_init_8", referenced from:
      _.omp_outlined..9 in Embedding.cpp.o
      _.omp_outlined. in EmbeddingBag.cpp.o
      _.omp_outlined. in GridSampler.cpp.o
      _.omp_outlined..42 in GridSampler.cpp.o
      _.omp_outlined..44 in GridSampler.cpp.o
      _.omp_outlined..45 in GridSampler.cpp.o
      _.omp_outlined..47 in GridSampler.cpp.o
      ...
  "___kmpc_for_static_init_8u", referenced from:
      _.omp_outlined..203 in init.cpp.o
      _.omp_outlined..207 in init.cpp.o
      _.omp_outlined..209 in init.cpp.o
      _.omp_outlined..210 in init.cpp.o
  "___kmpc_fork_call", referenced from:
      at::native::embedding_dense_backward_cpu(at::Tensor const&, at::Tensor const&, long long, long long, bool) in Embedding.cpp.o
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::grid_sampler_2d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_cpu(at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_2d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      at::native::grid_sampler_3d_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, long long) in GridSampler.cpp.o
      ...
  "___kmpc_global_thread_num", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "___kmpc_push_num_threads", referenced from:
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "___kmpc_reduce_nowait", referenced from:
      _.omp_outlined..270 in THTensorMoreMath.cpp.o
      _.omp_outlined..271 in THTensorMoreMath.cpp.o
      _.omp_outlined..273 in THTensorMoreMath.cpp.o
      _.omp_outlined..275 in THTensorMoreMath.cpp.o
      _.omp_outlined..43 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..44 in THTensorEvenMoreMath.cpp.o
      _.omp_outlined..46 in THTensorEvenMoreMath.cpp.o
      ...
  "___kmpc_serialized_parallel", referenced from:
      at::native::embedding_renorm_cpu_(at::Tensor&, at::Tensor const&, double, double) in Embedding.cpp.o
      at::native::_embedding_bag_dense_backward_cpu(at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, at::Tensor const&, long long, bool, long long) in EmbeddingBag.cpp.o
      at::native::softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::log_softmax_cpu(at::Tensor const&, long long) in SoftMax.cpp.o
      at::native::softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::native::log_softmax_backward_cpu(at::Tensor const&, at::Tensor const&, long long, at::Tensor const&) in SoftMax.cpp.o
      at::TensorIterator::for_each(std::__1::function<void (int, char**, long long const*, long long)> const&) in TensorIterator.cpp.o
      ...
  "_omp_get_max_threads", referenced from:
      _THGetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 0, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 1, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> >, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 1, false, float, 1, false, 0>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Transpose<Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::Stride<0, 0> > const>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::Stride<0, 0> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      void Eigen::internal::parallelize_gemm<true, Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> >, long>(Eigen::internal::gemm_functor<float, long, Eigen::internal::general_matrix_matrix_product<long, float, 0, false, float, 0, false, 0>, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1> const, 0, Eigen::OuterStride<-1> >, Eigen::Map<Eigen::Matrix<float, -1, -1, 0, -1, -1>, 0, Eigen::OuterStride<-1> >, Eigen::internal::gemm_blocking_space<0, float, float, -1, -1, -1, 1, false> > const&, long, long, long, bool) in math_cpu.cc.o
      ...
  "_omp_get_num_procs", referenced from:
      _THGetNumCores in THGeneral.cpp.o
  "_omp_get_num_threads", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_get_thread_num", referenced from:
      _.omp_outlined. in Embedding.cpp.o
      _.omp_outlined. in SoftMax.cpp.o
      _.omp_outlined..35 in SoftMax.cpp.o
      _.omp_outlined..37 in SoftMax.cpp.o
      _.omp_outlined..38 in SoftMax.cpp.o
      _.omp_outlined..46 in SoftMax.cpp.o
      _.omp_outlined..47 in SoftMax.cpp.o
      ...
  "_omp_in_parallel", referenced from:
      _THFloatTensor_copy in THTensorCopy.cpp.o
      _THDoubleTensor_copy in THTensorCopy.cpp.o
      _THByteTensor_copy in THTensorCopy.cpp.o
      _THCharTensor_copy in THTensorCopy.cpp.o
      _THShortTensor_copy in THTensorCopy.cpp.o
      _THIntTensor_copy in THTensorCopy.cpp.o
      _THLongTensor_copy in THTensorCopy.cpp.o
      ...
  "_omp_set_num_threads", referenced from:
      _THSetNumThreads in THGeneral.cpp.o
      caffe2::Caffe2SetOpenMPThreads(int*, char***) in init_omp.cc.o
ld: symbol(s) not found for architecture x86_64
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11563

Differential Revision: D9831398

Pulled By: ezyang

fbshipit-source-id: db119d3f9c26a71180335ad955f2f62c5369f9ed
2018-09-17 08:24:20 -07:00
6f6b03566b Vectorize grid sample 2d CPU kernels (#10980)
Summary:
This PR vectorizes the CPU grid sample 2d forward and backward kernels. Specifically,

 1. add `.data()` in `TensorAccessor`
 2. support non-void return value for declaring CPU kernel stub
 3. add `bool at::geometry_is_contiguous(IntList sizes, IntList strides)`

1. The following vectorized CPU primitives are added:

    + `gather<scale>(baseaddr, vindex)`: `result[i] = baseaddr[vindex[i] * scale]`
    + `mask_gather<scale>(src, baseaddr, vindex, mask)`: `result[i] = mask[i] ? baseaddr[vindex[i] * scale] : src[i]`.
    + comparison ops
    + binary logical ops
    + `min(a, b)`
    + `cast<dst_t, src_t>(src_vec)`: changing dtype but keeping the bit representation
    + `blendv(a, b, mask)`: `result[i] = mask[i] ? b[i] : a[i]`.
    + ctor with multiple values (i.e., `setr`)
    + `arange(start = 0, step = 1)`: constructs a vector with values specified by the arange parameters
    + `convert_to_int_of_same_size(vec)`: convert floating point vector to corresponding integral type of same size
    + `interleave2(a, b)` & `deinterleave2(x, y)`: interleave or deinterleaves two vectors. E.g., for `interleave`:
        ```
        inputs:
          {a0, a1, a2, a3, a4, a5, a6, a7}
          {b0, b1, b2, b3, b4, b5, b6, b7}
        outputs:
          {a0, b0, a1, b1, a2, b2, a3, b3}
          {a4, b4, a5, b5, a6, b6, a7, b7}
        ```

  2. Grid sample CPU kernel implementations are described in the following note (also in `GridSampleKernel.cpp`):

  ```
   NOTE [ Grid Sample CPU Kernels ]

   Implementation of vectorized grid sample CPU kernels is divided into three
   parts:

   1. `ComputeLocation` struct
      Transforms grid values into interpolation locations of the input tensor
      for a particular spatial dimension, based on the size of that dimension
      in the input tensor and the padding mode.
```
```cpp
      template<typename scalar_t, GridSamplerPadding padding>
      struct ComputeLocation {
        using Vec = Vec256<scalar_t>;

        // ctor
        ComputeLocation(int64_t size);

        // Given grid values `in`, return the interpolation locations after
        // un-normalization and padding mechanism (elementwise).
        Vec apply(const Vec &in) const;

        // Similar to `apply`, but also returns `d apply(in) / d in`
        // (elementwise).
        // this is often used in gradient computation.
        std::pair<Vec, Vec> apply_get_grad(const Vec &in) const;
      };
```
```
   2. `ApplyGridSample` struct
      Owns N `ComputeLocation` structs, where N is the number of spatial
      dimensions. Given N input grid vectors (one for each spatial dimension)
      and spatial offset, it gets the interpolation locations from
      `ComputeLocation`s, applies interpolation procedure, and then writes to
      the output (or grad_input & grad_grid in backward).
```
```cpp
      template<typename scalar_t, int spatial_dim,
               GridSamplerInterpolation interp,
               GridSamplerPadding padding>
      struct ApplyGridSample {

        // ctor
        ApplyGridSample(const TensorAccessor<scalar_t, 4>& input);

        // Applies grid sampling (forward) procedure:
        //   1. computes interpolation locations from grid values `grid_x` and
        //      `grid_y`,
        //   2. interpolates output values using the locations and input data
        //      in `inp_slice`, and
        //   3. writes the first `len` values in the interpolated vector to
        //      `out_slice` with spatial offset being `offset`.
        //
        // This assumes that `grid_x` and `grid_y` all contain valid grid
        // values \in [-1, 1], even at indices greater than `len`.
        //
        // The `*_slice` argument names mean samples within a batch (i.e.,
        // with the batch dimension sliced out).
        void forward(TensorAccessor<scalar_t, 3>& out_slice,
                     const TensorAccessor<scalar_t, 3>& inp_slice,
                     int64_t offset, const Vec& grid_x, const Vec& grid_y,
                     int64_t len) const;

        // Applies grid sampling (backward) procedure. Arguments semantics
        // and strategy are similar to those of `forward`.
        void backward(TensorAccessor<scalar_t, 3>& gInp_slice,
                      TensorAccessor<scalar_t, 3>& gGrid_slice,
                      const TensorAccessor<scalar_t, 3>& gOut_slice,
                      const TensorAccessor<scalar_t, 3>& inp_slice,
                      int64_t offset, const Vec& grid_x, const Vec& grid_y,
                      int64_t len) const;
      }
```
```
   3. `grid_sample_2d_grid_slice_iterator` function
      Among the tensors we work with, we know that the output tensors are
      contiguous (i.e., `output` in forward, and `grad_input` & `grad_grid` in
      backward), we need to randomly read `input` anyways, and `grad_output`
      usually comes from autograd and is often contiguous. So we base our
      iterating strategy on the geometry of grid.
      `grid_sample_2d_grid_slice_iterator` function provides an abstraction to
      efficiently iterate through a `grid` slice (without batch dimension).
      See comments of that function on the specific cases and strategies used.
```
```cpp
      template<typename scalar_t, typename ApplyFn>
      void grid_sample_2d_grid_slice_iterator(
        const TensorAccessor<scalar_t, 3>& grid_slice,
        const ApplyFn &apply_fn);

      // `apply_fn` is a function/lambda that can be called as if it has
      // declaration:
      //   void apply_fn(const Vec256<scalar_t>& grid_x,
      //                 const Vec256<scalar_t>& grid_y,
      //                 int64_t spatial_offset, int64_t len);
```
```
      `apply_fn` will be called multiple times, and together cover the entire
      output spatial space. Therefore, e.g., to implement forward 2d grid
      sample, we can do
```
```cpp
      ApplyGridSample<scalar_t, 2, interp, padding> grid_sample(input_accessor);

      for (int n = 0; n < input_accessor.size(0); n++) {
        grid_sample_2d_grid_slice_iterator(
          grid_accessor[n],
          [&](const Vec256<scalar_t>& grid_x, const Vec256<scalar_t>& grid_y,
              int64_t spatial_offset, int64_t len) {
            grid_sample.forward(out_accessor[n], input_accessor[n],
                                spatial_offset, grid_x, grid_y, len);
          });
      }
   ```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10980

Differential Revision: D9564867

Pulled By: SsnL

fbshipit-source-id: 5b7c3c7ea63af00eec230ae9ee1c3e6c6c9679b4
2018-09-16 20:41:10 -07:00
10c29c8970 Fix CUDA 8 build on Windows (#11729)
Summary:
Tested via https://github.com/pytorch/pytorch/pull/11374.
Upstream PR: https://gitlab.kitware.com/cmake/cmake/merge_requests/2391
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11729

Differential Revision: D9847807

Pulled By: orionr

fbshipit-source-id: 69af3e6c5bba0abcbc8830495e867a0b1b399c22
2018-09-16 08:09:24 -07:00
ca6f08f359 Set correct dtype for fp16 op inference function (#11693)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11693

as desc.

Reviewed By: hyuen

Differential Revision: D9829061

fbshipit-source-id: 0f4c8a9d2b95d4cf5fa20a2aefd5671f273a8e76
2018-09-15 23:40:41 -07:00
b3e726042c Do not use FixedDivisor in ROCM order switch op (#11697)
Summary:
Fix the recent order_switch_test failure in ROCM CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11697

Reviewed By: BIT-silence

Differential Revision: D9831039

Pulled By: bddppq

fbshipit-source-id: 2368fd1ac7b1bab335ff3377071246cfd3392f3f
2018-09-15 18:24:51 -07:00
eb3c47bdd5 max -> fmaxf in cross_entropy kernel (#11733)
Summary:
Changing `max` to `fmaxf` in the `LabelCrossEntropy` kernel so that it works correctly under HIP.

bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11733

Differential Revision: D9846783

Pulled By: bddppq

fbshipit-source-id: c1b394d2ba7ee0e819f7bf3b36b53d1962de5522
2018-09-15 18:13:42 -07:00
f09054f8d0 Remove deprecate warning for Upsampling (#11568)
Summary:
Fixes #11452 .

Based on the discussion with SsnL and soumith, we want to bring back Upsample as a module instead of introducing a new nn.interpolate module for now. If anyone wants to downsample, they should use `nn.functional.interpolate` instead, as sketched below.
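A short illustrative sketch of the resulting guidance (shapes made up):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 3, 8, 8)
up = nn.Upsample(scale_factor=2, mode="nearest")  # no deprecation warning
y = up(x)                                         # 1 x 3 x 16 x 16
# downsampling goes through the functional interface instead
z = F.interpolate(x, scale_factor=0.5, mode="nearest")  # 1 x 3 x 4 x 4
```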
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11568

Differential Revision: D9804359

Pulled By: ailzhang

fbshipit-source-id: 2b232d55fc83c2b581bf336f1ee8d1cf1c1159ca
2018-09-14 17:54:48 -07:00
bb6f18c44f Simplify IValue::toTensor() (#11355)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11355

There is no reason to implement refcounting manually in this case.
Given the correct NullType, toIntrusivePtr() and moveToIntrusivePtr() will do the right thing.

Reviewed By: ezyang

Differential Revision: D9694918

fbshipit-source-id: 8aae4d66aec32ca5f85c438d66339bd80b72b656
2018-09-14 16:57:15 -07:00
690c999bba Simplify union payload copying (#11353)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11353

Before, there was one extra member in the union that had to be at least as large as the largest other member, because it was used for copying.

Now, this isn't needed anymore and we copy the union directly.

Reviewed By: ezyang

Differential Revision: D9694326

fbshipit-source-id: 42b2f7d51ac5d4ea5ebafea3a598b018e10fed68
2018-09-14 16:57:14 -07:00
270fb22bd8 Remove intrusive_ptr::reclaim() in Storage (2/2) (#11547)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11547

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9778042

fbshipit-source-id: c9572edc440c5ce5ea1b2355b5c54f87078ea28e
2018-09-14 16:57:12 -07:00
f4d9fe395d Remove intrusive_ptr::reclaim() in Storage (#11352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11352

Pushing manual refcounting further back, making things safer.

Reviewed By: ezyang

Differential Revision: D9694327

fbshipit-source-id: befdbcac199225383a93520472ee7c6511a0e9cd
2018-09-14 16:57:10 -07:00
2c8a1b957e Back out "Refactor Tensor/TensorImpl constructors."
Summary: Original commit changeset: 7501b54fe5f3

Reviewed By: gchanan

Differential Revision: D9838097

fbshipit-source-id: 093e4c47d5574ce99f706b0683ef369a89b62b38
2018-09-14 16:39:31 -07:00
8e76dcf173 Prevent raising KeyboardInterrupt in worker (#11718)
Summary:
Current behavior is that each process (main and workers) will print a trace from `KeyboardInterrupt`. And the main process will also print
```
RuntimeError: DataLoader worker (pid 46045) exited unexpectedly with exit code 1. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
```
due to our SIGCLD handler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11718

Differential Revision: D9840844

Pulled By: SsnL

fbshipit-source-id: 1a05060bb02907fef5aac3f274d2c84f9f42d187
2018-09-14 16:09:35 -07:00
d24bcfd930 Suppress hiprand "duplicate-decl-specifier" warning (#11698)
Summary:
Otherwise each build produces 65MB of warnings log, which makes the CI hard to debug.

iotamudelta Jorghi12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11698

Differential Revision: D9840356

Pulled By: bddppq

fbshipit-source-id: b69bf6a5c38a97b188221f9c084c608ffc9b37c8
2018-09-14 15:51:43 -07:00
8e3f8c52e8 Document the Sequential module (#11648)
Summary:
1. Document the Sequential module in the C++ API at a high, why-does-this-exist, and low, how-to-use, level
2. Change the Sequential tests to be in a style that makes them easier to convert to gtest. No code changes.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11648

Differential Revision: D9834526

Pulled By: goldsborough

fbshipit-source-id: 39f2f5c6cbbf8ed5a1b69986978c8ef127036de1
2018-09-14 15:51:41 -07:00
96d3f968eb Splits CPU and CUDA fusion compilers (#10981)
Summary:
This PR splits the CPU and CUDA fusion compilers, putting them into a new jit/fusers/ directory with jit/fusers/common for common components. In particular:

- A fusion interface is created that allows "fusion handles" to be requested
- The CPU and CUDA fusers implement this interface, with dispatch determined by device
- The fusion compilers, fusion function specializations and resource strings are split
- CPU-specific classes like TempFile and DynamicLibrary are in the CPU fuser
- Common classes like TensorDesc and the base fusion function class are in jit/fusers/common
- There is still some specialization in jit/fusers/common, but these specializations are small(-ish)
- Updates the build system to remove the dummy interface on Windows and minimize the use of macros

This structure should allow in-flight PRs to easily rebase while providing a clear interface to the fusers.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10981

Reviewed By: soumith

Differential Revision: D9701999

Pulled By: apaszke

fbshipit-source-id: 3b6bec7b97e0444b2a93caa38d9b897f2e68c1b3
2018-09-14 14:05:34 -07:00
70e68e755a Casting for binary ops (#11708)
Summary:
Fixes #11663

`TensorIterator` was replacing the op tensors with type-cast tensors,
which ended up producing side effects in binary ops like `a.float() * b`
where `a` and `b` are `LongTensor`s.

colesbury ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11708

Differential Revision: D9834016

Pulled By: driazati

fbshipit-source-id: 4082eb9710b31dfc741161a0fbdb9a8eba8fe39d
2018-09-14 13:40:21 -07:00
224e62bbec respect USE_CUDA_STATIC_LINK in build_libtorch.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11713

Differential Revision: D9835972

Pulled By: anderspapitto

fbshipit-source-id: 046363b132e5487c05ef7e6e6d88b508196386a1
2018-09-14 12:25:08 -07:00
0c2648830f Augment emit_nvtx to help connect backward-pass Function apply calls with their corresponding forward pass ops (#10881)
Summary:
Often, we find ourselves looking at some long-running kernel or emit_nvtx range on an nvvp profile and trying to connect it to the offending line in a training script.  If the op is in the forward pass that's easy:  ops are enqueued explicitly from the Python side, so tracking it down with manual nvtx ranges supplemented by the built-in emit_nvtx ranges is straightforward.  If the op is in the backward pass, it's much more difficult.  From the Python side, all you can do is wrap loss.backward() in an nvtx range, and if you also use emit_nvtx, the automatic ranges provide only local information.  Right now, the only consistent way to connect backward-pass kernels to their associated forward-pass lines of Python is to understand your script line by line, and know exactly where in the backward pass you are.

This PR augments the existing nvtx machinery to bridge the gap between forward and backward, allowing connection of backward-pass Function apply calls to the forward-pass operations that required/created those Functions.

The method is simple and surgical.  During the forward pass, when running with emit_nvtx, the nvtx range for each function in VariableType is tagged with the current sequence number.  During the backward pass, the nvtx range associated with each Function's operator() is tagged with that Function's stashed sequence number, which can be compared to "current sequence numbers" from the forward pass to locate the associated op.

Double-backward is not a problem.  If a backward pass with create_graph = True is underway, the relationship between backward and double-backward is conceptually the same as the relationship between forward and backward:  The functions in VariableType still spit out current-sequence-number-tagged ranges, the Function objects they create still stash those sequence numbers, and in the eventual double-backward execution, their operator() ranges are still tagged with the stashed numbers, which can be compared to "current sequence numbers" from the backward pass.
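
For orientation, a minimal sketch of how this is driven from the Python side when profiling under nvprof/nvvp (the model and shapes here are placeholders, not from this PR):

```python
import torch
from torch.autograd import profiler

model = torch.nn.Linear(10, 10).cuda()
x = torch.randn(4, 10, device="cuda", requires_grad=True)

with profiler.emit_nvtx():
    loss = model(x).sum()  # forward-pass ranges carry the current sequence numbers
    loss.backward()        # backward Function ranges carry the stashed numbers
```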

Minor caveats:

- The sequence number is thread-local, and many VariableType functions (specifically, those without a derivative explicitly defined in derivatives.yaml) don't create an associated function object (instead delegating that to sub-functions further down the call chain, perhaps called from within at::native functions that route back through VariableType by calling at::function_name).  So the correspondence of stashed sequence numbers in Function operator() ranges with numbers in forward-pass ranges is not guaranteed to be 1 to 1.  However, it's still a vast improvement over the current situation, and I don't think this issue should be a blocker.
- Feel free to litigate my use of stringstream in profiler.cpp.  I did it because it was easy and clean.  If that's too big a hammer, let's figure out something more lightweight.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10881

Differential Revision: D9833371

Pulled By: apaszke

fbshipit-source-id: 1844f2e697117880ef5e31394e36e801d1de6088
2018-09-14 11:56:55 -07:00
b90872c00e Get rid of default arguments for TH/THC factory functions. (#11673)
Summary:
This is causing codegen problems in caffe2, when we try to remove the circular Tensor/Type declarations.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11673

Differential Revision: D9819341

Pulled By: gchanan

fbshipit-source-id: f2c2cd96e8a16f6de6aa4889e71b8a78e12e9256
2018-09-14 10:55:38 -07:00
7535d98ec4 Add message tag parameter to send/recv
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11490

Reviewed By: teng-li

Differential Revision: D9828116

Pulled By: pietern

fbshipit-source-id: 98be1ae84b6763ffb329e63c030c5e3ec0e748b7
2018-09-14 10:55:37 -07:00
3258fc11a7 Delete torch/csrc/api/README.md (#11703)
Summary:
We'll have separate docs for the C++ frontend; right now this file is just misleading
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11703

Differential Revision: D9832847

Pulled By: goldsborough

fbshipit-source-id: 2e8b30ccf6b5cba9d0526e6261160f7c6211a35c
2018-09-14 10:55:35 -07:00
278e304c18 Implement elif in string frontend (#11667)
Summary:
Closes #11625
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11667

Differential Revision: D9828145

Pulled By: jamesr66a

fbshipit-source-id: c72dc41cb310a4211b4e4c6b33f7e2c1fb3581a0
2018-09-14 10:09:46 -07:00
115b13ffab clean up some old Half stuff
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11687

Differential Revision: D9829027

Pulled By: li-roy

fbshipit-source-id: f35dcdf93ea57ba4fa775e36e9d6378bed46a710
2018-09-14 09:54:45 -07:00
eb039dc92c Add CHECKs into GetTensorInfo and ExtractDeviceOption (#11597)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11597

We should always CHECK pointers that we plan to dereference
when they are inputs to a function. Nobody knows how the function will
be called in the future.

Reviewed By: yinghai

Differential Revision: D9800002

fbshipit-source-id: 7fd05f4717f2256d1b09a9e75475b12de6685b03
2018-09-14 09:40:27 -07:00
0d9b9100f9 Fix gesv and gels docs (#11699)
Summary: Closes #9935 and closes #5431.

Differential Revision: D9830448

Pulled By: soumith

fbshipit-source-id: 4e5320a1d0c1d4c8253a5b26f4842cea76530514
2018-09-14 09:24:45 -07:00
72822ee6b2 Fix #11430 (CPU only builds raise opaque error message when calling .cuda()) (#11533)
Summary:

While I was at it, I audited all the other ways I know of that we might get a CUDA
type from PyTorch and fixed more constructors that don't work.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11533

Differential Revision: D9775786

Pulled By: ezyang

fbshipit-source-id: cd07cdd375fdf74945539ec475a48bf08cbc0c17
2018-09-14 09:10:08 -07:00
2631da0822 Move some Tensor method definitions from Type.h to TensorMethods.h. (#11650)
Summary:
There's no reason they need to be in Type.h and this moves us along the path of not having circular dependencies (so we can get rid of TensorMethods.h).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11650

Reviewed By: ezyang

Differential Revision: D9812271

Pulled By: gchanan

fbshipit-source-id: 8b70db9a5eb0a332398ab2e8998eeaf7d2eea6d7
2018-09-14 08:56:02 -07:00
6c3792b9ec Implement UndefinedType::typeMeta.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11666

Differential Revision: D9816212

Pulled By: gchanan

fbshipit-source-id: 079899590150009bc2e2a3bbdc78a98de9380e37
2018-09-14 08:40:26 -07:00
cda71e2600 Disallow scalar parameters in Dirichlet and Categorical (#11589)
Summary:
This adds a small check in `Dirichlet` and `Categorical` `__init__` methods to ensure that scalar parameters are not admissible.

**Motivation**
Currently, `Dirichlet` throws no error when provided with a scalar parameter, but if we `expand` a scalar instance, it inherits the empty event shape from the original instance and gives unexpected results.

The alternative to this check is to promote `event_shape` to be `torch.Size((1,))` if the original instance was a scalar, but that seems to add a bit more complexity (and changes the behavior of `expand` in that it would affect the `event_shape` as well as the `batch_shape` now). Does this seem reasonable? cc. alicanb, fritzo.

```python
In [4]: d = dist.Dirichlet(torch.tensor(1.))

In [5]: d.sample()
Out[5]: tensor(1.0000)

In [6]: d.log_prob(d.sample())
Out[6]: tensor(0.)

In [7]: e = d.expand([3])

In [8]: e.sample()
Out[8]: tensor([0.3953, 0.1797, 0.4250])  # interpreted as events

In [9]: e.log_prob(e.sample())
Out[9]: tensor(0.6931)  # wrongly summed out

In [10]: e.batch_shape
Out[10]: torch.Size([3])

In [11]: e.event_shape
Out[11]: torch.Size([])  # cannot be empty
```

Additionally, based on review comments, this removes `real_vector` constraint. This was only being used in `MultivariateNormal`, but I am happy to revert this if we want to keep it around for backwards compatibility.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11589

Differential Revision: D9818271

Pulled By: soumith

fbshipit-source-id: f9bbba90ed6f04e0b5bdfa169e70ca20b280fc74
2018-09-14 07:55:35 -07:00
c391c20063 Adding .expand method for TransformedDistribution (#11607)
Summary:
This PR:
 - adds a `.expand` method for `TransformedDistribution` along the lines of #11341.
 - uses this method to simplify `.expand` in distribution classes that subclass off of `TransformedDistribution`.
 - restores testing of `TransformedDistribution` fixtures.
 - fixes some bugs wherein we were not setting certain attributes in the expanded instances, and adds tests for `.mean` and `.variance` which use these attributes.

There are many cases where users directly use `TransformedDistribution` rather than subclassing off it. In such cases, it seems rather inconvenient to have to write a separate class just to define a `.expand` method. The default implementation should suffice in these cases.

cc. fritzo, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11607

Differential Revision: D9818225

Pulled By: soumith

fbshipit-source-id: 2c4b3812b9a03e6985278cfce0f9a127ce536f23
2018-09-14 07:55:33 -07:00
74197c7115 Restore support for dim=None on WeightNorm. (#11661)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11661

Reviewed By: veenix

Differential Revision: D9826799

Pulled By: ezyang

fbshipit-source-id: 9eec57bb27a365406669e412f6eb88741b22ed3d
2018-09-14 07:39:43 -07:00
19065f91fc Centralize TypeExtendedInterface casts. (#11576)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11576

Previously, they were scattered throughout the codebase.
We now follow this convention:

- LegacyTypeDispatch gives you Type
- Context gives you TypeExtendedInterface
- Tensor::type() gives you Type
- at::getType() gives you TypeExtendedInterface

I changed some sites to use getType() instead of type().

Reviewed By: SsnL

Differential Revision: D9790187

fbshipit-source-id: 5e2577cb590a5bbf5df530f3763d3b3c0b4625ca
2018-09-14 07:39:41 -07:00
c5f7da3f4a Support FP16 sparse lookup (#11674)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11674

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11658

Reviewed By: hyuen

Differential Revision: D9676950

fbshipit-source-id: 89a115b9664b84e4e4436b7da033e5a428c2246d
2018-09-14 02:40:08 -07:00
1637729620 Fix ci by skipping some tests (#11668)
Summary:
scalar_tensor_test skipped
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11668

Differential Revision: D9825819

Pulled By: zrphercule

fbshipit-source-id: 6e62a001bcde49be8f7af1501b303bd93d09d005
2018-09-13 20:25:14 -07:00
e6fe8d9cf5 Try to delete codeowners for ATen/core (#10693)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10693

Reviewed By: soumith

Differential Revision: D9772210

Pulled By: ezyang

fbshipit-source-id: 14560eaf77441980e9784536acd0ffe20b15c5b8
2018-09-13 20:25:11 -07:00
2431eac7c0 Ensure most Distribution methods are jittable (#11560)
Summary:
This adds tests in tests/test_distributions.py to ensure that all methods of `Distribution` objects are jittable.

I've replaced a few samplers with jittable versions:
- `.uniform_()` -> `torch.rand()`
- `.exponential_()` -> `-(-torch.rand()).log1p()`
- `.normal_()` -> `torch.normal(torch.zeros(...), torch.ones(...), ...)`
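
A minimal sketch of these replacements (the shape is illustrative, not from this PR):

```python
import torch

shape = (3,)
uniform = torch.rand(shape)                                   # replaces tensor.uniform_()
exponential = -(-torch.rand(shape)).log1p()                   # replaces tensor.exponential_(): -log(1 - U) ~ Exp(1)
normal = torch.normal(torch.zeros(shape), torch.ones(shape))  # replaces tensor.normal_()
```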

Some jit failures remain, and are marked in test_distributions.py
- `Cauchy` and `HalfCauchy` do not support sampling due to missing `.cauchy_()`
- `Binomial` does not support `.enumerate_support()` due to `arange` ignoring its first arg.
- `MultivariateNormal`, `LowRankMultivariateNormal` do not support `.mean`, `.entropy`

- [x] Currently some tests fail (I've skipped those) due to unavailability of `aten::uniform` and `aten::cauchy` in the jit. Can someone suggest how to add these? I tried to add declarations to `torch/csrc/ir.cpp` and `torch/csrc/passes/shape_analysis.cpp`, but that resulted in "Couldn't find operator" errors.
- [x] There are still lots of `TracerWarning`s that something doesn't match something. I'm not sure whether these are real.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11560

Differential Revision: D9816327

Pulled By: apaszke

fbshipit-source-id: 72ec998ea13fc4c76d1ed003d9502e0fbaf728b8
2018-09-13 19:55:01 -07:00
99c0b96f68 optimize norm on ATen CPU backend (#11565)
Summary:
The current torch.norm() runs sequentially on CPU. This PR parallelizes and vectorizes torch.norm() on the ATen CPU path, providing roughly two orders of magnitude of performance improvement.

Performance was benchmarked on a Xeon Skylake 8180 (2×28 cores, 2.5GHz) using the following script:
```python
import torch
from time import time

count = 1000
size = 1000*1000

def test_norm(p=2):
    a = torch.randn(size)
    tstart = time()
    for i in range(count):
        torch.norm(a, p)
    tend = time()
    print("norm on size %d tensor p = %d: %f s" % (size, p, (tend-tstart)))

for p in range(4):
    test_norm(p)
```

without this optimization,
```
(intel-pytorch) [mingfeim@mlt-skx065 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 1.071235 s
norm on size 1000000 tensor p = 1: 1.069149 s
norm on size 1000000 tensor p = 2: 1.068212 s
norm on size 1000000 tensor p = 3: 69.735312 s
```

and with this optimization,
```
(pytorch-tf) [mingfeim@mlt-skx053 unit_tests]$ python test_norm.py
norm on size 1000000 tensor p = 0: 0.127507 s
norm on size 1000000 tensor p = 1: 0.011867 s
norm on size 1000000 tensor p = 2: 0.011907 s
norm on size 1000000 tensor p = 3: 0.014470 s
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11565

Differential Revision: D9804484

Pulled By: ezyang

fbshipit-source-id: 52899f30ac26139d00684d07edfb47cb9b25d871
2018-09-13 19:40:43 -07:00
98e04db955 Implement requires_grad propagation in the JIT (#11586)
Summary:
Previously, we would pretty much assume that all floating point tensors do require grad, which might result in some unnecessary compute.

I don't really like the fact that `TensorType` uses `tensor.is_variable() && tensor.requires_grad()` to infer the value of `requires_grad`, but changing constants to keep variables turns out to be pretty hard. I got halfway there, but it would still need some more work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11586

Reviewed By: ezyang

Differential Revision: D9813648

Pulled By: apaszke

fbshipit-source-id: 77f77756d18ff7632fca3aa68ce855e1d7f3bdb8
2018-09-13 19:25:26 -07:00
513fd3dd36 Improve doc of torch.nn.functional.pad (#11623)
Summary:
I'm reading the doc of `torch.nn.functional.pad` and it looks a bit confusing to me. Hopefully this PR makes it clearer.
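
For context, a small example of the padding convention being documented (values chosen arbitrarily; this snippet is illustrative, not from this PR):

```python
import torch
import torch.nn.functional as F

x = torch.ones(1, 1, 2, 2)
# The pad tuple pairs up as (left, right, top, bottom), applying to the
# last dimension first.
y = F.pad(x, (1, 1, 0, 2))
print(y.shape)  # torch.Size([1, 1, 4, 4])
```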
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11623

Differential Revision: D9818255

Pulled By: soumith

fbshipit-source-id: 4f6b17b0211c6927007f44bfdf42df5f84d47536
2018-09-13 19:25:24 -07:00
760679352e Move Pixel Shuffle to ATen (#9721)
Summary:
~~#9692~~
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9721

Differential Revision: D8955829

Pulled By: SsnL

fbshipit-source-id: 4f4d1c7720b6f757fbef9a10f70209ae76f61399
2018-09-13 18:25:48 -07:00
e1cd220b90 Reimplement swap() using default move constructor. (#11659)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11659

This is less error-prone and less code.

Reviewed By: smessmer

Differential Revision: D9814536

fbshipit-source-id: 028510e31e2fa7a9fa11c1398b0743c5cd085dd5
2018-09-13 16:32:55 -07:00
02980d7f8c Refactor Tensor/TensorImpl constructors. (#11657)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11657

Previously, we had a constructor in TensorImpl for every constructor in Tensor.
This was unnecessary and wordy: Tensor is the user-visible class, so it deserves
the constructors, but TensorImpl is internal and doesn't need them. So
I gave TensorImpl a single Storage-accepting constructor, and then
rewrote Tensor to use that constructor.

Reviewed By: jerryzh168

Differential Revision: D9813742

fbshipit-source-id: 7501b54fe5f39180f1bc07573fd7c1640b0f4e89
2018-09-13 16:32:53 -07:00
7607b49538 s/GetDevicetype/device_type/ (#11656)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11656

The mis-capitalization really sticks in my craw.  I know why (we
already have a static function named GetDeviceType), but let's
name it differently.

```
codemod -d . --extensions cc,cpp,cu,cuh,h,py,hpp,TARGETS GetDevicetype device_type
```

Reviewed By: jerryzh168

Differential Revision: D9813544

fbshipit-source-id: fe462f4bc40b03e74921f8cf5ebd9cfc52e7e636
2018-09-13 16:32:51 -07:00
c18510463b Reduce includes in tensor_impl.h (#11643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11643

- Reduce the tensor_impl.h includes to the bare
  minimum necessary
- Explicitly namespace std::

Reviewed By: jerryzh168

Differential Revision: D9811028

fbshipit-source-id: 44e32720962b35c12a7b2c93605721b9f6c5b254
2018-09-13 16:32:49 -07:00
8402fde279 Revert D9778043: Pass Storage by value
Differential Revision:
D9778043

Original commit changeset: b1381cd60a82

fbshipit-source-id: 40f1de67e939cb41605978d632105a48a91e7629
2018-09-13 16:32:48 -07:00
85ff72348d Only involve tensor device in CUDA -> CPU copy, not current device. (#11592)
Summary:
This also unifies the device usage between the async and sync case.

Fixes https://github.com/pytorch/pytorch/issues/10832.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11592

Differential Revision: D9797355

Pulled By: gchanan

fbshipit-source-id: e496cd371111cfaf9a6c664167967b395e3d72e9
2018-09-13 16:32:46 -07:00
4672280b55 Pass Storage by value (#11546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11546

-

Reviewed By: ezyang

Differential Revision: D9778043

fbshipit-source-id: b1381cd60a826055ce8771d6c67eac4cc375b3b4
2018-09-13 15:26:05 -07:00
05e06f7de2 migrating deprecated calls without abc module for containers (#11515)
Summary:
Implementing #10540.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11515

Reviewed By: apaszke

Differential Revision: D9771045

Pulled By: jeffreyksmithjr

fbshipit-source-id: 85ea39abaa9b465805a969f122b626b11fc85ef6
2018-09-13 15:09:22 -07:00
29e29ca6ee Use MPI_Isend/MPI_Irecv to back send/recv (#11630)
Summary:
The isCompleted function is changed to be non-const to accommodate
setting some internal status on the work object in the case of
completion. Previously, it only checked a member field, but for the
MPI backend it calls MPI_Test to poll for completion of an asynchronous
request.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11630

Reviewed By: SsnL

Differential Revision: D9808008

Pulled By: pietern

fbshipit-source-id: 18b70825b1fb4d561a552fa75e9475a522852cd4
2018-09-13 15:01:24 -07:00
f129da1a47 Add max to the ValueError for EmbeddingBag mode check (#11655)
Summary:
Related to #11624
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11655

Differential Revision: D9815454

Pulled By: SsnL

fbshipit-source-id: 8dd82e0c0aa68362e12b301e095a85af7d7fd71a
2018-09-13 14:39:40 -07:00
90537289a0 Constexpr std::move / std::forward for C++11 (#11396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11396

std::move and std::forward in C++11 aren't constexpr (they are in C++14).
This caused a build issue orionr was working on;
it should be fixed by this diff.

Reviewed By: orionr

Differential Revision: D9724805

fbshipit-source-id: 0d9047dce611385d659cc71a6c04cc7a6a40a5ae
2018-09-13 12:56:17 -07:00
0f1ca569ce End-to-end dynamic slicing with ONNX DynamicSlice experimental operator (#11255)
Summary:
Requires https://github.com/onnx/onnx/pull/1377

This PR makes it so that slices with dynamic boundary values can be exported from pytorch and run in caffe2 via ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11255

Differential Revision: D9790216

Pulled By: jamesr66a

fbshipit-source-id: 6adfcddc5788df4d34d7ca98341077140402a3e2
2018-09-13 12:39:52 -07:00
acb6f18bab fix generate_code.py caching (#11644)
Summary:
Currently, because of some setup.py logic, `ninja` caching of the `generate_code.py` build step was broken. This resulted in `generate_code.py` running on every single build, regardless of whether its inputs changed.

This updated logic fixes the input caching.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11644

Reviewed By: orionr

Differential Revision: D9814348

Pulled By: soumith

fbshipit-source-id: 2012960908d0f600488d410094095cfd72adc34f
2018-09-13 12:39:48 -07:00
75f49befeb move instance_norm to aten (#10792)
Summary:
This also removes the usage of torch.onnx.symbolic_override in instance_norm. Fixes #8439.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10792

Differential Revision: D9800643

Pulled By: li-roy

fbshipit-source-id: fa13a57de5a31fbfa2d4d02639d214c867b9e1f1
2018-09-13 12:26:22 -07:00
912d3626c8 Split tensor.h into tensor_impl.h and tensor.h (#11642)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11642

This is just a preparatory change to help with future
refactoring:

- I want to reduce the number of includes that tensor_impl.h
  depends on, but
- I need to keep tensor.h providing all Caffe2 headers, because
  users may be relying on tensor.h transitively providing those
  headers.

Introducing a level of indirection lets me do both at the same time.

Reviewed By: jerryzh168

Differential Revision: D9810823

fbshipit-source-id: 8dfaac4b8768051a22898be8fcaf787ecc57eb13
2018-09-13 12:26:20 -07:00
45e9ee096e Fix test_mnist_training_leaks_no_memory_cuda warning (#11639)
Summary:
Before this PR, it would warn that "dropout is non deterministic and can
cause problems when checking trace", so I disabled the trace checking.
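
A hedged sketch of the knob involved, assuming a `torch.jit.trace` signature with a `check_trace` flag (the module and input here are placeholders):

```python
import torch

model = torch.nn.Sequential(torch.nn.Linear(10, 10), torch.nn.Dropout(0.5))
example_input = torch.randn(2, 10)

# Skipping trace verification avoids spurious mismatch warnings when the
# traced module is nondeterministic (dropout gives different outputs per run).
traced = torch.jit.trace(model, example_input, check_trace=False)
```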

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11639

Differential Revision: D9812493

Pulled By: zou3519

fbshipit-source-id: fab86928a5fba8b218b47543533aaf7c82a10b4a
2018-09-13 12:09:20 -07:00
9abc666745 stop allowing extra positional args in arg parser (#10499)
Summary:
The arg parser allowed additional positional args to be parsed into keyword-only params.

Fixes a couple of cases:
- The positional argument happens to be of the right type, and it just works silently. Now, we fail as expected.
- The positional argument fails later down the line. Now, we fail at the appropriate time and get a better error message.

Pre-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
tensor([6, 0], device='cuda:1')
```
Post-fix:
```
>>> torch.cuda.LongTensor((6, 0), 1, 1, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new() received an invalid combination of arguments - got (tuple, int, int, int), but expected one of:
 * (torch.device device)
 * (torch.Storage storage)
 * (Tensor other)
 * (tuple of ints size, torch.device device)
 * (object data, torch.device device)
```

Pre-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros(): argument 'dtype' (position 2) must be torch.dtype, not int
```

Post-fix:
```
>>> a = torch.tensor(5)
>>> a.new_zeros((5,5), 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: new_zeros() takes 1 positional argument but 2 were given
```

fixes #8351
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10499

Differential Revision: D9811093

Pulled By: li-roy

fbshipit-source-id: ce946270fd11b264ff1b09765db3300879491f76
2018-09-13 11:56:12 -07:00
6f53b4efea Remove implicit bool casts (#11503)
Summary:
In order to comply with Python's rules on implicit casting of
non-booleans to booleans, this PR removes implicit casting in favor of
explicit casts via `bool()`.
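
A hedged sketch of what script code looks like under this rule (the function itself is hypothetical):

```python
import torch

@torch.jit.script
def relu_or_neg(x):
    # An explicit bool() cast is required; a bare `if x.sum() > 0:` would
    # rely on implicit non-boolean-to-boolean conversion.
    if bool(x.sum() > 0):
        return x
    return -x
```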

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11503

Differential Revision: D9780869

Pulled By: driazati

fbshipit-source-id: c753acaca27f4e79dddf424c6b04674f44a6aad9
2018-09-13 11:26:45 -07:00
ab3a2d25fb Improve error messages when trying to use nested lists.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11606

Differential Revision: D9806949

Pulled By: zdevito

fbshipit-source-id: c38abc4ce745a63d26a64f6aa1b41350e4b1acd5
2018-09-13 11:10:38 -07:00
5bc90b8554 support conversion and dispatch of complex numbers (#11603)
Summary:
- Just a simple fix to support `fill_`
- And a fix for indexing in `pytorch-complex`

Differential Revision: D9804061

Pulled By: ezyang

fbshipit-source-id: 631129b3fa220a9670770b3766f14a8e03633bdf
2018-09-13 11:10:37 -07:00
a861573e36 fix tensor export bug in IR export (#11613)
Differential Revision: D9811094

Pulled By: li-roy

fbshipit-source-id: 012792dbedc70bd3fa242fdf2e39da0b21ce158d
2018-09-13 11:10:35 -07:00
d278344e36 Automatic update of fbcode/onnx to 39dd0d4fec5913aa517b71bcfcbf638a427894eb (#11622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11622

Previous import was bff0b8835870c7df7762ef43498d000d2d8ffb52

Included changes:
- **[39dd0d4](https://github.com/onnx/onnx/commit/39dd0d4)**: [build] Add ONNX_API for protos in all cases (#1407) <Orion Reblitz-Richardson>
- **[944db4f](https://github.com/onnx/onnx/commit/944db4f)**: cmake (#1401) <zrphercule>
- **[8ccc8dd](https://github.com/onnx/onnx/commit/8ccc8dd)**: Remove ONNXIFI_CHECK_RESULT from onnxRelease* functions (#1397) <Marat Dukhan>
- **[df14e74](https://github.com/onnx/onnx/commit/df14e74)**: Change onnxifi test driver classname (#1396) <zrphercule>
- **[0c885cc](https://github.com/onnx/onnx/commit/0c885cc)**: ONNXIFI cpp test driver (#1290) <zrphercule>
- **[a557848](https://github.com/onnx/onnx/commit/a557848)**: Coverage Report Tools for Backend Scoreboard (#1301) <Akshay Chalana>
- **[31fd87f](https://github.com/onnx/onnx/commit/31fd87f)**: fix AvgPool doc. add default value for count_include_pad (#1391) <Wenhao Hu>
- **[8ff08c2](https://github.com/onnx/onnx/commit/8ff08c2)**: Do not export onnx symbols in the python extension (#1388) <bddppq>

Reviewed By: orionr

Differential Revision: D9806635

fbshipit-source-id: f61c052b6bd14e0c80ace19c1a5f0ba659030c6f
2018-09-13 10:40:48 -07:00
1f49b879d1 Add missing include for __half (#11638)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11638

Differential Revision: D9811063

Pulled By: ezyang

fbshipit-source-id: dd103bb152485bcdbb0108b4d3de2443c30d5572
2018-09-13 10:33:09 -07:00
d4d72b87e3 Sphinx is case sensitive
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11646

Differential Revision: D9811355

Pulled By: SsnL

fbshipit-source-id: d484561baa2ac5b3113870b4ee06fa3560b686e4
2018-09-13 10:33:06 -07:00
57f149a861 Only join pin_memory_thread after it started (#11599)
Summary:
Same reason as in #11432 .

Example error:
```
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa06963cf28>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 405, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 401, in _shutdown_workers
    self.pin_memory_thread.join()
AttributeError: '_DataLoaderIter' object has no attribute 'pin_memory_thread'
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11599

Differential Revision: D9801143

Pulled By: SsnL

fbshipit-source-id: 520590a21f56fa381fcac621457a7544d3fba47e
2018-09-13 09:40:49 -07:00
36fc1a0a58 Merge caffe2::/at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11637

Reviewed By: gchanan

Differential Revision: D9806425

Pulled By: ezyang

fbshipit-source-id: e20ec93bff6dc7fb22ca9b7e7348d060b3876b67
2018-09-13 09:40:48 -07:00
77f6998e54 Guard against inputting or returning sparse tensors (#11550)
Summary:
Add guards against using sparse tensors by checking the conversions IValue -> PyObject and PyObject -> IValue.

This diff also changes the behavior of constant propagation to not run Python ops even if all their inputs are constant, because of possible mutation of global state. This came up when trying to run get_sparse(), and I'm including it here to make it easier to land.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11550

Differential Revision: D9804712

Pulled By: eellison

fbshipit-source-id: 9fe7daf721c6d6e48df4925c0f9c775873bcdc77
2018-09-13 08:58:29 -07:00
cac11a4ac3 Merge caffe2::/at::StorageImpl (#11543)
Summary:
Merges caffe2::StorageImpl methods with at::StorageImpl methods and defines caffe2::StorageImpl as at::StorageImpl.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11543

Differential Revision: D9795228

Pulled By: cpuhrsch

fbshipit-source-id: fbd6fa3cbf6c9099a4803337286c30e00652f95c
2018-09-13 01:25:50 -07:00
44b2b6b150 clean up jit generated tests (#11403)
Summary:
Clean up some generated tests now that we have nice new features like varargs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11403

Differential Revision: D9800545

Pulled By: wanchaol

fbshipit-source-id: e9973b113f78dc38cf99a81b6ede3fa3485f1cfa
2018-09-12 22:55:03 -07:00
e998038bc0 Use TypeMeta instead of TypeIdentifier within at::StorageImpl (#11236)
Summary:
Further aligns at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11236

Differential Revision: D9776286

Pulled By: cpuhrsch

fbshipit-source-id: f2c53995fcece013b77b3a1f709ab0f9df8ab23e
2018-09-12 22:26:00 -07:00
6f05b5ee54 Pin Sphinx to 1.7.9 (#11620)
Summary:
Sphinx 1.8.0 breaks us.  Upgrading is tracked in #11618.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11620

Differential Revision: D9806440

Pulled By: ezyang

fbshipit-source-id: 7a8d849c78e697a8775d00cd3a463a7bdbcddabe
2018-09-12 21:55:21 -07:00
17637f2b03 enable_mkl support for resnet18+lstm model
Summary:
* Many ops in the LSTM part of the model don't have ideep/MKL implementations, and it doesn't make sense to copy back and forth for the few available ops because the majority of the RNN will run on CPU
* Thus the strategy is to enable MKL only for the resnet18 part of the model, then switch to the default CPU engine for the LSTM part

* The net may contain some external_inputs falsely added during the ONNX->Caffe2 conversion. A canary in the service shows that their existence could lead to a service crash (presumably because these blobs somehow get shared between threads). They're now manually removed, which seems to be enough to avoid the crash.

Reviewed By: viswanathgs

Differential Revision: D8888763

fbshipit-source-id: da7761bcb7d876ff7bbb6640ae4b24712c0b1de6
2018-09-12 18:56:46 -07:00
0a6931cfee Only reference ONNX through onnx_pb.h (#11609)
Summary:
I think this is needed to land https://github.com/onnx/onnx/pull/1407 without CI errors.

cc mingzhe09088 houseroad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11609

Reviewed By: houseroad

Differential Revision: D9803490

Pulled By: orionr

fbshipit-source-id: 26193f38ab0a2eef9ad7d0da9a0310dc40ef0f2d
2018-09-12 18:25:58 -07:00
5da0b31bee More native docs on TensorOptions. (#11558)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11558

Differential Revision: D9783655

Pulled By: ezyang

fbshipit-source-id: 17c749c9ef99fd9dfd0ff365ebfe22102fb891d7
2018-09-12 17:39:39 -07:00
f00f99ebcc use at::Half in THC (#11322)
Summary:
- use Half instead of half in THC
- clean up TH_float2half, TH_half2float, etc. conversions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11322

Differential Revision: D9799553

Pulled By: li-roy

fbshipit-source-id: 9aa3e003bff73d9df6224a393f3ec0624b1f44ed
2018-09-12 17:39:37 -07:00
daa379ffd7 Disable flaky test ObserverTest.TestMultipleNetBase (#11596)
Summary:
Tracked in https://github.com/pytorch/pytorch/issues/9137

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11596

Differential Revision: D9803256

Pulled By: ezyang

fbshipit-source-id: 973393203ed8343a3a0feef36d34e561d9f653c4
2018-09-12 17:39:36 -07:00
e2cd627cce Temporarily disable docs build. (#11608)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11608

Differential Revision: D9803369

Pulled By: ezyang

fbshipit-source-id: a206d6137e8e729f702189c926ec898444d1dc53
2018-09-12 17:39:34 -07:00
7f7cda99cd Optimize order_swich_ops on GPU (#11404)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11404

Optimize order_swich_ops on GPU

Reviewed By: houseroad

Differential Revision: D9728642

fbshipit-source-id: 74ff62268856fb1613fa61eb214bed6ec6716632
2018-09-12 16:56:15 -07:00
776a9992e1 topk test fix, hgemm integration (#11593)
Summary:
After discussions in #11584, a new PR for just the test skip and hgemm integration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11593

Differential Revision: D9798527

Pulled By: ezyang

fbshipit-source-id: e2ef5609676571caef2f8e6844909fe3a11d8b3e
2018-09-12 16:56:13 -07:00
def44c96fd Revert D9779866: [pytorch][PR] Move function deletion from the stack to the heap.
Differential Revision:
D9779866

Original commit changeset: 96753eead790

fbshipit-source-id: 959deeb63318d48f4c563e10e70ef6ec7fabd3b4
2018-09-12 16:56:11 -07:00
5b2efcf425 Document the Conv module (#11566)
Summary:
Document the C++ API conv module. No code changes.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11566

Differential Revision: D9793665

Pulled By: goldsborough

fbshipit-source-id: 5f7f0605f952fadc62ffbcb8eca4183d4142c451
2018-09-12 16:56:09 -07:00
130d55a5f4 Allow building the C++ API without cereal (#11498)
Summary:
I am working on unifying the C++ extensions and C++ API, and one constraint for this is that we will want to be able to build the C++ API without cereal, since we won't want to ship it with the Python `torch` package.

For this I introduce a `TORCH_WITH_CEREAL` option to CMake. If on, the C++ API will be built with cereal and thus serialization support. If off, serialization functions will throw exceptions, but the library will otherwise still compile the same. __This option is on by default, so for regular C++ API users nothing will change__. However, from C++ extensions, we'll be able to turn it off. This effectively means we won't be searching for any cereal headers from C++ API headers, which wouldn't be installed in the Python package.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11498

Differential Revision: D9784803

Pulled By: goldsborough

fbshipit-source-id: 5d0a1f2501993012d28cf3d730f45932b483abc4
2018-09-12 16:56:07 -07:00
12efef166a Split out copy_op from utility_ops (#11470)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11470

In order to reduce build sizes, we are identifying files that can be split up into smaller units, allowing us to only include the ops we need.

Reviewed By: orionr, ajtulloch

Differential Revision: D9725819

fbshipit-source-id: def1074a33dffe99bd6a7e6e48aa9e5be3d04a6a
2018-09-12 16:25:48 -07:00
316c167940 Add checking of nullptrs in GetTensorInfo (#11587)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11587

To help debug the issue in T33295362, we add some checks in the function.

Possible crashing sites in `GetTensorInfo`:
1. tc is nullptr, which is checked.
2. tc->capacity_nbytes() hits a nullptr; this is unlikely because storage is not a pointer and computing capacity_nbytes doesn't involve pointers, just numel * itemsize().
3. tc->ExtractDeviceOption hits a nullptr. One possibility is raw_data() being nullptr, because tc->ExtractDeviceOption will use it. This is checked.
4. The Tensor itself, which is not a reference. This is also checked.

Reviewed By: salexspb

Differential Revision: D9793484

fbshipit-source-id: 3fc72746fc310a23ae45553bbe0d269a4b9edb72
2018-09-12 16:25:46 -07:00
eb7a298489 Add resnext model to OSS (#11468)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11468

Add the resnext model to the OSS Caffe2 repo.

Reviewed By: orionr, kuttas

Differential Revision: D9506000

fbshipit-source-id: 236005d5d7dbeb8c2864014b1eea03810618d8e8
2018-09-12 15:59:20 -07:00
c81406c514 Document Any (#11580)
Summary:
Documents the `AnyModule` class in the C++ API.

Also changed the API to be friendlier by default. Calling `AnyModule::forward` used to return an `AnyModule::Value` which you had to call `.get<T>()` on to cast to a concrete type. I changed the name of that `forward` method to `any_forward` and instead made `forward` templated on a `ReturnType` template parameter which you can supply to do the `.get<T>` cast for you automatically. I default this parameter to `torch::Tensor` so that it can often be omitted. So where you used to have to write

```cpp
any_module.forward(...).get<int>();
any_module.forward(...).get<torch::Tensor>();
```

you now write

```cpp
any_module.forward<int>(...);
any_module.forward(...);
```

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11580

Differential Revision: D9798626

Pulled By: goldsborough

fbshipit-source-id: 060b4ea28facaffc417f53b80b846a9dff9acb73
2018-09-12 15:59:19 -07:00
ac94889939 Add jit doc entry to sidebar (#11598)
Summary:
cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11598

Differential Revision: D9801230

Pulled By: SsnL

fbshipit-source-id: f0c8d2468b64a50c3c437667d462722dcd2682d1
2018-09-12 15:29:23 -07:00
b663b7ce7e Update ROCm Docker image with latest AMD debians (#11507)
Summary:
Building at https://ci.pytorch.org/jenkins/job/caffe2-docker-trigger/194/

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11507

Differential Revision: D9772474

Pulled By: ezyang

fbshipit-source-id: ab00f05744547dc7ec9f97511e2c8495ac282fac
2018-09-12 15:29:21 -07:00
02c4cd3c8a Skip flaky distributed tests (#11594)
Summary:
context: https://github.com/pytorch/pytorch/issues/11582

cc pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11594

Differential Revision: D9798871

Pulled By: SsnL

fbshipit-source-id: 9f9e1871c7fd9505ca898865eb8068fab4d3416d
2018-09-12 14:57:57 -07:00
d4e05f4e1e Move function deletion from the stack to the heap. (#11534)
Summary:
This eliminates the need for any heuristics regarding stack size limits.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11534

Differential Revision: D9779866

Pulled By: resistor

fbshipit-source-id: 96753eead7904bbdc2869fb01f7bd42141032347
2018-09-12 14:39:59 -07:00
958ba4e913 Aibench for asr decoder
Summary: as title

Reviewed By: sf-wind

Differential Revision: D9738021

fbshipit-source-id: 98f570484bca6486ad99207732efd534ec7e3251
2018-09-12 14:25:19 -07:00
f0a440007e Explicitly set locale on docs build. (#11595)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11595

Differential Revision: D9798567

Pulled By: ezyang

fbshipit-source-id: ac05458347e181960a07cacae1dfc68d2837451f
2018-09-12 14:11:24 -07:00
504126e705 Documentation for debugging JIT
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11540

Differential Revision: D9798647

Pulled By: jamesr66a

fbshipit-source-id: 968a4af22c735a848fa27cbadaed9b7023ba8276
2018-09-12 14:11:22 -07:00
a3036b3bb3 Fused weightnorm for ATen (#10842)
Summary:
This PR contains a C++ implementation of weight norm.  The user-side exposure of weight norm through torch.nn.utils.weight_norm is unchanged.

If running on the GPU, and the norm is requested over the first or last dimension of the weight tensor, the forward pass is carried out using the fused kernels I wrote for our Fairseq GTC hero run, which offer superior performance to primitive ops and superior numerical stability when running in FP16.  In the common case that the backward pass is not itself constructing a graph (ie not attempting to set up double backward) the backward pass will be carried out using another fused kernel.  If the backward pass is constructing a graph, an alternate code path is taken, which does the math using differentiable primitive ops. In this way, the implementation allows double backward, even if the fused kernel was used in forward (although in this case, you don't benefit from the performance and stability of the fused backward kernel).

If running on the CPU, or if norming over an interior dim, the forward pass is carried out using double-differentiable primitive ops.
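
For reference, the user-side API mentioned above is unchanged; a minimal usage sketch (layer sizes chosen arbitrarily):

```python
import torch
import torch.nn as nn
from torch.nn.utils import weight_norm

# Norming over the first (default) or last dim takes the fused GPU path
# described above when the module runs on CUDA.
layer = weight_norm(nn.Linear(20, 40), name='weight', dim=0)
out = layer(torch.randn(8, 20))
```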

Figuring out how to generate all the right plumbing for this was tricky, but it was a fun experience learning how the autogenerator works and how the graph is constructed.  Thanks to colesbury for useful guidance on this front.

I do have a few lingering questions:

- Should I unify my return statements (ie by default-constructing Tensors outside if blocks and using operator= within)?
- What is the significance of `non_blocking` when calling e.g. `auto norms = saved_norms.to(saved_g.type().scalarType(), non_blocking=True/False);`?  I am currently omitting `non_blocking`, so it defaults to False, but I didn't see any associated synchronizes on the timeline, so I'm wondering what it means.
- Is there an "official" mapping from at::ScalarTypes to corresponding accumulate types, as there are for the PODs + Half in [AccumulateType.h](https://github.com/pytorch/pytorch/blob/master/aten/src/ATen/AccumulateType.h)?  I looked for an equivalent mapping for ScalarTypes, didn't find one, and ended up rigging it myself (`  at::ScalarType AccType = g.type().scalarType() == at::ScalarType::Half ? at::ScalarType::Float : g.type().scalarType();`).
- Are sparse tensors a concern?  Should I include another check for sparse tensors in the `_weight_norm` entry point, and send those along the fallback CPU path as well?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10842

Differential Revision: D9735531

Pulled By: ezyang

fbshipit-source-id: 24431d46532cf5503876b3bd450d5ca775b3eaee
2018-09-12 13:55:27 -07:00
9a7c196040 Move Type, Tensor, TensorMethods to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11519

Reviewed By: yf225

Differential Revision: D9771684

Pulled By: gchanan

fbshipit-source-id: a57ee2072af99ce856f895c688b09d750a8606e0
2018-09-12 13:10:54 -07:00
739e6af869 Add remainder % to the jit
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11557

Reviewed By: apaszke

Differential Revision: D9784642

Pulled By: wanchaol

fbshipit-source-id: b7c60c3e9534555c9d7db83769965b3f2f277cdf
2018-09-12 12:40:38 -07:00
ad7936e108 Fix reloading modules back into python (#11552)
Summary:
This changes the way module import works so that when a module
is reloaded in Python it becomes a ScriptModule and not a _C.ScriptModule.

Differential Revision: D9782751

Pulled By: zdevito

fbshipit-source-id: 9576850b75494b228ce3def94c0d371a4a44b11d
2018-09-12 12:25:15 -07:00
17e76e26c8 Add trigonometry functions to docs/source/onnx.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11581

Differential Revision: D9794449

Pulled By: soumith

fbshipit-source-id: 1218fcf8969a10ffbfefd3ced7fee9fe7df296f1
2018-09-12 12:10:01 -07:00
13b05c8c78 Add EndToEndHybridModel CUDA tests (#11544)
Summary:
Also adds two additional tests that check for memory leaks while the relevant graph executors are alive:
- (minimal test): Create a ScriptModule, keep it alive, and test that it does not leak memory while it is alive
- (large test) Do MNIST training with a traced MNIST module and test that no memory is leaked while the traced module (with graph executor) is alive

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11544

Reviewed By: apaszke

Differential Revision: D9778479

Pulled By: zou3519

fbshipit-source-id: 2d6cdea81dd1264f2c0396b662f70fdafecb3647
2018-09-12 11:25:18 -07:00
23d55883c0 minor formatting error log (#11528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11528

as title

Reviewed By: chocjy

Differential Revision: D9773214

fbshipit-source-id: b7dd4c19ab83a18f344de8e71ce5b3bf74d1af72
2018-09-12 11:25:17 -07:00
6398d626f4 Warn that export+import module always load onto the CPU (#11485)
Summary:
Test Plan
`cd docs && make html`
![image](https://user-images.githubusercontent.com/5652049/45325074-ed04e480-b51d-11e8-9d2d-685dbe8a08e9.png)

cc zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11485

Differential Revision: D9772119

Pulled By: zou3519

fbshipit-source-id: 3dcb16c9edc2e8deebef17accf91a1c7d4dc9063
2018-09-12 10:55:39 -07:00
12f4c46eea caffe2::StorageImpl use at::DataPtr (#11282)
Summary:
See title
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11282

Reviewed By: ezyang

Differential Revision: D9658503

Pulled By: cpuhrsch

fbshipit-source-id: 42fa73c979692cb1069c0345744a85d12150745c
2018-09-12 09:39:23 -07:00
e5dd77c7ad Sync all libnccl soversions, not just libnccl.so.1 (#11575)
Summary:
Fixes:

```
/bin/ld: warning: libnccl.so.1, needed by /data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so, not found (try using -rpath or -rpath-link)
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllReduce'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclBcast'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclCommInitAll'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclGetErrorString'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduceScatter'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclAllGather'
/data/users/ezyang/pytorch-tmp/build/lib/libcaffe2_gpu.so: undefined reference to `ncclReduce'
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11575

Differential Revision: D9789956

Pulled By: ezyang

fbshipit-source-id: 63e48763cc233be9d137cec721b239159b511a24
2018-09-12 09:24:51 -07:00
f0a284502a Document BatchNorm and update default behavior (#11484)
Summary:
This PR:

1. Documents `BatchNorm`,
2. Makes a number of API changes after reconsidering some quirks:
    1. The default value for the `stateful` parameter used to be `false`, but the most common usage of `BatchNorm` in the wild is certainly stateful, and the default in Python is also stateful. So we change the default to stateful.
    2. The `pure_forward` function used to use the internal running mean and variance variables instead of the ones supplied to that function call when `stateful` was true, which certainly seems odd. When you call `pure_forward` you would naturally expect the values you pass explicitly to be used. This is now fixed.
3. Adds tests for `BatchNorm`, finally.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11484

Reviewed By: pjh5

Differential Revision: D9779618

Pulled By: goldsborough

fbshipit-source-id: 59ba760e085c01454b75644b24b22317b688e459
2018-09-12 09:09:53 -07:00
6fc18a7541 Typo fix in randomness.rst (#11571)
Summary:
"need to be" -> "need not be"
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11571

Differential Revision: D9786001

Pulled By: soumith

fbshipit-source-id: 7cc408f5c8bfcc56d4b5c153646f30e1cec37539
2018-09-12 08:25:46 -07:00
efc0f6784a Move some bmm/baddbmm to ATen (#11292)
Summary:
- Incorporates MKL addition by mingfeima. Thank you! (but all errors are my own)
- Native CPU implementation: defer to matrix multiplication for
  small batches and parallelize over batch dimension for large
  batches.
- Add bmm test for CUDA just to be sure.

This is a partial fix for #10661, getting down to a factor of ~5.
Considerable overhead is incurred by the setup in einsum. It might
be more efficient to eventually define optimized contraction
functions for arbitrary numbers of dimensions.
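
For reference, the op being optimized (shapes chosen arbitrarily; large batches are parallelized over the batch dimension per the description above):

```python
import torch

a = torch.randn(64, 5, 7)
b = torch.randn(64, 7, 3)
c = torch.bmm(a, b)  # batched matrix multiply, result shape (64, 5, 3)
```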
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11292

Differential Revision: D9784941

Pulled By: ezyang

fbshipit-source-id: f6dded2c6f5e8f0461fb38f31f9a824992a58358
2018-09-12 07:09:55 -07:00
76070fe73c Make c10d test work on CPU only build (#11567)
Summary:
Make the test work with a CPU-only build; also fixed test failures that had been present for a long time.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11567

Differential Revision: D9785740

Pulled By: teng-li

fbshipit-source-id: 61c43b758c1ee53117e30de8074583e6faea863a
2018-09-12 01:39:44 -07:00
6597779847 Clean up some C++ cruftiness in the script lexer.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11408

Differential Revision: D9772843

Pulled By: resistor

fbshipit-source-id: 07f16bf7eaf4f1d8700e46e91a485de4b2d9ed83
2018-09-11 23:55:31 -07:00
3e3d8caecd Allow setting deletion constant
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11529

Differential Revision: D9775398

Pulled By: goldsborough

fbshipit-source-id: 8593d1afcf8be3150dcc4a58433f53307e3ae665
2018-09-11 23:11:46 -07:00
6dcdbd3a1d Make C10d support CPU only build (#11513)
Summary:
This makes torch.distributed work for CPU-only builds.

Also added one more CI test case to cover the MPI CPU build.
All CI tests should cover this change.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11513

Differential Revision: D9784546

Pulled By: teng-li

fbshipit-source-id: 0976a6b0fd199670926f0273e17ad7d2805e42e7
2018-09-11 22:10:34 -07:00
90e31f4896 Improve tracer warnings (#11545)
Summary:
Also, fix a performance bug in `ensureUnique`. Previously it formatted the warning string even though we weren't tracing, so all that work would *always* happen in the hot path and be for nothing.

A sample of what the new warnings look like:
```
tmp.py:4: TracerWarning: Converting a tensor to a Python integer might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
  int(x)
tmp.py:5: TracerWarning: torch.tensor results are registered as constants in the trace. You can safely ignore this warning if you use this function to create tensors out of constant variables that would be the same every time you call this function. In any other case, this might cause the trace to be incorrect.
  torch.tensor([1.])
tmp.py:6: TracerWarning: There are 2 live references to the data region being modified when tracing in-place operator add_. This might cause the trace to be incorrect, because all other views that also reference this data will not reflect this change in the trace! On the other hand, if all other views use the same memory, but are disjoint (e.g. are outputs of torch.split), this might still be safe.
  torch.split(y, 2, dim=1)[0].add_(2)

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11545

Differential Revision: D9782975

Pulled By: apaszke

fbshipit-source-id: 5b3abd31366e59c69e0b7ff278042b5563deb5a9
2018-09-11 22:10:32 -07:00
62c9d4ac96 Make .to() methods native functions (to fix JIT tracing)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11491

Differential Revision: D9771121

Pulled By: apaszke

fbshipit-source-id: 08d11101fb12093f8cf913b06359adddf3af9da7
2018-09-11 21:55:42 -07:00
a00fa2c614 Release GIL when calling into JIT interpreter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11541

Differential Revision: D9777909

Pulled By: apaszke

fbshipit-source-id: d0217e203721262f3f131b54ea78f898df0b54ec
2018-09-11 21:55:40 -07:00
1a246c9c7e guard spurious cudnn.h include (#11562)
Summary:
This fixes the build when CuDNN was not found on the system.

From the `git blame`, it looks like the bug has been around for 2 years :)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11562

Differential Revision: D9784589

Pulled By: soumith

fbshipit-source-id: b33153436dced0a503c9833cdf52f7093f3394b4
2018-09-11 21:09:54 -07:00
a11ebfa195 Add explicit "this->" for nvcc. (#11196)
Summary:
Fix #11195
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11196

Differential Revision: D9737625

Pulled By: ezyang

fbshipit-source-id: fb62076f005bd619eba53c0ed3f07683633f6d91
2018-09-11 21:09:52 -07:00
8aa8ad8b01 WIP: Reproducibility note (#11329)
Summary:
This adds a note on making experiments reproducible.

It also adds instructions for building the documentation to `README.md`. Please ping if I missed any requirements.

I'm not sure what to do about the submodule changes. Please advise.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11329

Differential Revision: D9784939

Pulled By: ezyang

fbshipit-source-id: 5c5acbe343d1fffb15bdcb84c6d8d925c2ffcc5e
2018-09-11 21:09:51 -07:00
b75c32ded9 link against TORCH_CUDA_LIBRARIES
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11475

Differential Revision: D9784616

Pulled By: anderspapitto

fbshipit-source-id: bb8b443bcb308bbbe9707d265f21e5d00d717d65
2018-09-11 20:39:53 -07:00
f4d9f39a94 Test libtorch on cuda
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11554

Differential Revision: D9784772

Pulled By: goldsborough

fbshipit-source-id: c3e071695f56c1f427984f427b1f7722722947d3
2018-09-11 20:39:51 -07:00
35348dab10 WIP: Include note on cudnn determinism in each function backed by cudnn (#11434)
Summary:
Ping ezyang
This addresses your comment in #114. Strangely, when running the doc build (`make html`), none of my changes are actually showing; could you point out what I'm doing wrong?

Once #11329 is merged it might make sense to link to the reproducibility note everywhere.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11434

Differential Revision: D9751208

Pulled By: ezyang

fbshipit-source-id: cc672472449564ff099323c39603e8ff2b2d35c9
2018-09-11 20:27:09 -07:00
54107ae8cf convert output_device at data_parallel from torch.device to index (#10189)
Summary:
- fixes #9984
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10189

Differential Revision: D9545390

Pulled By: weiyangfb

fbshipit-source-id: 3a6a705437553ba319e9fd4b7f676ff73857a27e
2018-09-11 20:27:07 -07:00
045f862574 Use torch::nn::init::xavier_normal_
Summary: The PyTorch C++ API has `torch.nn.init` equivalents that the RNNG can use to initialize the state of its StackRNNs. This gets rid of the `fanInOut_` methods on `Parser` and tidies up `xavierInitialState` a little.

Reviewed By: wowitsmrinal

Differential Revision: D9472595

fbshipit-source-id: c202116f32383d3b4bba064c2c0d2656311e1170
2018-09-11 20:27:06 -07:00
d95fedb436 Use ATen dropout implementation in Dropout module and add FeatureDropout (#11458)
Summary:
This PR does two things:
1. Replaces the implementation of the `Dropout` module with a call to the ATen function,
2. Replaces `Dropout2d` with a new `FeatureDropout` module that shall take the place of `Dropout2d` and `Dropout3d`. I contemplated calling it `Dropout2d` and making `Dropout3d` an alias for it, but similar to our decision for `BatchNorm{1,2,3}d` (c.f. https://github.com/pytorch/pytorch/pull/9188), we can deviate from Python PyTorch in favor of the ideal-world solution, which is to have a single module, since both actually just call `feature_dropout`.

I also replaced the implementation of `dropout3d`  with a call to `dropout2d` in Python. The code is the same and it's easier for developers to parse than having to manually match the tokens to make sure it's really 100% the same code (which it is, if I matched the tokens correctly).

ebetica ezyang SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11458

Differential Revision: D9756603

Pulled By: goldsborough

fbshipit-source-id: fe847cd2cda2b6da8b06779255d76e32a974807c
2018-09-11 20:16:12 -07:00
3121c8f526 Update gtest and remove the macro guide on gtest from #11321 (#11417)
Summary:
Last PR seems to have test failures, re-issuing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11417

Reviewed By: orionr

Differential Revision: D9784706

Pulled By: Yangqing

fbshipit-source-id: 9e5f347e19fa2700ff69d2cd69ea7a9e01a91609
2018-09-11 20:16:08 -07:00
92fd69f256 Split Type into TypeExtendedInterface and Type (#11520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11520

Previously, we had Type which was a catch all interface for all
functions and methods we could possibly want to do dynamic dispatch
on. However, we want to check in a non-autogenerated Tensor class
to ATen/core, and to do this, we must also check in a non-autogenerated
Type class which we can do dispatch on. In principle, we could
put the full Type interface in ATen/core, but this would be
a bad developer experience, since any time you add a new free
function, you'd have to regenerate the checked in Type header.

For a better dev experience, we split Type into a two parts,
Type, which will be checked in (though not in this diff), and
TypeExtendedInterface, which will NOT be checked in. Type contains
just enough methods to let Tensor be defined, and leaves the
rest to TypeExtendedInterface.

Some complications:

- We (very unfortunately) have overloaded virtual methods. Because
of C++'s rules, we cannot move one overload without doing some
extra work to make sure that overload in a superclass and an
overload in a subclass resolve together. I've chosen to resolve
this problem simply by moving ALL overloads of a method which
occurs in Tensor to Type.

- There are some places where we take a type() object and call
a method on it, which is not a Tensor base method. I've eliminated
some where possible, but in other cases calling the method on type
is the ONLY way to invoke it; in that case, I've just inserted
a cast. Further refactoring is necessary.

Reviewed By: gchanan

Differential Revision: D9771708

fbshipit-source-id: c59d39fe919cd6f42be6dca699d474346ea3c614
2018-09-11 20:16:04 -07:00
35d52dbb0e re-enable USE_MPI (#11416)
Summary:
The previous error was caused by mpi_test not depending on MPI_CXX_LIBRARIES. This might solve the problem.

Not tested locally - waiting for CI test.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11416

Reviewed By: mingzhe09088

Differential Revision: D9771694

Pulled By: Yangqing

fbshipit-source-id: 53e7b4f64eadc88313bc4dd9b8e3f7931cda6e91
2018-09-11 18:26:12 -07:00
bbf54ea37c Ensure .enumerate_support() methods are jittable (#11542)
Summary:
This works around #11535 by avoiding `arange(n, out=x)` and `eye(n, out=x)` in `torch.distributions`. I've confirmed that the `.enumerate_support()` methods are now jittable.
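The workaround pattern can be sketched like this (an assumed shape of the change, not the exact diff): build the index tensor functionally instead of writing into a preallocated `out=` tensor, which the tracer cannot handle.

```python
import torch

def enumerate_values(n, dtype, device):
    # instead of: buf = torch.empty(n); torch.arange(n, out=buf)
    # construct functionally and convert dtype/device afterwards
    return torch.arange(n).to(dtype=dtype, device=device)
```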
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11542

Differential Revision: D9777805

Pulled By: apaszke

fbshipit-source-id: fa38f2f1acfc0a289f725fd8c92478573cfdbefb
2018-09-11 18:26:09 -07:00
cda74ac476 fix nested no_grad decorator and with-statement (#11479)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/10858
- allow `no_grad` decorator to apply `with torch.no_grad()` at the correct context
- current behavior:
```
import torch

@torch.no_grad()
def nothing(x):
    return x

testin = torch.Tensor([0])
with torch.no_grad():
    print(torch.is_grad_enabled()) # False
    testout = nothing(testin)
    print(torch.is_grad_enabled()) # False
```
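For context, a context manager that also works as a decorator has to enter the `with` block at call time, not at decoration time. A minimal sketch of that pattern (illustrative, not the actual torch.autograd implementation):

```python
import functools
import torch

class no_grad_like:
    # illustrative: a context manager usable as a decorator
    def __enter__(self):
        self.prev = torch.is_grad_enabled()
        torch.set_grad_enabled(False)

    def __exit__(self, *exc):
        torch.set_grad_enabled(self.prev)  # restore the previous mode

    def __call__(self, func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with self.__class__():  # enter a fresh context per call
                return func(*args, **kwargs)
        return wrapper
```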
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11479

Differential Revision: D9758691

Pulled By: weiyangfb

fbshipit-source-id: 87de2219c6c45f65a2c0406ae152c3ad760be8f2
2018-09-11 17:56:40 -07:00
8b196d671b Allow tracing random functions (only when using default generators) (#11539)
Summary:
Fixes #11504.

zdevito, neerajprad, fritzo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11539

Differential Revision: D9777897

Pulled By: apaszke

fbshipit-source-id: 56983260f5b93da7d5540a6242769ea7bd50eb06
2018-09-11 17:56:39 -07:00
b6b0b5222d fix missing libnccl.so.1 error (#11553)
Summary:
what it says on the tin.

I broke the build in https://github.com/pytorch/pytorch/pull/11487 but contbuild didn't end up catching it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11553

Differential Revision: D9781557

Pulled By: soumith

fbshipit-source-id: 2a1fa314af4b85b5491d74110bfee3d80599aa95
2018-09-11 17:25:58 -07:00
3a39006d38 Fix some more doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11531

Differential Revision: D9776541

Pulled By: SsnL

fbshipit-source-id: 8725485639ea6e9479b6ea95a49f5b75a9457db7
2018-09-11 16:26:55 -07:00
3a8e39b215 Support load and store between Py_complex and std::complex (#11493)
Summary: Printing for complex numbers requires loading and storing between `Py_complex` and `std::complex`. This patch aims to support this for the plugin.

Differential Revision: D9771808

Pulled By: ezyang

fbshipit-source-id: 024865f1945d63ddb5efc775a35438c8ea06408e
2018-09-11 15:55:11 -07:00
289a8c9b7d Allow train/eval, and non-Tensor arguments to python functions (#11505)
Summary:
This whitelists train/eval functions in script modules, and tests that nested nn.Modules still work.

This also changes the code for calling python functions from script to allow non-tensor inputs/outputs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11505

Differential Revision: D9765466

Pulled By: zdevito

fbshipit-source-id: 1177bff931324422b69e18fa0bbaa82e3c98ec69
2018-09-11 15:05:09 -07:00
17776db2ee Add gtest dependency on aten tests. (#11429)
Summary:
ezyang delivering my promise to you :)

Basically, now aten tests can use gtest as part of our test harness unification effort. I also converted one test (atest.cpp) to show how one can do this.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11429

Reviewed By: ezyang

Differential Revision: D9762934

Pulled By: Yangqing

fbshipit-source-id: 68ec3a748403c6bd88399b1e756200985a4e07e3
2018-09-11 13:39:51 -07:00
4db21a1d8e Optimize LengthsTileOp on GPU to run a kernel instead of a sequence of memcopies (#11413)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11413

LengthsTileOp was implemented using a sequence of device memcopies initiated on the CPU. This was very slow. I changed it to use a kernel. TUM benchmark throughput improved from 13k QPS to 20k QPS as a result.
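For reference, the operator's semantics can be sketched in Python (illustrative only; the change itself is a CUDA kernel, and `repeat_interleave` is just a convenient stand-in):

```python
import torch

def lengths_tile(data, lengths):
    # repeat row i of `data` lengths[i] times along the outer dimension
    return torch.repeat_interleave(data, lengths, dim=0)

out = lengths_tile(torch.eye(3), torch.tensor([2, 0, 1]))  # rows: 0, 0, 2
```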

Reviewed By: manojkris, xianjiec

Differential Revision: D9724988

fbshipit-source-id: 2f98c697730982734d7c6a26d0b6967310d49900
2018-09-11 13:25:35 -07:00
c1dce21fd5 Cuda TensorAccessor (#11373)
Summary:
Provide a TensorAccessor-Like interface for CUDA as discussed in #8366.

Compared to TensorAccessor
- the CUDATensorAccessor copies the sizes and strides while on the host (I didn't implement a host indexing function, though) to enable transfer to the device; on the device, `[]` works like it does for TensorAccessors,
- instantiation is from TensorAccessors in order to allow using `.accessor<..>`. The drawback is that you cannot use `auto` for the variable declaration, but the alternative would be a cuda-specific `.accessor`-like function,
- there is a PtrTraits argument to enable `__restrict__`,

Example for the intended use:
```
...
template <typename scalar_t>
__global__ void
apply_homography_2d_kernel(cuda::CUDATensorAccessor<scalar_t, 4> dest_a,
			   cuda::CUDATensorAccessor<scalar_t, 4> src_a,
			   cuda::CUDATensorAccessor<float, 2> transform) {
...
}

template <typename scalar_t>
Tensor apply_homography_2d_template(Tensor& res, const Tensor& image, const Tensor& transform) {
  ...
  cuda::CUDATensorAccessor<scalar_t, 4> image_a(image.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<scalar_t, 4> res_a(res.accessor<scalar_t, 4>());
  cuda::CUDATensorAccessor<float, 2> transform_a(transform.accessor<float, 2>());
  auto stream = at::cuda::getCurrentCUDAStream();

  apply_homography_2d_kernel<scalar_t>
    <<<grid, block, 0, stream>>>(res_a, image_a, transform_a);
  return res;
}

...
```

I could use a hint where to put a test for this (e.g. doing a plain vanilla matrix multiplication with a custom kernel) and comparing with the aten mm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11373

Differential Revision: D9735573

Pulled By: ezyang

fbshipit-source-id: 482b218a0d514e19a8b692bbc77c0e37082cfded
2018-09-11 13:09:33 -07:00
c56a7cfc37 More use of AT_CHECK and AT_ERROR (#11457)
Summary: Considering these increase the size of the message stack, I didn't touch the code outside `ATen/native`

Differential Revision: D9754283

Pulled By: soumith

fbshipit-source-id: 04198ec4fd0c4abae09eeba92c493a783408537a
2018-09-11 12:55:09 -07:00
5952acc041 Add "merge to master" step before build in CircleCI (#11443)
Summary:
This PR adds the "merge to master" step before the build step in CircleCI, so that all PR commits are built against master instead of against the PR's branch. Note that all PRs still need to rebase to master to pick up this new config, so it won't apply to old PR branches retroactively.

To check in CI: make sure it's performing the git merge to master appropriately in "Merge Onto Master" step.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11443

Differential Revision: D9775628

Pulled By: yf225

fbshipit-source-id: 8083db6b098d234a44ae4481f40a486e9906f6f8
2018-09-11 12:39:37 -07:00
fbc17321fd Update pybind11 to fix Python 3.7 support for script (#11473)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11419

In particular pulling in https://github.com/pybind/pybind11/pull/1454
as well as pending bugfix in https://github.com/pybind/pybind11/pull/1517 (documenting in comment)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11473

Differential Revision: D9776003

Pulled By: jamesr66a

fbshipit-source-id: a225dcfb66c06bcae98fd2508d9e690c24be551a
2018-09-11 12:39:36 -07:00
781737f84c Remove time prefix from rsync (#11525)
Summary:
This fails with zsh saying "time: command not found".

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11525

Differential Revision: D9772522

Pulled By: apaszke

fbshipit-source-id: b80d108fa6b174d68ada08a9fdbf7260ee37e08f
2018-09-11 12:10:24 -07:00
a566bc2f11 Disable all CircleCI jobs (#11523)
Summary:
Disable all CircleCI jobs until we are ready to move forward with them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11523

Differential Revision: D9774462

Pulled By: yf225

fbshipit-source-id: c5724e71eb68bac4df958b4f7bcc380050668b3c
2018-09-11 11:25:17 -07:00
d09041bd81 Add an option to statically link cuda (#10596)
Summary:
We need to link CUDA statically for benchmarking purposes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10596

Reviewed By: llyfacebook

Differential Revision: D9370738

Pulled By: sf-wind

fbshipit-source-id: 4464d62473e95fe8db65b0bd3b301f262bf269bf
2018-09-11 11:09:29 -07:00
727a4453aa New Serialization Proto
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11166

Reviewed By: mingzhe09088

Differential Revision: D9623522

Pulled By: houseroad

fbshipit-source-id: f21153034a398de7959404321d8534234cd58a40
2018-09-11 10:55:43 -07:00
f80f15866b Get rid of manual dispatch on Type. (#11486)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11486

I discovered these by narrowing the interface on Type, and then
fixing call sites outside of core plumbing code which depended
on these methods being provided.

Reviewed By: cpuhrsch

Differential Revision: D9757935

fbshipit-source-id: 3abda0c98919a448a326a757671d438964f6909f
2018-09-11 10:40:22 -07:00
01c7542f43 Use -isystem for system includes in C++ extensions (#11459)
Summary:
I noticed warnings from within pybind11 being shown when building C++ extensions. This can be avoided by including non-user-supplied headers with `-isystem` instead of `-I`

I hope this works on Windows.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11459

Differential Revision: D9764444

Pulled By: goldsborough

fbshipit-source-id: b288572106078f347f0342f158f9e2b63a58c235
2018-09-11 10:40:20 -07:00
d32b41003a Copy protos on install same as develop (#11517)
Summary:
This is a potential fix for https://github.com/pytorch/pytorch/issues/11453 and https://github.com/pytorch/pytorch/issues/11074, worked through with pjh5. Turns out we had some protos copy code in the .sh file that was removed. Better to have it in setup.py, though, same as for develop.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11517

Differential Revision: D9771911

Pulled By: orionr

fbshipit-source-id: 76975d8f71f38d951eaaed0b50dd3ec36dd177a9
2018-09-11 10:09:56 -07:00
deac304b6b Bugfix for basic slicing
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11428

Differential Revision: D9753999

Pulled By: jamesr66a

fbshipit-source-id: cfc4163a5a06b41beb808a4e24650d71f5d91f4f
2018-09-11 09:39:29 -07:00
4e8d9a4a58 Introducing python setup.py rebuild develop (#11487)
Summary:
This speeds up incremental builds by doing the following changes:

- Uses `rsync` instead of `cp` (when `rsync` is found) which is a bit smarter in doing "maybe copy"
- Introduces a `rebuild` mode which does not rerun `cmake` in `build_pytorch_libs.sh`.
   *Note: `rebuild` should only be used if you don't add or remove files from the build, as `cmake` is not rerun*

Current no-op rebuild speedup:
- 1m 15s -> 20s

There are some lingering bugs: the first two no-op rebuilds still rerun `cmake` (the cmake logic likely depends on the install folder, which kicks off the rebuild).

So this is what you see:

```
python setup.py rebuild develop    # first time - ~5 mins
python setup.py rebuild develop    # second time - ~3 mins
python setup.py rebuild develop    # third time - ~2 mins
python setup.py rebuild develop    # fourth time - ~20 seconds
python setup.py rebuild develop    # fifth time - ~20 seconds
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11487

Differential Revision: D9769087

Pulled By: soumith

fbshipit-source-id: 20fbecde33af6426149c13767e8734fb3be783c5
2018-09-11 08:56:25 -07:00
31850163ac Remove separate ATen build target (#11488)
Summary:
ATen has had a separate build target in the past, but with our move to a root-level CMakeLists.txt file this makes less sense and is harder to maintain. Also, as we blend code between Caffe2 and ATen this will become even less maintainable.

Talked to ezyang about this, but also cc zdevito, Yangqing, and soumith. If this is too difficult, I will revert, but want to see if we can simplify for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11488

Differential Revision: D9770266

Pulled By: orionr

fbshipit-source-id: c7ba52a1676d84e2d052dad4c042b666f49451cd
2018-09-11 08:56:23 -07:00
de460c7ad3 Improvements on conv/pool/fold/stft/ParamDict docs (#11106)
Summary:
Also fixes some incorrect formula rendering.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11106

Differential Revision: D9752433

Pulled By: SsnL

fbshipit-source-id: 535fc8498638e8b645757fc7535d8771992b7d21
2018-09-11 08:56:21 -07:00
86ab92b0a9 Move TensorImpl / UndefinedTensor(Impl) to core (#11441)
Summary:
Moves TensorImpl to core.
Renames UndefinedTensor to UndefinedTensorImpl and moves to core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11441

Differential Revision: D9736620

Pulled By: gchanan

fbshipit-source-id: 0322ae3b903e338de253b35a0d74a9d3e219204b
2018-09-11 07:45:56 -07:00
80fa8e1007 Add .expand() method to distribution classes (#11341)
Summary:
This adds a `.expand` method for distributions that is akin to the `torch.Tensor.expand` method for tensors. It returns a new distribution instance with batch dimensions expanded to the desired `batch_shape`. Since this calls `torch.Tensor.expand` on the distribution's parameters, it does not allocate new memory for the expanded distribution instance's parameters.

e.g.
```python
>>> d = dist.Normal(torch.zeros(100, 1), torch.ones(100, 1))
>>> d.sample().shape
  torch.Size([100, 1])
>>> d.expand([100, 10]).sample().shape
  torch.Size([100, 10])
```

We have already been using the `.expand` method in Pyro in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py#L10) of `torch.distributions`. We use this in our models to enable dynamic broadcasting. This has also been requested by a few users on the distributions slack, and we believe will be useful to the larger community.

Note that currently, there is no convenient and efficient way to expand distribution instances:
 - Many distributions use `TransformedDistribution` (or wrap another distribution instance; e.g. `OneHotCategorical` uses a `Categorical` instance) under the hood, or have lazy parameters. This makes it difficult to collect all the relevant parameters, broadcast them and construct new instances.
 - In the few cases where this is even possible, the resulting implementation would be inefficient since we will go through a lot of broadcasting and args validation logic in `__init__.py` that can be avoided.

The `.expand` method allows for a safe and efficient way to expand distribution instances. Additionally, this bypasses `__init__.py` (using `__new__` and populating relevant attributes) since we do not need to do any broadcasting or args validation (which was already done when the instance was first created). This can result in significant savings as compared to constructing new instances via `__init__` (that said, the `sample` and `log_prob` methods will probably be the rate determining steps in many applications).

e.g.
```python
>>> a = dist.Bernoulli(torch.ones([10000, 1]), validate_args=True)

>>> %timeit a.expand([10000, 100])
15.2 µs ± 224 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit dist.Bernoulli(torch.ones([10000, 100]), validate_args=True)
11.8 ms ± 153 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

cc. fritzo, apaszke, vishwakftw, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11341

Differential Revision: D9728485

Pulled By: soumith

fbshipit-source-id: 3b94c23bc6a43ee704389e6287aa83d1e278d52f
2018-09-11 06:56:18 -07:00
120d769432 Add support for tracing strings (#11506)
Summary:
This enabled `torch.einsum` both in tracing and in script mode. It's used all over Pyro at the moment, and is needed for any use of the JIT in there.

Fixes #11157.

zdevito fritzo neerajprad
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11506

Differential Revision: D9764787

Pulled By: apaszke

fbshipit-source-id: 9b5251b9e7c5897034602bd07ff67b425d33326c
2018-09-11 06:02:41 -07:00
0ddbe668cd Improve shape analysis to cover all most commonly used ops (#11358)
Summary:
[Here's a list](https://gist.github.com/apaszke/f0821840bdcc67a977832dc58acc1b85) of ops that are in `register_aten_ops.cpp`, but aren't supported in shape prop. Everything else should work now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11358

Differential Revision: D9753693

Pulled By: apaszke

fbshipit-source-id: efeae0126ce16cb56b8797fc5246405588bcae3c
2018-09-11 06:02:39 -07:00
f84693efa9 nomnigraph - Improvements to subgraph matching APIs (#11418)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11418

Several improvements that aim to make the APIs more straightforward to use

- Get rid of helper methods subgraph and nonTerminal. Users should now create a NNMatchGraph directly via graph's createNode and createEdge API

- Get rid of operatorSubgraph helper method

- invertGraphTraversal flag applies to both the match graph and the scanned graph. This allows the user to create the match graph in the same direction as the scanned graph, thus reducing confusion.

- additional parameters of matchNode (count, includeInSubgraph, nonTerminal) are removed from the constructors and moved into setter methods. (We no longer enforce that MatchNode is immutable but this helps improve code clarity).

- Tests are updated to reflect the changes

Follow up changes:
- Possibly clean up the tests further. This change aims to minimally modify the unit tests.
- Add a validity check that enforces the current limitation of the match graph (single source node) and throws if the match graph does not satisfy the criterion.
- Have the single source node be detected automatically and callers just need to pass in the matchGraph instead of the source node reference.

Differential Revision: D9732565

fbshipit-source-id: ae8320e2bc89b867f6bb4b1c1aad635f4b219fa1
2018-09-11 04:39:27 -07:00
3d5fd12488 Documentation for c10d: torch.distributed and deprecate the old distributed doc (#11450)
Summary:
This is the new documentation for c10d release, and it also deprecates the old torch.distributed document.

This PR depends on https://github.com/pytorch/pytorch/pull/11405

and should only be landed after https://github.com/pytorch/pytorch/pull/11405 is landed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11450

Differential Revision: D9765504

Pulled By: teng-li

fbshipit-source-id: 48f38b27b8c270baf389f8e478ea226b9ecc63db
2018-09-11 02:10:28 -07:00
0988bbad2d C10d release to torch.distributed for PT1 (#11405)
Summary:
The old `torch.distributed` will go to `torch.distributed.deprecated`
The old DDP will go to `torch.nn.parallel.deprecated`

Now `torch.nn.parallel.DDP` will use c10d DDP
Now `torch.distributed` will use C10d frontend API
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11405

Reviewed By: pietern

Differential Revision: D9733733

Pulled By: teng-li

fbshipit-source-id: d6a3f3e73f8d3a7fcb1f4baef53c78063b8cbb08
2018-09-10 23:27:22 -07:00
b14a80553d Ignore functional doc error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11508

Differential Revision: D9764380

Pulled By: goldsborough

fbshipit-source-id: 3abb9c04f46137be833ea26d67734741e14f8010
2018-09-10 20:55:48 -07:00
f9d12eeb27 Give copy an optional device argument.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11497

Differential Revision: D9762014

Pulled By: gchanan

fbshipit-source-id: 996419cc5e86d000af953d030ff361adafb921ad
2018-09-10 20:40:03 -07:00
dd8defeb3f Document the Functional module (#11460)
Summary:
Document the `Functional` module in the C++  API.

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11460

Differential Revision: D9757555

Pulled By: goldsborough

fbshipit-source-id: 15f8bf6d60bd26f3f4e69fb8e414e186e3c220ee
2018-09-10 19:58:38 -07:00
9cfdf0d677 Document the Embedding module (#11469)
Summary:
ebetica soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11469

Differential Revision: D9757547

Pulled By: goldsborough

fbshipit-source-id: a95673abe949bb81d716dbc03c5c3e2a11cc15d3
2018-09-10 18:25:08 -07:00
a175282776 Flags for LMDB, LevelDB, and Caffe2 ops (#11462)
Summary:
Add flags for LMDB and LevelDB, default `OFF`. These can be enabled with

```
USE_LMDB=1 USE_LEVELDB=1 python setup.py build_deps
```

Also add a flag to build Caffe2 ops, which is default `ON`. Disable with

```
NO_CAFFE2_OPS=1 python setup.py build_deps
```

cc Yangqing soumith pjh5 mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11462

Reviewed By: soumith

Differential Revision: D9758156

Pulled By: orionr

fbshipit-source-id: 95fd206d72fdf44df54fc5d0aeab598bff900c63
2018-09-10 17:27:50 -07:00
e1e69446f6 Lockdown NO_TEST=1 for tests even more (#11415)
Summary:
Skip torch tests as well when NO_TEST=1 environment variable is set. Also remove the separate ATen code path for not being built with Caffe2, since it will always be built with Caffe2.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11415

Reviewed By: soumith

Differential Revision: D9758179

Pulled By: orionr

fbshipit-source-id: e3e3327364fccdc57a703aeaad8c4f30452973fb
2018-09-10 17:27:48 -07:00
3e49a69466 Resolve ambiguity when including both caffe2 and aten registries (#11411)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11411

Simple fix

Reviewed By: goldsborough

Differential Revision: D9730371

fbshipit-source-id: f841327c01faa13cfb6b7fc6e279b8fc50fad1db
2018-09-10 17:27:46 -07:00
3ad67c60f0 Traceable explicit Variable instantiation (#11463)
Summary:
There's a bunch of legacy code where people are explicitly instantiating Variable, and these call-sites have thus far been untraceable (appearing as prim::Constant nodes with the tensor value at the time of tracing). This makes it so that the new variable inherits the traced Value* from the tensor it's being constructed from.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11463

Differential Revision: D9756529

Pulled By: jamesr66a

fbshipit-source-id: da99c6a7621957a305f2699ec9cb9def69b1b2d7
2018-09-10 17:03:24 -07:00
f2f43ad2da Add new LengthsSplit operator (#10974)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10974

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10291

This new operator will do the following:

Given a LENGTHS vector and n_splits, output a "split" LENGTHS vector where:

1. Each length in the input vector is split into n_splits values (thus the output vector should have LENGTHS.size(0) * n_splits elements)
2. The new lengths in the output should be evenly split, and if the length is not divisible by n_splits, the new values are ordered in descending order. (e.g. n_splits = 3, length = 5 -> 2 2 1)
3. If n_splits > some element in the array, its split elements will contain 0s. (e.g. n_splits = 3, length = 2 -> 1 1 0)
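A Python sketch of this splitting rule (illustrative, not the actual C++ operator):

```python
def lengths_split(lengths, n_splits):
    out = []
    for length in lengths:
        q, r = divmod(length, n_splits)
        # r chunks of size q+1 first, then n_splits-r chunks of size q,
        # giving an even, descending split; zeros appear when q == 0
        out.extend([q + 1] * r + [q] * (n_splits - r))
    return out

assert lengths_split([5, 2], 3) == [2, 2, 1, 1, 1, 0]
```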

Reviewed By: bddppq, chocjy

Differential Revision: D9013119

fbshipit-source-id: 82bf3371ec08c41fc3379177f0007afc142e0d84
2018-09-10 15:40:28 -07:00
0b78ae86c5 Cleanup byte swapping utilities to generate optimal code on the platforms we care about. (#11394)
Summary:
While the use of memcpy as part of the byte swapping sequence looks funky, all major
compilers recognize and optimize this pattern reliably, resulting in essentially
optimal code generation.

For example, decodeUInt32LE goes from this on iOS arm64:
>         ldrb    w8, [x0, #3]
>         ldrb    w9, [x0, #2]
>         bfi     w8, w9, #8, #8
>         ldrb    w9, [x0, #1]
>         bfi     w8, w9, #16, #8
>         ldrb            w9, [x0]
>         bfi     w8, w9, #24, #8
>         mov      x0, x8
>         ret

To this:
>         ldr             w8, [x0]
>         rev     w0, w8
>         ret
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11394

Reviewed By: SsnL

Differential Revision: D9728659

Pulled By: resistor

fbshipit-source-id: 9afbd4adfad1d1fb7b01f1179e6707ee21fa726f
2018-09-10 15:40:24 -07:00
a0d4106c07 Integrate custom op tests with CI (#10611)
Summary:
This PR is stacked on https://github.com/pytorch/pytorch/pull/10610, and only adds changes in one file `.jenkins/pytorch/test.sh`, where we now build the custom op tests and run them.

I'd also like to take this PR to discuss whether the [`TorchConfig.cmake`](https://github.com/pytorch/pytorch/blob/master/cmake/TorchConfig.cmake.in) I made is robust enough (we will also see in the CI). orionr Yangqing dzhulgakov, what do you think?

Also ezyang for CI changes
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10611

Differential Revision: D9597627

Pulled By: goldsborough

fbshipit-source-id: f5af8164c076894f448cef7e5b356a6b3159f8b3
2018-09-10 15:40:21 -07:00
3e665cc29b Improve support for tracing sizes, add more tracer warnings (#11288)
Summary:
Many constructors like `torch.zeros` or `torch.randn` didn't support
size tracing correctly, which is fixed by this patch. The same issue has been
fixed in the legacy tensor constructors.

Additionally, new tensor constructors, which do not participate in
tracing (most notably `torch.tensor`, `torch.as_tensor` and
`torch.from_numpy`) raise a warning when they are used.

Finally, entering a traceable operation disables the tracing in its body.
This is needed because

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11288

Reviewed By: ezyang

Differential Revision: D9751183

Pulled By: apaszke

fbshipit-source-id: 51444a39d76a3e164adc396c432fd5ee3c8d5f7f
2018-09-10 15:22:48 -07:00
70d93f4777 Check for maximum numel in NCCL broadcasting (#11466)
Summary:
NCCL1 uses `int` as its numerical type for fields like `count`, which makes broadcasting tensors larger than `2^31 - 1` elements impossible, and raises the opaque error `invalid arguments`. NCCL2 greatly increases the limit on many platforms by using `size_t`. This patch statically detects this type, and raises properly if the broadcast tensor exceeds the limit.

No test because I don't think our test suite should broadcast big tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11466

Differential Revision: D9754753

Pulled By: SsnL

fbshipit-source-id: 73506450cae047e06b5b225b39efdb42d5d26685
2018-09-10 14:39:15 -07:00
35008e0a1a Add flags to fix half comparison and test (#11395)
Summary:
It was reported that there are some issues when using comparison operators for half types when certain THC headers are included. I was able to reproduce this and added a test. I also fix the issue by adding the proper definitions.

Reported in https://github.com/pytorch/pytorch/pull/10301#issuecomment-416773333
Related: https://github.com/pytorch/tutorials/pull/292

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11395

Differential Revision: D9725102

Pulled By: goldsborough

fbshipit-source-id: 630425829046bbebea3409bb792a9d62c91f41ad
2018-09-10 14:10:21 -07:00
18e5fd36c2 Normalize gradients before reduction in DistributedDataParallelC10d (#11109)
Summary:
Normalizing by the world size before the reduction is less likely to cause overflow in FP16 training.
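The idea, sketched against the public `torch.distributed` API (illustrative; the actual change lives inside the c10d DDP reduction path, and this assumes an already-initialized process group):

```python
import torch.distributed as dist

def allreduce_mean(grads, world_size):
    for g in grads:
        g.div_(world_size)  # normalize first, so fp16 sums don't overflow
        dist.all_reduce(g)  # default op is SUM; after pre-division this is a mean
```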
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11109

Differential Revision: D9594708

Pulled By: myleott

fbshipit-source-id: 93ab53cb782ee1cbe1264e529b333490a0940338
2018-09-10 13:55:09 -07:00
ea0ee77c61 Fix katex math rendering (#11472)
Summary:
I'm 80% sure that this fixes the math bug. But I can't repro locally so I don't know.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11472

Differential Revision: D9755328

Pulled By: SsnL

fbshipit-source-id: 130be664d3c6ceee3c0c166c1a86fc9ec3b79d74
2018-09-10 12:40:23 -07:00
198ade74f9 Remove manual refcounting from Tensor class (#11294)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11294

The Tensor(ptr, retain) constructor is error prone and circumvents the intrusive_ptr safety.

This diff removes that and pushes the responsibility to callers.
Step by step, manual refcounting can be pushed back and possibly eliminated in the end.

Reviewed By: ezyang

Differential Revision: D9663476

fbshipit-source-id: 7f010e5e47b137a9575960201c5bf5d552c5c2f5
2018-09-10 12:40:21 -07:00
b0c1397271 Fix intrusive_ptr move/copy for different NullType's (#11260)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11260

This is needed to make something like this work:

    intrusive_ptr<TensorImpl, UndefinedTensorImpl> a = make_intrusive<SparseTensorImpl>(...);

Reviewed By: ezyang

Differential Revision: D9652089

fbshipit-source-id: 19c65e98460ccb27bc69e36d7e558cb9d6e67615
2018-09-10 12:40:20 -07:00
252f93df09 Improve Tensor() constructor (#11258)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11258

The two intrusive_ptr constructors in Tensor can be combined into one implementation that does both, moving and copying.

Reviewed By: ezyang

Differential Revision: D9652088

fbshipit-source-id: 5efca02654ba305c99c20bbeb83551469d17a51d
2018-09-10 12:40:19 -07:00
09292f2c03 Some improvements to IValue (#11238)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11238

- when moving an IValue, free the old value instead of keeping it allocated
- making classes final
- moving std::string
- making ConstantList const

Reviewed By: ezyang

Differential Revision: D9644700

fbshipit-source-id: ab7228368e4f00f664ba54e1242b0307d91c5e7e
2018-09-10 12:40:17 -07:00
ce6906b051 Narrowing Blob (#11167)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11167

Narrow the Blob API as preparation for merging Blob/IValue

- get rid of templated IsType and Operator::InputIsType / OutputIsType
- Use 'using' instead of 'typedef' for DestroyCall (just for readability)

Reviewed By: ezyang

Differential Revision: D9623916

fbshipit-source-id: 952f0b0cf5a525094b02e8d2798dd57a56a9e1d8
2018-09-10 12:40:16 -07:00
040d75d455 Add option to use CUDA memory leak testing as a context manager (#11380)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11380

Reviewed By: ezyang

Differential Revision: D9705877

Pulled By: zou3519

fbshipit-source-id: 02470c25236f57fa02f4ac9d7ed63d38a6355db2
2018-09-10 12:40:15 -07:00
2158f4a9c8 add export import test to TestJitGenerated (#10982)
Summary:
Checking assertExportImport for all of the generated test jit tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10982

Differential Revision: D9636935

Pulled By: eellison

fbshipit-source-id: f3f1ce77d454848098f2ac7e0fa18bf8564890be
2018-09-10 11:37:05 -07:00
cee743f639 Move backward/set_data to Type-based dispatch.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11440

Differential Revision: D9736565

Pulled By: gchanan

fbshipit-source-id: 1e66f54f1c87084f37c0b014030f0d6d2f8dfaee
2018-09-10 08:40:29 -07:00
87a9a8f80a Use AT_CHECK and AT_ERROR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11444

Differential Revision: D9736992

Pulled By: SsnL

fbshipit-source-id: bf5320e878c6ef71468f3e2aa12ce304b92d45ca
2018-09-09 21:26:12 -07:00
560d6efd3a Only join started dataloader workers (#11432)
Summary:
`Process.start()` actually takes some time as it needs to start a
process and pass the arguments over via a pipe. Therefore, we
only add a worker to the self.workers list after it has started, so
that we do not call `.join()` if the program dies before the worker starts
and `__del__` tries to join it, which would raise:
    AssertionError: can only join a started process.

Example trace when such error happens:
```py
[unrelated]
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 500, in __iter__
    return _DataLoaderIter(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 292, in __init__
    w.start()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 112, in start
    self._popen = self._Popen(self)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/context.py", line 277, in _Popen
    return Popen(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/popen_fork.py", line 70, in _launch
    self.pid = os.fork()
KeyboardInterrupt
Exception ignored in: <function _DataLoaderIter.__del__ at 0x7fa704d5aa60>
Traceback (most recent call last):
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 398, in __del__
    self._shutdown_workers()
  File "/private/home/ssnl/miniconda3/lib/python3.7/site-packages/torch/utils/data/dataloader.py", line 392, in _shutdown_workers
    w.join()
  File "/private/home/ssnl/miniconda3/lib/python3.7/multiprocessing/process.py", line 139, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
```
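A minimal sketch of the start-then-append ordering the fix adopts (illustrative, heavily simplified from the dataloader):

```python
import multiprocessing

def _worker_loop():
    pass  # stand-in for the real worker body

if __name__ == "__main__":
    workers = []
    for _ in range(4):
        w = multiprocessing.Process(target=_worker_loop)
        w.daemon = True
        w.start()          # may raise or be interrupted before returning
        workers.append(w)  # record only workers that actually started

    for w in workers:
        w.join()           # every recorded worker is safe to join
```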

No test because hard to reliably trigger.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11432

Reviewed By: ezyang

Differential Revision: D9735430

Pulled By: SsnL

fbshipit-source-id: a8912d9bb4063f210d6236267b178173810e2351
2018-09-09 12:55:51 -07:00
87b2f05a9c Also set stdin to subprocess pipe in FindCUDNN windows popen call (#11435)
Summary:
Same issue as https://github.com/pytorch/pytorch/pull/10379, just in a different place (adding this resolves it)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11435

Differential Revision: D9736396

Pulled By: soumith

fbshipit-source-id: 220a52b8009fc2bee9313c5a091443c68f85f62f
2018-09-09 11:40:25 -07:00
581099a7b2 pybind conversion for IntList (#11425)
Summary:
as discussed with ezyang and slayton58 , this might be a nice convenience to be able to use code in extensions just as in ATen.

also split off `tracing_state.h` from `torch/jit/tracer.h` fix #11204 to bee able to use the utility functions

pytorchbot  it's not a jit patch per se.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11425

Differential Revision: D9735556

Pulled By: ezyang

fbshipit-source-id: 466c92bbdb1d7d7a970eba1c26b7583fe9756139
2018-09-09 10:39:40 -07:00
ee4309a9ac override BUILD_TEST when building gloo (#11431)
Summary:
A recent build regression is that we need a system GoogleTest for builds to pass.

This was because, when building with Gloo, gloo tries to build its own tests, which look for a system gtest [here](https://github.com/facebookincubator/gloo/blob/master/cmake/Dependencies.cmake#L72-L80) (because we're not using the full cmake build and making it aware of third_party/GoogleTest; instead, we are building it in isolation using tools/build_pytorch_libs.sh).

Traditionally, we didn't ask Gloo to build its tests, but because we added `-DBUILD_TEST=1` by default to all builds (in refactoring variable names), we accidentally started asking Gloo to build its tests.

This PR overrides the Gloo flags and asks it not to build tests (like it used to)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11431

Differential Revision: D9736387

Pulled By: soumith

fbshipit-source-id: 59e84edae780123b793bdaea5fd9ac46156cd0af
2018-09-09 10:11:56 -07:00
1b94f5c6e6 optimize masked_fill on CPU (#11359)
Summary:
This PR parallelizes `masked_fill` on CPU; currently it runs sequentially.

the following script is used to benchmark and verify this PR. On Xeon skylake 8180 (2 sockets * 28 cores),
 it runs `4.20` sec without the PR and `0.11` sec with the PR.

```python
import torch
import random
from time import time

size = 10 * 1000 * 1000
count = 100

def test_masked_fill():
    dst = torch.randn(size)
    dst_ = dst.clone()
    mask = torch.rand(size).mul(2).floor().byte()
    val = random.random()

    tstart = time()
    for i in range(count):
        dst.masked_fill_(mask, val)
    tend = time()
    print("masked_fill_: %f" % (tend-tstart))

    for i in range(size):
        if mask[i]:
            if dst[i] != val:
                print("fail")
        else:
            if dst[i] != dst_[i]:
                print("fail1")
    print("test_masked_fill: PASS")

test_masked_fill()
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11359

Differential Revision: D9735578

Pulled By: ezyang

fbshipit-source-id: d437ad7c6dace1910d0c18d6d9ede80efb44fae4
2018-09-09 00:25:26 -07:00
b7ecf035dc Updates FindCUDA.cmake to 3.12.2 upstream version (#11406)
Summary:
This PR is just a copy-paste of the upstream FindCUDA.cmake. Since cublas_device is deprecated in CUDA >= 9.2, this change is necessary for the build.

Related: https://gitlab.kitware.com/cmake/cmake/merge_requests/2298
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11406

Differential Revision: D9735563

Pulled By: ezyang

fbshipit-source-id: c74d86ced7cc485cb2233f9066ce23e921832c30
2018-09-08 23:10:32 -07:00
6683fb56ca Add AVX optimizations for pdist (#11230)
Summary:
Added AVX optimizations for pdist using Vec256. This brings single-threaded performance up to speed with scipy, but the current implementation greatly hurts performance without AVX enabled. Is there a way to special-case out AVX on dispatch and call the non-Vec256 code? Or is the way I used Vec256 completely wrong?

Single threaded comparison to scipy
============================

This is the time to compute the pdist of a 2048 x 2048 float matrix with only one thread for various values of p between torch and scipy. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

p | torch | scipy
-----|-----------|------
0 | 6.27 s ± 393 ms | 7.23 s ± 498 ms
1 | 5.49 s ± 201 ms | 43.4 s ± 1.09 s
2 | 5.74 s ± 474 ms | 53.8 s ± 3.52 s
∞ | 5.59 s ± 292 ms | 47.4 s ± 2.03 s
3 | really slow | gave up

Result by AVX support
================

This is the time to compute the distance and gradient of a 2048 x 2048 float matrix with all threads by AVX support. `before` is the old code, `default` is no AVX support, etc. Interestingly the AVX optimizations provided a great benefit over the old unoptimized code, but drastically hurt performance when compiled without AVX optimizations. p = 3 is the code path for arbitrary p, and so is much slower than the other values.

Results for p = 0
----------------

avx | dist | grad
----|------|-----
before | 514 ms ± 87.5 ms | 191 µs ± 35 µs
default | 3.47 s ± 183 ms | 201 µs ± 24.6 µs
avx | 123 ms ± 18.2 ms | 281 µs ± 130 µs
avx2 | 103 ms ± 11.4 ms | 216 µs ± 74.4 µs

Results for p = 1
----------------

avx | dist | grad
----|------|-----
before | 426 ms ± 35 ms | 6.21 s ± 187 ms
default | 2.6 s ± 123 ms | 5.62 s ± 273 ms
avx | 104 ms ± 6.37 ms | 833 ms ± 44.3 ms
avx2 | 106 ms ± 3.59 ms | 924 ms ± 86.2 ms

Results for p = 2
-----------------

avx | dist | grad
----|------|-----
before | 425 ms ± 45.4 ms | 6.31 s ± 125 ms
default | 3.04 s ± 187 ms | 3.55 s ± 242 ms
avx | 110 ms ± 3.66 ms | 896 ms ± 21.8 ms
avx2 | 113 ms ± 4.68 ms | 934 ms ± 25.2 ms

Results for p = ∞
------------------

avx | dist | grad
----|------|-----
before | 501 ms ± 39.5 ms | 6.64 s ± 321 ms
default | 2.15 s ± 92.9 ms | 8.43 s ± 355 ms
avx | 104 ms ± 5.52 ms | 835 ms ± 36.7 ms
avx2 | 100 ms ± 3.41 ms | 864 ms ± 67 ms

Results for p = 3
-----------------

avx | dist | grad
----|------|-----
before | 22.6 s ± 413 ms | 11.1 s ± 242 ms
default | 24.9 s ± 1 s | 11.2 s ± 293 ms
avx | 2.69 s ± 148 ms | 5.63 s ± 88.4 ms
avx2 | 2.48 s ± 31.8 ms | 5.61 s ± 114 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11230

Differential Revision: D9735503

Pulled By: erikbrinkman

fbshipit-source-id: a9da619249e4ca2625b39ca1ca7f5543c3086bfb
2018-09-08 22:55:02 -07:00
538ea67437 Search for CMake config files for pybind11. (#11423)
Summary:
If pybind is built with cmake and installed, we should use its config file instead of the Findpybind11 shipped with caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11423

Differential Revision: D9735557

Pulled By: ezyang

fbshipit-source-id: 28a39e579fa045060aa1a716e5fd7dbcf7b89569
2018-09-08 22:44:03 -07:00
02114e877f fix #10838 incorrect bidirectional output format (#11368)
Summary:
Fixes the issue discussed in #10838. `hidden_size` should be the last dimension regardless of whether we're in ONNX or PyTorch.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11368

Differential Revision: D9734814

Pulled By: soumith

fbshipit-source-id: 7f69947a029964e092c7b88d1d79b188a417bf5f
2018-09-08 17:09:57 -07:00
ac9268f25d Conversions to and from complex numbers. (#11420)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11420

Surprisingly tricky!  Here are the major pieces:

- We grow an even more ludicrous macro
  AT_FORALL_SCALAR_TYPES_WITH_COMPLEX_EXCEPT_COMPLEX_HALF
  which does what it says on the tin.  This is because I was
  too lazy to figure out how to define the necessary conversions
  in and out of ComplexHalf without triggering ambiguity problems.
  It doesn't seem to be as simple as just Half.  Leave it for
  when someone actually wants this.

- Scalar now can hold std::complex<double>.  Internally, it is
  stored as double[2] because nvcc chokes on a non-POD type
  inside a union.

- overflow() checking is generalized to work with complex.
  When converting *to* std::complex<T>, all we need to do is check
  for overflow against T.  When converting *from* complex, we
  must check (1) if To is not complex, that imag() == 0
  and (2) for overflow componentwise.

- convert() is generalized to work with complex<->real conversions.
  Complex to real drops the imaginary component; we rely on
  overflow checking to tell if this actually loses fidelity. To get
  the specializations and overloads to work out, we introduce
  a new Converter class that actually is specializable.

- Complex scalars convert into Python complex numbers

- This probably fixes complex tensor printing, but there is no way
  to test this right now.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Reviewed By: cpuhrsch

Differential Revision: D9697878

Pulled By: ezyang

fbshipit-source-id: 181519e56bbab67ed1e5b49c691b873e124d7946
2018-09-08 16:39:43 -07:00
d3f98b5ffc Add matrix power (#11421)
Summary:
vishwakftw Your patch needed some updates because the default native function dispatches changed from `[function, method]` to `[function]`. The CI was run before that change happened so it still shows green, but the internal test caught it.

I made some changes when rebasing and updating, so I didn't just force-push to your branch. Let's see if this passes CI and the internal test. If it does, let me know if you want me to force-push to your branch or use this PR instead.

Note to reviewers: patch was already approved at #10068 .

cc yf225
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11421

Differential Revision: D9733407

Pulled By: SsnL

fbshipit-source-id: cf2ed293bb9942dcc5158934ff4def2f63252599
2018-09-08 15:25:56 -07:00
802380ac93 Improve LegacyTypeDispatch to handle initialization correctly. (#11331)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11331

In the previous commit, we added a bare-bones LegacyTypeDispatch in ATen/core.
This is not sufficient for the use cases we need: we not only need to be able to
get a Type, but we also need to be able to *initialize* the Types if its the first time
we have retrieved a CPU/CUDA/Complex type. I hemmed and hawed about how
to do this; the strategy this PR takes is to introduce a new "hooks" interface
specifically for initializing CPU/CUDA/Complex (which still lives in Context). We then
move all "user-friendly" functions to LegacyTypeDispatch.

Here were some other options which I considered, but don't work:
- Assume that Type is already initialized, because we only intend to call Type
  from Tensor methods, where we already have a Tensor. This does not work
  because Caffe2 created tensors will not have gone through the standard
  Type codepath, and will have skipped initialization.
- Move CUDAHooks and ComplexHooks to ATen/core. Besides being sucky,
  this isn't even a complete fix, because I still need to initialize CPU hooks
  (so you *still* need another hooks interface).

Reviewed By: cpuhrsch

Differential Revision: D9666612

fbshipit-source-id: ac7004b230044b67d13caa81fdfaf3c6ab915e3f
2018-09-08 10:10:17 -07:00
9687a72794 Move the type registry out of Context, into LegacyTypeDispatch. (#11274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11274

We don't want to put all of Context into ATen/core, but one
particular part cannot be avoided: the type registry, because
implementations of TensorMethods will need to get a Type,
and then do a virtual call on it.

I needed to do a little bit of (temporary) footwork to get this
in without also moving Type, because unique_ptr<Type> expects
to be able to see the destructor of Type (but it's forward declared
right now).  So instead I put the destructor as an explicit functor.  We
can get rid of this once Type actually moves in ATen/core

Reviewed By: cpuhrsch

Differential Revision: D9657449

fbshipit-source-id: 940931493bf4f1f6a8dad03f34633cacdd63dd0b
2018-09-08 10:10:11 -07:00
b9b9ae935b Make torch.randint have default dtype int64 (#11040)
Summary:
cc gchanan apaszke
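A quick check of the new default (not from the PR itself):

```python
import torch

x = torch.randint(0, 10, (3,))
assert x.dtype == torch.int64  # previously this followed the default float dtype
```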
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11040

Differential Revision: D9565728

Pulled By: SsnL

fbshipit-source-id: eb5be9609f30c88f52746fa7e13ad71e2856648e
2018-09-08 07:55:06 -07:00
505ecab88d bumping up the default store timeout (#11409)
Summary:
to 300 seconds to be safe. There used to be no timeout in THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11409

Differential Revision: D9731709

Pulled By: teng-li

fbshipit-source-id: 0ce011dcca507cbf063176ad4995405c77dd0cdd
2018-09-07 23:55:23 -07:00
3d2862526b Support send/recv for the gloo process group (#11387)
Summary:
This change removes the skips for the existing send/recv tests in the backwards compatibility layer.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11387

Reviewed By: teng-li

Differential Revision: D9729330

Pulled By: pietern

fbshipit-source-id: f8899219a94d806386d03e9ef53bff622d8658a3
2018-09-07 20:25:18 -07:00
47c1de25e8 Test exporting batch norm, dropout, RNN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11126

Differential Revision: D9727689

Pulled By: jamesr66a

fbshipit-source-id: f142257a2fba27d86844bf33084174f1f68a8ca5
2018-09-07 19:41:39 -07:00
b7a2c91eed remove unnecessary clone() when .grad is None (#11165)
Summary:
Currently the gradient is copied into .grad if .grad is None. This PR aims to remove the copy when it is not absolutely needed.

It is generally an improvement of speed and memory usage. And here is a case it may help a lot:
Normally, people do optimizer.zero_grad() every minibatch before backward. It will translate into a memset, and later a point-wise add.
When there is some large weight in the network, one optimization people can always do is set parameter.grad to None instead of zero_grad. This will remove memset and change point-wise add to a memcpy.
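In user code the trick looks like this (a minimal sketch):

```python
import torch

model = torch.nn.Linear(1000, 1000)

# instead of optimizer.zero_grad() before each backward:
for p in model.parameters():
    p.grad = None  # the next backward() assigns the gradient buffer directly,
                   # skipping the memset and the pointwise add into zeros
```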
Here is the result of running the following script on a V100 GPU. It is 100 iterations of forward/backward/zero_grad on a single 1-billion-word-benchmark-size embedding.
`Zero grad: 2.123847723007202`
`None grad: 1.3342866897583008`

With the backend change of this PR, the unnecessary memcpy is removed, thus further speed up is achieved.
`Zero grad: 2.124978542327881`
`None grad: 0.4396955966949463`

[benchmark.txt](https://github.com/pytorch/pytorch/files/2341800/benchmark.txt)

Some details on the code change:
.detach() is used because we need to get rid of new_grad being a view without copying data. This should be safe in first-order-only mode.
The data needs to be contiguous, otherwise `grad_variable.data() += new_grad.data();` below will fail.
Only the last variable that holds a reference to the temp gradient will grab its buffer.

ngimel, mcarilli  and mruberry helped on finalizing this PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11165

Differential Revision: D9728874

Pulled By: soumith

fbshipit-source-id: b8fb822a2dff6e812bbddd215d8e384534b2fd78
2018-09-07 19:41:37 -07:00
c49b01a8a0 Change default variants to 'function'. (#11247)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11247

Previously, the default for a declaration in native_functions.yaml
was ['function', 'method'], i.e., generate both a method and
function for every binding.  We now believe this is inappropriate:
the majority of new kernels added to PyTorch should live as
free functions, NOT methods.  Thus, we change the default accordingly.

I also took the opportunity to de-method some "internal" functions
that had a leading underscore.  While, strictly speaking, this is a
BC breaking change, I believe it is highly unlikely anyone was using
these directly.

Reviewed By: yf225

Differential Revision: D9648570

fbshipit-source-id: 8b94647b824e0899d6d18aa5585aaedc9d9957d2
2018-09-07 17:56:08 -07:00
fa522d1aed Revert D9720931: [pytorch][PR] [third-party] Update googletest to release-1.8.1
Differential Revision:
D9720931

Original commit changeset: 18a60d0409e7

fbshipit-source-id: a05dcba71277eb4f8ac38886f307d6cf6e6955a9
2018-09-07 17:42:03 -07:00
c9843bd86b Update googletest to release-1.8.1 (#11388)
Summary:
This is mainly to pick up the change 20074be19a to avoid polluting the CMAKE_DEBUG_POSTFIX variable. cc orionr.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11388

Reviewed By: orionr

Differential Revision: D9720931

Pulled By: Yangqing

fbshipit-source-id: 18a60d0409e74316f74d364f4fe16bf0d0198413
2018-09-07 16:56:16 -07:00
31d36b1d31 move complex registration test out-of-line (#11397)
Summary:
Moves the complex registration code into an out-of-line C++ extension to de-noise the test_cpp_extensions.py file. Let's keep it nice and tidy so we can point our users at it for usage examples.

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11397

Differential Revision: D9725335

Pulled By: goldsborough

fbshipit-source-id: 290618f2ee711b1895cdb8f05276034dfe315c6d
2018-09-07 16:56:14 -07:00
4ae16c9ad9 Recursive descent for validation + convert expands in ATen fallback (#11356)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11356

Differential Revision: D9721002

Pulled By: jamesr66a

fbshipit-source-id: eeb50b56f8a72e929860c5e459a5ab50ac624814
2018-09-07 16:39:36 -07:00
4c8cc36e34 Fix igios build (#11392)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11392

Fix igios build

Reviewed By: houseroad

Differential Revision: D9720833

fbshipit-source-id: 33acc3c658c22addd4bad142433824076233e901
2018-09-07 15:55:23 -07:00
4bf5fc44c8 Fix split_size test failures (#11051)
Summary:
~~This PR fixes #8525 by renaming `split_with_sizes` to `split` so that 2 `aten::split` ops are
generated (previously `aten::split(self, int, int)` and `aten::split_with_sizes(self, int[], int)` were generated)~~

~~`split_with_sizes` was made in PR #5443, but I don't see a reason for it to have
a different name than `split` rather than just overload `split`.~~

This PR fixes #8525 by adding `register_special_ops.cpp` to mirror Python dispatching from `split` to `split` and `split_with_sizes` in [tensor.py](https://github.com/pytorch/pytorch/blob/master/torch/tensor.py#L279).

It also fixes #8520 by adding an `int[]` wherever it sees `torch.Size`

In a follow up PR this could also be used to fix some of the other `unknown builtin op` test errors.
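For context, a minimal sketch (not code from this PR) of the two call patterns the dispatch mirrors:

```python
import torch

x = torch.arange(10)

# an int chunk size dispatches to aten::split
chunks = torch.split(x, 2)          # five pieces of length 2

# a list of sizes dispatches to aten::split_with_sizes
parts = torch.split(x, [3, 3, 4])   # pieces of lengths 3, 3, 4
```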
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11051

Differential Revision: D9582443

Pulled By: driazati

fbshipit-source-id: d27201f85937d72e45e851eaa1460dd3dd1b61a9
2018-09-07 15:39:24 -07:00
9886ebeb24 Remove hardcoded system path from CMAKE_MODULE_PATH (#11386)
Summary:
This seems to be causing different versions of OpenMPI being picked up
by different parts of the build. Not a good practice to include absolute
paths anyway, so let's try removing it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11386

Reviewed By: teng-li

Differential Revision: D9724349

Pulled By: pietern

fbshipit-source-id: 3dfef91c81f2e97e5125284aff9e7e98f8761917
2018-09-07 15:25:38 -07:00
802d21c8f4 Remove FULL_CAFFE2 flag (#11321)
Summary:
Continuing pjh5's work to remove FULL_CAFFE2 flag completely.

With these changes you'll be able to also do something like

```
NO_TEST=1 python setup.py build_deps
```
and this will skip building tests in caffe2, aten, and c10d. By default the tests are built.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11321

Reviewed By: mingzhe09088

Differential Revision: D9694950

Pulled By: orionr

fbshipit-source-id: ff5c4937a23d1a263378a196a5eda0cba98af0a8
2018-09-07 15:09:44 -07:00
93da5a21c9 Update variable view note
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11393

Differential Revision: D9725444

Pulled By: SsnL

fbshipit-source-id: b1607d986ab93e64b0b0ff9e8f10d9e3f6e2160e
2018-09-07 15:09:43 -07:00
77b6d7d255 Doc improvements (#11347)
Summary:
1. Remove cudnn* symbols from C++ docs
2. Fix code examples for `nn::Module` and `jit::compile`
3. Document Dropout
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11347

Differential Revision: D9716751

Pulled By: goldsborough

fbshipit-source-id: e0566cec35848335cac3eb9196cb244bb0c8fa45
2018-09-07 14:39:36 -07:00
7de0332e10 Add initial documentation for JIT (#11357)
Summary:
In addition to documentation, this cleans up a few error message formats.
It also adds infra to find which operators are supported by the JIT automatically, which is then used in the generation of the docs.

The wording and formatting of the docs is not yet polished, but having this will allow our document writers to make faster progress.

Followup PRs will polish the docs and fix formatting issues.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11357

Differential Revision: D9721277

Pulled By: zdevito

fbshipit-source-id: 153a0d5be1efb314511bcfc0cec48643d78ea48b
2018-09-07 14:27:47 -07:00
69b4b45f91 enable missing nn tests with single grad check, minor refactor
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11366

Differential Revision: D9723305

Pulled By: wanchaol

fbshipit-source-id: 9e7e2e7e68cb4919610bccfbf76fa33b647f6eb7
2018-09-07 14:27:46 -07:00
576807ce1a flaky test fix trial (#11391)
Summary:
Add a barrier() to wait for all process groups to be created before destroying them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11391

Differential Revision: D9727383

Pulled By: teng-li

fbshipit-source-id: 689d62c978e642b68f4949dcf29982e34869ada4
2018-09-07 14:10:06 -07:00
e9da2dd3cc Do not use PERSISTENT cudnn mode for spatialBN (#11382)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11382

We found this cudnn bug in S163230 that causes accuracy loss. We fix this in D9601217, but due to the reimplementation of spatialBN it's overwritten. Let's land this fix again.

Reviewed By: kuttas

Differential Revision: D9702347

fbshipit-source-id: 11547e9edaf7b2ba7f4aa7263ffb4f0281bbf078
2018-09-07 13:41:18 -07:00
01930a3145 Move sync_params to C++ (#9805)
Summary:
The next function I'm moving to C++ is `sync_params`. It is stacked on top of https://github.com/pytorch/pytorch/pull/9729, so some changes will go away when it lands and I rebase.

I also split code into a `.h` and `.cpp` file for better code organization.

pietern apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9805

Differential Revision: D9688604

Pulled By: goldsborough

fbshipit-source-id: 4467104d3f9e2354425503b9e4edbd59603e20a8
2018-09-07 12:56:40 -07:00
ba6f10343b update CUDAExtension doc (#11370)
Summary:
fix typo
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11370

Differential Revision: D9701777

Pulled By: soumith

fbshipit-source-id: 9f3986cf30ae0491e79ca4933c675a99d6078982
2018-09-07 12:56:38 -07:00
733402bef4 Fix issues with certain heterogeneous types in lists during tensor creation (#11377)
Summary:
Closes #9963
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11377

Differential Revision: D9701824

Pulled By: soumith

fbshipit-source-id: 89c5448fd90ece1b365dc42f775b6b0c73ce790c
2018-09-07 12:56:35 -07:00
5e400e9cae move context_base.h to ATen/core (#11336)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11336

Move `context_base.h` header to `ATen/core` and the implementations are in `caffe2/core/context_base.cc`

Reviewed By: ezyang

Differential Revision: D9670493

fbshipit-source-id: ce5bf2b3b4c80e9b62819f4332ce68af82720055
2018-09-07 12:20:25 -07:00
fb4e8088f3 Remove methods that start with an underscore from at::Tensor (#11152)
Summary:
This PR cleans up the `at::Tensor` class by removing all methods that start with an underscore in favor of functions in the `at::` namespace. This greatly cleans up the `Tensor` class and makes it clearer what is the public and non-public API.

For this I changed `native_functions.yaml` and `Declarations.cwrap` to make all underscore methods `variant: function` (or add such a statement to begin with), and then fixed all code locations using the underscore methods.

ezyang colesbury gchanan
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11152

Differential Revision: D9683607

Pulled By: goldsborough

fbshipit-source-id: 97f869f788fa56639c05a439e2a33be49f10f543
2018-09-07 11:55:11 -07:00
e80f7e1f64 Fix more warnings (#11320)
Summary:
also a missing space in fft error message
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11320

Differential Revision: D9676012

Pulled By: SsnL

fbshipit-source-id: a636e5fce042198510c8e456fa51fde714da8348
2018-09-07 11:26:58 -07:00
91089a7e17 Add GPU implementation of pdist (#11102)
Summary:
Add the gpu kernel version.

The parallelism I went with performs poorly when there are a large number of vectors, but they're all short, as I don't allocate the thread pool to wrap in that case.
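
For reference, a minimal usage sketch (assuming a CUDA device is available; it falls back to CPU otherwise):

```python
import torch
import torch.nn.functional as F

device = "cuda" if torch.cuda.is_available() else "cpu"
x = torch.randn(1024, 16, device=device)

# condensed pairwise-distance vector of length n*(n-1)/2
d = F.pdist(x)
print(d.shape)  # torch.Size([523776])
```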

Test Plan
---------
```
python -m unittest test_torch.TestTorch.test_pdist_{empty,scipy} test_nn.TestNN.test_pdist{,_zeros,_empty_row,_empty_col,_cpu_gradgrad_unimplemented,_cuda_gradgrad_unimplemented} test_jit.TestJitGenerated.test_nn_pdist
```

Current performance specs are a little underwhelming; I'm in the process of debugging.

size | torch | torch cuda | scipy
-----|-------|------------|------
16 x 16 | 9.13 µs ± 3.55 µs | 9.86 µs ± 81.5 ns | 15.8 µs ± 1.2 µs
16 x 1024 | 15 µs ± 224 ns | 9.48 µs ± 88.7 ns | 88.7 µs ± 8.83 µs
1024 x 16 | 852 µs ± 6.03 µs | 7.84 ms ± 6.22 µs | 4.7 ms ± 166 µs
1024 x 1024 | 34.1 ms ± 803 µs | 11.5 ms ± 6.24 µs | 273 ms ± 6.7 ms
2048 x 2048 | 261 ms ± 3.5 ms | 77.5 ms ± 41.5 µs | 2.5 s ± 97.6 ms
4096 x 4096 | 2.37 s ± 154 ms | 636 ms ± 2.97 µs | 25.9 s ± 394 ms
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11102

Differential Revision: D9697305

Pulled By: erikbrinkman

fbshipit-source-id: 2b4f4b816c02b3715a85d8db3f4e77479d19bb99
2018-09-07 09:09:46 -07:00
110191e5c7 Remove detach from TensorImpl, handle via Type. (#11337)
Summary:
This is so that TensorImpl does not have to depend on Tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11337

Differential Revision: D9684421

Pulled By: gchanan

fbshipit-source-id: d2af93420ca6d493429c251cfe5a34e9289c4484
2018-09-07 08:55:59 -07:00
52b37d8b66 Move VariableHooksInterface to ATen/core (#11273)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11273

This one might strike you as a bit surprising, but it's necessary
to expose this interface in ATen/core, because we need to be
able to get a true Variable type from Variable tensors, and
to do that we need to go through the hooks interface.

Reviewed By: gchanan

Differential Revision: D9656548

fbshipit-source-id: 28bb5aee6ac304e8cd5fa1e4c65452c336647161
2018-09-07 08:11:53 -07:00
396e64fff7 Move ATen/Registry.h to ATen/core/Registry.h (#11270)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11270

Still need to deduplicate this with caffe2/core/registry.h,
but this will be a bit tricky because the current formulation
of the macro is namespace sensitive (i.e., the macro for classes
defined in at:: namespace won't work if you call from caffe2::
namespace).

Reviewed By: gchanan

Differential Revision: D9654871

fbshipit-source-id: 2207d1f2cc6d50bd41bf64ce0eb0b8523b05d9d9
2018-09-07 08:11:52 -07:00
b02b125d16 Rename getMaybeVariableType back to getType. (#11250)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11250

```
codemod -d . --extensions cc,cpp,cu,cuh,h getMaybeVariableType getType
```

Reviewed By: gchanan

Differential Revision: D9648830

fbshipit-source-id: 6b2ac2b1c265ae47722390e6e7f106653077d851
2018-09-07 08:11:50 -07:00
68371b6d2e fast code path when partition=1 which makes LengthsPartition a simple copy (#11351)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11351

When partitions == 1 (InputSize() == OutputSize()), LengthsPartition becomes just a copy.

Reviewed By: aazzolini

Differential Revision: D9693409

fbshipit-source-id: a9ea034d227af357b661477ab779a71600f58f58
2018-09-07 08:11:49 -07:00
da4ebc2971 Switch SVD on CPU from gesvd to gesdd (#11194)
Summary:
- Added a note to the doc string for `svd`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11194

Differential Revision: D9683250

Pulled By: soumith

fbshipit-source-id: 2d2c120be346122afa333629c0516a5c9dbb406f
2018-09-07 07:39:57 -07:00
f9595e756e typo/grammar fixes (#11344)
Summary:
Fixes some minor grammar issues in the code base.

PS: I was actually looking for the following one but couldn't find it via grepping in this repo:

![screen shot 2018-09-06 at 3 27 39 pm](https://user-images.githubusercontent.com/5618407/45184280-1e16a980-b1ec-11e8-9cb1-87a96738bdd1.png)

Any idea in which file this issue is raised?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11344

Differential Revision: D9696454

Pulled By: soumith

fbshipit-source-id: 8ffe494b1bf1efb0e35563381d9da2e1e8032a3c
2018-09-06 21:57:14 -07:00
a2afad2b69 Improves ATen CUDAEvent (#11293)
Summary:
After submitting PR #9726, PR #10581 created a different CUDAEvent class. The CUDAEvent proposed in #9726 was similar to the c10d::CUDAEvent class with additional testing and functionality. In particular, it was movable but not copyable. The CUDAEvent created by #10581 is refcounted and copyable. This PR retains the refcounting of the latter PR while fixing several bugs, adding tests, and extending the functionality to support testing and usage like in PR #8354. In particular, this PR:

- Adds set_device() to CUDAContext
- Adds three CUDAEvent tests to stream_test.cpp
- Fixes three bugs:
- Refcounting was broken. Destroying any of the RAIIs holding a particular CUDAEvent would destroy the event UNLESS it was the last RAII (the check was backwards).
- Moving an event would cause a segfault.
- Events were not destroyed on the device they were created on. See PR #9415 (pietern)
- Adds the happened() and recordOnce() functions
- Changes the record() functions to not be const
- Adds additional assertions to verify correctness

This PR does not:

- Make c10d use the ATen CUDAEvent (this is appropriate for a separate PR)

Whether events should be refcounted is an interesting question. It adds some atomic operations and makes event creation eager. Making events movable but not copyable (like the c10d events) avoids these costs and allows events to be lazily constructed. Lazy construction is preferable when working with containers (like std::array or std::vector) and because the event's device can be set automatically to the first stream it's recorded on. With eager construction the user is required to understand that events have a device and acquire the device of the stream the event will be recorded on upfront. This can be seen here:

542aadd9a7/aten/src/ATen/native/cudnn/RNN.cpp (L1130-L1132)

and that file is the only one which currently uses the ATen CUDAEvent.

Refcounting does allow single writer multi-reader scenarios, although these scenarios can be also be supported by providing indirect access to the underlying CUDAEvent. I believe all current and planned usage scenarios do not require refcounting, and if desired I can update this PR to remove refcounting and make the ATen event movable but not copyable like the c10d event. I think not refcounting is preferable because it can improve performance, ease usability, and simplify the code (as seen with two of the above bugs).

I have decided to separate this from PR #8354 since while it's required for PR #8354 the changes are, clearly, of independent interest. PR #8354 has a new dependency on this one, however. I am closing PR #9726 in favor of this PR.

apaszke ezyang pietern
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11293

Differential Revision: D9665836

Pulled By: soumith

fbshipit-source-id: a1513fa4f9761e2f304d126e402f6b6950e1c1d2
2018-09-06 21:39:44 -07:00
b3b1e7624d Optional expand=True kwarg in distribution.enumerate_support (#11231)
Summary:
This adds an optional `expand=True` kwarg to the `distribution.enumerate_support()` method, to get a distribution's support without expanding the values over the distribution's `batch_shape`.
 - The default `expand=True` preserves the current behavior, whereas `expand=False` collapses the batch dimensions.

e.g.
```python
In [47]: d = dist.OneHotCategorical(torch.ones(3, 5) * 0.5)

In [48]: d.batch_shape
Out[48]: torch.Size([3])

In [49]: d.enumerate_support()
Out[49]:
tensor([[[1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.],
         [1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.],
         [0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.],
         [0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.],
         [0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.],
         [0., 0., 0., 0., 1.]]])

In [50]: d.enumerate_support().shape
Out[50]: torch.Size([5, 3, 5])

In [51]: d.enumerate_support(expand=False)
Out[51]:
tensor([[[1., 0., 0., 0., 0.]],

        [[0., 1., 0., 0., 0.]],

        [[0., 0., 1., 0., 0.]],

        [[0., 0., 0., 1., 0.]],

        [[0., 0., 0., 0., 1.]]])

In [52]: d.enumerate_support(expand=False).shape
Out[52]: torch.Size([5, 1, 5])
```

**Motivation:**
 - Currently `enumerate_support` builds up tensors of size `support + batch_shape + event_shape`, but the values are *repeated* over the `batch_shape` (adding little in the way of information). This can lead to expensive matrix operations over large tensors when `batch_shape` is large (see, example above), often leading to OOM issues. We use `expand=False` in Pyro for message passing inference. e.g. when enumerating over the state space in a Hidden Markov Model. This creates sparse tensors that capture the markov dependence, and allows for the possibility of using optimized matrix operations over these sparse tensors. `expand=True`, on the other hand, will create tensors that scale exponentially in size with the length of the Markov chain.
 - We have been using this in our [patch](https://github.com/uber/pyro/blob/dev/pyro/distributions/torch.py) of `torch.distributions` in Pyro. The interface has been stable, and it is already being used in a few Pyro algorithms. We think that this is more broadly applicable and will be of interest to the larger distributions community.

cc. apaszke, fritzo, alicanb
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11231

Differential Revision: D9696290

Pulled By: soumith

fbshipit-source-id: c556f8ff374092e8366897ebe3f3b349538d9318
2018-09-06 21:39:42 -07:00
c59c1a25b2 diagnose option: get_entry to print a whole row (#11308)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11308

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11299

Reviewed By: xianjiec

Differential Revision: D9652844

fbshipit-source-id: 650d550317bfbed0c1f25ae7d74286cfc7c3ac70
2018-09-06 21:26:30 -07:00
2946b021e3 Disable flaky test, see #11360 (#11361)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11361

Reviewed By: yf225

Differential Revision: D9696524

Pulled By: ezyang

fbshipit-source-id: f6801d6f4f34090d467b16810db9cf576d5d519b
2018-09-06 20:40:00 -07:00
3149a72c63 Move TensorOptions.cpp to the correct place in ATen/core (#11244)
Summary:
This actually ended up being a lot more involved than I thought. The basic
problem is that in some of our build environments, thread local state is not
supported. The correct way to test if this is the case is using the
(undocumented) CAFFE2_FB_LIMITED_MOBILE_CAPABILITY macro.

On mobile, OptionGuard is not available, and you have to do everything
by hand. There's a static_assert to check if you accidentally use
OptionGuard in this case and give you a better error message in this case.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11244

Reviewed By: gchanan

Differential Revision: D9646190

fbshipit-source-id: cf4016f79b47705a96ee9b6142eb34c95abb2bd4
2018-09-06 20:11:39 -07:00
c45607f77f Static assert GetMutable is not passed with Tensor argument (#11323)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11323

If you do pass it this, you'll get a pointer to
UndefinedTensor; probably not what you want!

Reviewed By: Yangqing

Differential Revision: D9676205

fbshipit-source-id: 0bd3c22c2c40ac2958f95fc7a73b908af291cf22
2018-09-06 20:11:37 -07:00
0f419abf40 Roll nomnigraph build into caffe2 (#11303)
Summary:
We need to remove nomnigraph from the list of public libraries in order to support libtorch extensions. Easiest way to do this is to include it into the Caffe2 source like all other caffe2/core/ code.

However, because the headers are in a different place, we need to include them for linked libraries (pybind, tests, etc).

On an upside, this means that nomnigraph is now default hidden visibility too.

FYI peterjc123 xkszltl goldsborough bwasti Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11303

Reviewed By: pjh5

Differential Revision: D9694932

Pulled By: orionr

fbshipit-source-id: 5db3eb20bc5ddc873ce9151236b74663fbb33ed8
2018-09-06 19:38:09 -07:00
9de2085806 Use custom hcc/HIP, purge hcSPARSE (#11198)
Summary:
* purge hcSPARSE now that rocSPARSE is available
* integrate a custom hcc and HIP
* hcc brings two important compiler fixes (fixes hundreds of unit tests)
* HIP brings a smart dispatcher that allows us to avoid a lot of static_casts (we haven't yet removed the automatic static_casts but this catches some occurrences the script did not catch)
* mark 5 unit tests skipping that have regressed w/ the new hcc (we don't know yet what is at fault)
* optimize bitonic sort - the comparator is always an empty struct - therefore passing it by value saves at least 3 bytes. It also removes an ambiguity around passing references to `__global__` functions
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11198

Differential Revision: D9652340

Pulled By: ezyang

fbshipit-source-id: f5af1d891189da820e3d13b7bed91a7a43154690
2018-09-06 19:38:07 -07:00
ec5404a449 Add cuda version of SpatialBNOp also optimize SpatialBN on CPU (#10888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10888

Add cuda version of SpatialBNOp also optimize SpatialBN on CPU

Reviewed By: houseroad

Differential Revision: D9512435

fbshipit-source-id: 6f828c88d56d30dc9a2f98a297a161c35cc511b1
2018-09-06 18:26:13 -07:00
7726b36489 Full-fledged group testings and fixes for c10d frontend APIs (#11318)
Summary:
Fixed a few bugs that were not tested in the c10d frontend APIs, including
get_rank, get_world_size, and destroy_process_group of a given group.

These APIs are added to the CI tests.

Also added all the group related tests, including full-group, and partial groups (existing ones), since both will hit different code paths.

Also removed experimental APIs for c10d initially used in DDP, now we don't use it anyway.
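For illustration, a hedged sketch of the group APIs these tests exercise (assumes the default process group is already initialized; not code from this PR):

```python
import torch.distributed as dist

world = dist.get_world_size()

# a partial group over a subset of ranks and a full group over all ranks;
# the two hit different code paths internally
partial = dist.new_group(ranks=[0, 1])
full = dist.new_group(ranks=list(range(world)))

# rank / world size queries scoped to a specific group
if dist.get_rank(partial) >= 0:  # -1 means this process is not a member
    print(dist.get_world_size(partial))

dist.destroy_process_group(partial)
```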
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11318

Reviewed By: pietern

Differential Revision: D9675896

Pulled By: teng-li

fbshipit-source-id: a2eac2c57933effa2d139855f786e64919a95bfc
2018-09-06 18:26:11 -07:00
1a01c75dde support gradClipping per blob in mtml (#10776)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10776

as title

Reviewed By: chocjy

Differential Revision: D9458099

fbshipit-source-id: f840d4f1542e8180f41cc0732c8468fa43805ab8
2018-09-06 18:10:52 -07:00
c39216f8c4 Automatic update of fbcode/onnx to bff0b8835870c7df7762ef43498d000d2d8ffb52 (#11346)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11346

Previous import was 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c

Included changes:
- **[bff0b88](https://github.com/onnx/onnx/commit/bff0b88)**: Add DynamicSlice experimental op (#1377) <James Reed>
- **[91a7b8e](https://github.com/onnx/onnx/commit/91a7b8e)**: statCoverage(model) (#1246) <Akshay Chalana>
- **[36643c6](https://github.com/onnx/onnx/commit/36643c6)**: fix the doc for softmax (#1374) <Lu Fang>
- **[8c64acd](https://github.com/onnx/onnx/commit/8c64acd)**: Silence unused result warning in ONNXIFI wrapper cleanup. Fix #1344 (#1371) <Marat Dukhan>
- **[53b20f6](https://github.com/onnx/onnx/commit/53b20f6)**: Add the ability to deprecate an OpSchema (#1317) <Ryan Hill>
- **[8aec4e2](https://github.com/onnx/onnx/commit/8aec4e2)**: [Anderspapitto patch] fix the shape inference for broadcasting (#1368) <Lu Fang>

Reviewed By: jamesr66a

Differential Revision: D9691533

fbshipit-source-id: 6aff6ce04ade37182e2ffe9bc83eb86846bc722d
2018-09-06 17:39:57 -07:00
4d678790c5 enable advanced indexing with tensors (#10862)
Summary:
On the way to #10774

This PR adds advanced indexing with tensors.
The approach is to desugar advanced indexing into an at::index op.
This is exactly how normal pytorch does it.
[(I used this code as reference)](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp)

Supporting sequences is a little tricky because JIT script doesn't have
an easy way to turn arbitrary n-dimensional python lists into a tensor
(it would be easy if we supported `torch.tensor`), so that'll come
in a future PR.
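
A minimal sketch (hypothetical function name, not code from this PR) of what this enables in script:

```python
import torch

@torch.jit.script
def gather_rows(x, idx):
    # tensor indexing here is desugared into a single at::index call
    return x[idx]

x = torch.randn(5, 3)
idx = torch.tensor([0, 2, 4])
print(gather_rows(x, idx).shape)  # torch.Size([3, 3])
```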

cc jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10862

Differential Revision: D9659449

Pulled By: zou3519

fbshipit-source-id: 56d293720d44c0fd27909e18327ab3985ddfced6
2018-09-06 16:41:45 -07:00
148f7cc47a nomnigraph - nit - fix generated code to be consistent with style (#11343)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11343

make the generated classes (OpClasses.h...) consistent with fb cpp code style

Reviewed By: yinghai

Differential Revision: D9689487

fbshipit-source-id: 450e742d2462115d1bf41b9ea88d20df0a842b2b
2018-09-06 16:27:17 -07:00
49231ab0a8 Reimplement storage slicing. (#11314)
Summary:
In #9466 I got rid of storage views and eliminated all places where
they were used... OR SO I THOUGHT.  In actuality, under certain
conditions (specifically, if you trained a CUDA multiprocessing model
shared over CUDA IPC and then serialized your parameters), you could
also serialize storage slices to the saved model format.  In #9466,
I "fixed" the case when you loaded the legacy model format (really,
just unshared the storages--not strictly kosher but if you aren't
updating the parameters, shouldn't matter), but NOT the modern model format, so
such models would fail.

So, I could have applied the legacy model format fix too, but
hyperfraise remarked that he had applied a fix that was effectively
the same as unsharing the storages, but it had caused his model to
behave differently.  So I looked into it again, and realized that
using a custom deleter, I could simulate the same behavior as old
storage slices.  So back they come.

In principle, I could also reimplement storage views entirely using
our allocators, but I'm not going to do that unless someone really
really wants it.

Fixes #10120.
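
For illustration, a hedged sketch of the aliasing behavior these storage slices preserve across serialization (written against the Python API; not code from this PR):

```python
import io

import torch

base = torch.arange(10.)
a, b = base[:5], base[5:]  # two views slicing one underlying storage

buf = io.BytesIO()
torch.save({"a": a, "b": b}, buf)
buf.seek(0)
loaded = torch.load(buf)

# the storage sharing survives the save/load round trip
print(loaded["a"].storage().data_ptr() == loaded["b"].storage().data_ptr())  # True
```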

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11314

Reviewed By: ailzhang

Differential Revision: D9671966

Pulled By: ezyang

fbshipit-source-id: fd863783d03b6a6421d6b9ae21ce2f0e44a0dcce
2018-09-06 16:11:59 -07:00
1d406c04ae fix comment on Cost params_bytes (#11190)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11190

As discussed with Alexander Sidorov, params_bytes refers to the number of bytes we're reading for parameters, not the size of the parameters. They only differ for sparse operators.

Reviewed By: mdschatz

Differential Revision: D9628635

fbshipit-source-id: 9e2aed0cf59388928dc69b8534cf254f0347c9c8
2018-09-06 15:12:22 -07:00
68613cf5a2 Windows DLL build with Caffe2 code (#11266)
Summary:
This is an experimental build on top of what orionr and mingzhe09088 built.

Essentially, the idea is that we will need separate *_API versions for different shared libraries. If this theory is right, I'll try to clean up the design a bit and document it properly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11266

Reviewed By: orionr

Differential Revision: D9682942

Pulled By: Yangqing

fbshipit-source-id: c79653199e67a1500c9174f39f8b0357324763f3
2018-09-06 15:12:20 -07:00
34c0043aae Force third_party Eigen from setup.py (#11334)
Summary:
We shouldn't use system Eigen in any case when building with setup.py. If people want to use system Eigen (not from third_party) they can build with CMake for now.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11334

Reviewed By: pjh5

Differential Revision: D9689450

Pulled By: orionr

fbshipit-source-id: baf616b9f195692942151ad201611dcfe7d927ba
2018-09-06 14:56:53 -07:00
03ca7358af Add unit test for Parallel Spatial Batch Normalization (#11098)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11098

Added a test covering the CPU version across multiple devices.

Reviewed By: enosair, BIT-silence

Differential Revision: D9584520

fbshipit-source-id: 0d8c85e6d402bc7b34d5f8f16ef655ff9b61b49e
2018-09-06 14:26:56 -07:00
5712fe3297 Fix out-of-boundary conversion issue (#11338)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11338

The `min_` and `max_` values of the filler are in `double` format, but when we are filling a tensor of a specific type, their values can exceed the type's limits, resulting in a crash. This diff checks the type limits first, and if `min_`/`max_` is out of the limits, it clips it.

Reviewed By: highker

Differential Revision: D9684455

fbshipit-source-id: 6da98a03c57f3296abaddc7c5cfc1c836c611eb0
2018-09-06 13:39:52 -07:00
ec195129ec Adding setTimeout option in Store (#11265)
Summary:
This will allow users to set a customized timeout option for the store.

Tested with my own debug print to make sure that C++ actually used the timeout.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11265

Differential Revision: D9666164

Pulled By: teng-li

fbshipit-source-id: 4eb6441783da106a3fd59b95457e503e83e4640f
2018-09-06 12:55:50 -07:00
fef52cc1f8 Add resolver for 'torch' module (#10847)
Summary:
This lets you compile builtin functions from C++ without having a dependence on Python

```cpp
auto module = torch::jit::compile(R"JIT(
def my_script_method(x, y):
    return torch.relu(x) + y
)JIT");
IValue result = module->run_method("my_script_method", 1, 2);
```

goldsborough zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10847

Differential Revision: D9543461

Pulled By: driazati

fbshipit-source-id: 6160dae094030ca144a0df93cb9f26aa78c8cf27
2018-09-06 12:42:21 -07:00
0f1ec07c57 nomnigraph - nit - rename unit test files (#11315)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11315

Rename unit test files to make them consistent with the fb cpp style guideline: "The unittest for MyFoo.cpp should be named MyFooTest.cpp."

Reviewed By: yinghai

Differential Revision: D9671519

fbshipit-source-id: 44ed6794f6e479d190916db8064eee692e3ad876
2018-09-06 12:28:18 -07:00
ed8849b640 Add include path to Doxygen preprocessing and add some documentation (#11313)
Summary:
1. Add documentation to Linear and improve documentation for RNNs
2. Fix preprocessing in C++ docs by adding correct include path
3. Make myself and ebetica codeowner of docs/cpp to improve development speed

ebetica ezyang soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11313

Differential Revision: D9683615

Pulled By: goldsborough

fbshipit-source-id: 84ea32f9ea6b4060744aabbf5db368776a30f0b5
2018-09-06 12:28:17 -07:00
f98bd53b01 Small fix to the UniformIntFill tensor shape and type inference.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11028

Reviewed By: salexspb

Differential Revision: D7715107

Pulled By: costin-eseanu

fbshipit-source-id: a4f73d53c0192b9826451b4bba4ab0992abbb1a2
2018-09-06 12:11:32 -07:00
1ad61a18b2 Rename cuda tests to have 'cuda' in their names (#11332)
Summary:
Not a lot changed
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11332

Differential Revision: D9683680

Pulled By: zou3519

fbshipit-source-id: 95f444e54049dd268fc10effe425ef2df79c6467
2018-09-06 11:57:52 -07:00
0ef2b318a2 fix empty net type (#11286)
Summary:
Turns out that a net.type explicitly set to '' is not acceptable to CreateNet, while an unset net.type is.

Fix that in this diff. This is also related to T33613083.
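For illustration, a hedged sketch of the pattern this diff makes work (assuming the usual caffe2.python workflow; not code from this PR):

```python
from caffe2.python import core, workspace

net = core.Net("example")
net.Proto().type = ""     # explicit empty string; previously rejected by CreateNet
workspace.CreateNet(net)  # now treated the same as an unset net.type
```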
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11286

Reviewed By: Maratyszcza, wat3rBro

Differential Revision: D9659920

Pulled By: harouwu

fbshipit-source-id: d68f24b754e18e1121f029656d885c48ab101946
2018-09-06 11:10:01 -07:00
936bba77d1 cudnn 7 upgrade with spatialBN fix (#11291)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11291

In S163230, we've found that the CuDNN 7 upgrade causes an accuracy drop when training convolutional networks such as ResNeXt-101 (~0% accuracy) and video R(2+1)D (65 --> 63%).

Our current theory for this accuracy loss is the new "CUDNN_BATCHNORM_SPATIAL_PERSISTENT" mode in the spatialBN operator. In Caffe2, we've made this mode the default. According to the CuDNN manual (https://fburl.com/z996mr13), this mode may introduce some limitation on the input data range and cause overflow (which outputs NaN). NaN is probably not the case, because we're seeing a few percent of accuracy drop but not gradient explosion or failure. However, this "performance-optimized" code path may introduce accuracy loss (which is not caught by our unit test case because the input data range there is [-0.5, 0.5]).

Reviewed By: kuttas, stephenyan1231

Differential Revision: D9601217

fbshipit-source-id: 73c2690c19cb1f02ea4e5e2200f50128df4f377b
2018-09-06 10:11:59 -07:00
4ae95738b2 Ignore FuseGraph Call on Windows (#11015)
Summary:
Fusion is not yet implemented on Windows, so ignore the FuseGraph call instead of failing.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11015

Differential Revision: D9619121

Pulled By: eellison

fbshipit-source-id: ad09aeaa41b7fdeb9ca7bf5e1c166923ca405b15
2018-09-06 09:54:51 -07:00
a853a74217 defer resolution of mkl to a cmake wrapper library (#11298)
Summary:
This is a fix that's needed for building extensions with a
pre-packaged PyTorch. Consider the scenario where

(1) pytorch is compiled and packaged on machine A
(2) the package is downloaded and installed on machine B
(3) an extension is compiled on machine B, using the downloaded package

Before this patch, stage (1) would embed absolute paths to the system
installation of mkl into the generated Caffe2Config.cmake, leading to
failures in stage (3) if mkl was not at the same location on B as on
A. After this patch, only a reference to the wrapper library is
embedded, which is re-resolved on machine B.

We are already using a similar approach for cuda.

Testing: built a package on jenkins, downloaded locally and compiled an extension. Works with this patch, fails without.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11298

Differential Revision: D9683150

Pulled By: anderspapitto

fbshipit-source-id: 06a80c3cd2966860ce04f76143b358de15f94aa4
2018-09-06 09:10:39 -07:00
dda8402447 Cleanup dependency of distributed flags (#11221)
Summary:
Now that we're building everything together, this makes all distributed flags conditional on USE_DISTRIBUTED being set.

cc pietern cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11221

Reviewed By: Yangqing

Differential Revision: D9664267

Pulled By: orionr

fbshipit-source-id: a296cda5746ad150028c97160f8beacba955ff73
2018-09-06 08:56:00 -07:00
68930c48cf Move minimal wrapdim functionality to core, remove THTensor include in TensorImpl. (#11283)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11283

Reviewed By: ezyang

Differential Revision: D9660015

Pulled By: gchanan

fbshipit-source-id: 263cba226d9ee981d55281c94e6fda5842a46b02
2018-09-06 08:10:33 -07:00
f6568b00f5 Change includes from ATen/Storage.h to ATen/core/Storage.h (#11217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11217

```
codemod -d . --extensions cc,cpp,cu,cuh,h 'ATen/Storage.h' 'ATen/core/Storage.h'
```

Reviewed By: gchanan

Differential Revision: D9634904

fbshipit-source-id: 35a177733f3816e32d8748513c9caa4cf13a6896
2018-09-06 08:10:30 -07:00
656e81db93 Fix scalar tensor assert in fusion compiler (#10952)
Summary:
Fixes #8560.
Unblocks #10715.

The assert (nDim <= uncompressedDims) was being triggered for a scalar
tensor because we compute nDim to be 1 for a scalar tensor but
uncompressedDim = 0.

This PR changes it so that we compute nDim to be 0 for a scalar tensor. This
works because indexing in a kernel depends on nDim. If nDim = 0, then
offset is always 0, which is what we want.

Some other (small) changes were necessary to make this work:
- One cannot define a 0-length array `IndexType arr[0]` so the code
  guards against that
- Needed to change some of the maxTensorInfoSize logic to handle the
  case when uncompressedDim == 0.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10952

Differential Revision: D9544607

Pulled By: zou3519

fbshipit-source-id: 2b873f47e2377125e1f94eb1b310a95cda51476c
2018-09-06 07:54:57 -07:00
bb7d1837bc Add dead code elimination pass (#10101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10101

Simple DCE enabled by knowledge of the actual outputs (stacked beneath this diff)

Reviewed By: yinghai

Differential Revision: D9107853

fbshipit-source-id: 0c38fe5fe408be2b7fc9e1fe6a5b7160c06ce79b
2018-09-05 23:55:17 -07:00
220c9e52b9 Distributed Data Parallel CPU module for C10D (#11168)
Summary:
Distributed Data Parallel CPU module for c10d. This is basically the same code as Distributed Data Parallel CPU module for THD, since c10d now has the exact same front-end interface as torch.distributed.

We will keep both in the first release and remove the THD one once c10d is stable enough.

Tests are fully covered, just as with THD.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11168

Differential Revision: D9674963

Pulled By: teng-li

fbshipit-source-id: ecf52a7189374ca7930c2be305218167fdd822a7
2018-09-05 21:59:31 -07:00
126ac4b71f Back out "[pt1][tensor] Add strides to caffe2::Tensor"
Summary: Original commit changeset: 3643871b70f1

Differential Revision: D9665958

fbshipit-source-id: 46e22adbf39af92fb23abb66212991bd53a86317
2018-09-05 20:39:07 -07:00
fb836db4b2 Fix conv gradient conversion (#11312)
Summary:
Fix Windows build failure after https://github.com/pytorch/pytorch/pull/10744 landed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11312

Reviewed By: mingzhe09088

Differential Revision: D9669907

Pulled By: orionr

fbshipit-source-id: d717ec4f8fdf17acf334528d7838b88c5c50e9c3
2018-09-05 20:09:31 -07:00
dccd0f2de6 Bag of clang tidy fixes for torch/csrc/ and torch/csrc/autograd (#11050)
Summary:
Linting `torch/csrc/` (non-recursive) and `torch/csrc/autograd` (non-recursive).

Fixed things like:
- `typedef` vs `using`
- Use `.empty()` instead of comparing with empty string/using `.size() == 0`
- Use range for loops instead of old style loops (`modernize-`)
- Remove some `virtual` + `override`
- Replace `stdint.h` with `cstdint`
- Replace `return Type(x, y)` with `return {x, y}`
- Use boolean values (`true`/`false`) instead of numbers (1/0)
- More ...

ezyang apaszke cpuhrsch
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11050

Differential Revision: D9597505

Pulled By: goldsborough

fbshipit-source-id: cb0fb4793ade885a8dbf4b10484487b84c64c7f2
2018-09-05 19:55:50 -07:00
83a1ab2136 Sparse tensor printing; add NotImplemented autograd fn (#10181)
Summary:
Commits:

1. Add autograd function `NotImplemented` (subclass of `Error`) so python `grad_fn` prints nicer. Since `Error` is used in `DelayedError` to implement `oncedifferentiable`, I can't just change its name. cc colesbury

2. Add printing for sparse tensors. Fixes https://github.com/pytorch/pytorch/issues/9412. cc weiyangfb.

3. Add tests for sparse printing

Examples:
```diff
  In [2]: x = torch.sparse.FloatTensor(torch.arange(4).view(2,2), torch.randn(2, 2), [10, 10, 2])

  In [3]: x
  Out[3]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]])
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]])
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo)

  In [4]: x.requires_grad_()
  Out[4]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, requires_grad=True)

  In [5]: x + x
  Out[5]:
- torch.sparse.FloatTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-2.3664, -1.1855],
-         [ 0.1662,  0.5021]], grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 3.0162,  0.6902],
+                       [-0.0785,  0.9553]]),
+        size=(10, 10, 2), nnz=2, layout=torch.sparse_coo, grad_fn=<AddBackward0>)

  In [6]: x.double()
  Out[6]:
- torch.sparse.DoubleTensor of size (10,10,2) with indices:
- tensor([[0, 1],
-         [2, 3]], grad_fn=<Error>)
- and values:
- tensor([[-1.1832, -0.5927],
-         [ 0.0831,  0.2511]], dtype=torch.float64, grad_fn=<Error>)
+ tensor(indices=tensor([[0, 1],
+                        [2, 3]]),
+        values=tensor([[ 1.5081,  0.3451],
+                       [-0.0392,  0.4776]]),
+        size=(10, 10, 2), nnz=2, dtype=torch.float64, layout=torch.sparse_coo,
+        grad_fn=<NotImplemented>)

  In [7]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2, 0), [0])

  In [8]: x
  Out[8]:
- torch.sparse.FloatTensor of size (0,) with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([], size=(2, 0))
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([], size=(2, 0)),
+        size=(0,), nnz=2, layout=torch.sparse_coo)

  In [9]: x = torch.sparse.FloatTensor(torch.ones(0, 2, dtype=torch.long), torch.randn(2), [])

  In [10]: x
  Out[10]:
- torch.sparse.FloatTensor of size () with indices:
- tensor([], size=(0, 2), dtype=torch.int64)
- and values:
- tensor([-0.0064,  0.8518])
+ tensor(indices=tensor([], size=(0, 2)),
+        values=tensor([ 0.9800, -0.5978]),
+        size=(), nnz=2, layout=torch.sparse_coo)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10181

Differential Revision: D9139845

Pulled By: SsnL

fbshipit-source-id: 353eebd55fac4049ed9bf85f8b0ee2c1418a744e
2018-09-05 19:41:22 -07:00
fa147abda4 Add convertToCaffe2Proto to python API
Summary: Closing the gap a bit on API, allowing users to go NetDef -> nomnigraph -> NetDef in python now

Reviewed By: duc0

Differential Revision: D9670495

fbshipit-source-id: 6497518ffc05a186deb0d657e06317980d39ddd5
2018-09-05 18:40:48 -07:00
425ea6b31e fix doc for functional.dropout* (#10417)
Summary:
- fixes #4177
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10417

Differential Revision: D9542876

Pulled By: weiyangfb

fbshipit-source-id: 480ed973d1fe0364f4acb5cd596c2031895b82df
2018-09-05 17:26:00 -07:00
ad116210e5 typo fix Tranpose2D -> Transpose2D (#11281)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11281

A simple typo fix

Reviewed By: BIT-silence

Differential Revision: D9658324

fbshipit-source-id: b6513c8d12d8fe75a9b18df1b443e9e66e692744
2018-09-05 17:25:58 -07:00
a9d8b021e9 Remove THFinalizer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11287

Reviewed By: ezyang

Differential Revision: D9662341

Pulled By: cpuhrsch

fbshipit-source-id: 306bea00694db1ae207167ee4bf10de01426911c
2018-09-05 16:56:27 -07:00
c0efe6f027 Forward declarations of needed curand functions (#10911)
Summary:
Needed for FULL_CAFFE2=1 with statically linked CUDA libraries. Waiting on advice from Nvidia
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10911

Reviewed By: pjh5

Differential Revision: D9636256

Pulled By: orionr

fbshipit-source-id: fcad7945910b6c8fb5f52e81cc87dad5fcfb3c65
2018-09-05 16:56:26 -07:00
57728f71e7 nomnigraph - simplify core graph API and test (#11256)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11256

- in deleteNode method, remove optional deleteEdge flag as it's not used
- in deleteEdge method, remove optional removeRef flag as it's not used
- in replaceNode method, remove optional newHead_ parameter as it's not used - also simplifying the implementation by just calling replaceInEdges and replaceOutEdges
- remove importNode & importEdge as they're not in use
- add getEdgeIfExists that is like getEdge() but returns nullptr instead of throwing when the edge does not exist
- reduce verbosity in the basic graph unit test and add more test cases for ReplaceEdges

Differential Revision: D9650913

fbshipit-source-id: 6c18b37bef0d2abe1b57fb4fc47bfdbcee387694
2018-09-05 16:40:49 -07:00
c43187291c Small fixes to cppdocs for sync script (#11300)
Summary:
I'm setting up an automatic sync job for cppdocs and need two fixes to the cpp docs config:

1. Right now the cppdocs use the `torch` package to figure out the version. For C++ docs all I really need from the built package are the generated Tensor.h and Functions.h files. I can actually generate those directly via `aten/src/ATen/gen.py`, so I can skip building PyTorch altogether and save 10 minutes in the sync job! For this I need to avoid using the torch package in the docs.
2. Internal proxy issues prevent using the git link for sphinx_rtd_theme. We can just use the pip package for the cppdocs (not for the normal PyTorch docs)

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11300

Differential Revision: D9667193

Pulled By: goldsborough

fbshipit-source-id: 5567e0b3d3bdce03f5856babdb4ff76bcee91846
2018-09-05 16:40:47 -07:00
c9e66351a7 Port all PyTorch and Caffe2 jobs to CircleCI (#11264)
Summary:
This PR adds all PyTorch and Caffe2 job configs to CircleCI.

Steps for the CircleCI mini-trial:
- [ ] Make sure this PR passes Jenkins CI and fbcode internal tests
- [x] Approve this PR
- [ ] Ask CircleCI to turn up the number of build machines
- [ ] Land this PR so that the new `.circleci/config.yml` will take effect

Several Caffe2 tests are flaky on CircleCI machines and hence skipped when running on CircleCI. A proper fix for them will be worked on after a successful mini-trial.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11264

Differential Revision: D9656793

Pulled By: yf225

fbshipit-source-id: 7832e90018f3dff7651489c04a179d6742168fe1
2018-09-05 16:28:11 -07:00
9f4bcdf075 caffe2::DeviceType -> at::DeviceType (#11254)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11254
Previously we used DeviceType from caffe2.proto directly, but it's an `enum` and has an implicit conversion to int, which does not provide type safety; e.g., we have to explicitly check that a device type is valid in event.h:
```
template <int d>
struct EventCreateFunctionRegisterer {
  explicit EventCreateFunctionRegisterer(EventCreateFunction f) {
    static_assert(d < MaxDeviceTypes, "");
    Event::event_creator_[d] = f;
  }
};
```
at::DeviceType is an `enum class`, and it does not have implicit conversion to int, and provides better type safety guarantees. In this diff we have done the following refactor(taking CPU as an example):

    1. caffe2::DeviceType → caffe2::DeviceTypeProto
    2. caffe2::CPU → caffe2::PROTO_CPU
    3. caffe2::DeviceType = at::DeviceType
    4. caffe2::CPU = at::DeviceType::CPU

codemod -d caffe2/caffe2 --extensions h,cc,cpp 'device_type\(\), ' 'device_type(), PROTO_'
+ some manual changes

In short, after this diff, in c++, caffe2::CPU refers to the at::DeviceType::CPU and the old proto caffe2::CPU will be caffe2::PROTO_CPU.
On the Python side, we have a temporary workaround that aliases `caffe2_pb2.CPU = caffe2_pb2.PROTO_CPU` to make the change easier to review; this will be removed later.

Reviewed By: ezyang

Differential Revision: D9545704

fbshipit-source-id: 461a28a4ca74e616d3ee183a607078a717fd38a7
2018-09-05 16:28:09 -07:00
ac9f0a6884 refactor preproc, support dense in TumHistory layer
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11131

Reviewed By: xianjiec

Differential Revision: D9358415

fbshipit-source-id: 38bf0e597e22d540d9e985ac8da730f80971d745
2018-09-05 16:10:13 -07:00
3e85685f8f add persistent rnns with conservative criteria (#11248)
Summary:
Persistent RNNs provide much better performance on V100 with half-precision input data for a variety of cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11248

Differential Revision: D9665687

Pulled By: ezyang

fbshipit-source-id: 2bd09a7eb1f5190aadb580977b0ba956e21a7dd5
2018-09-05 16:10:11 -07:00
68c2e014cb Handling for py2/py3 division differences (#11016)
Summary:
- In Python 2, use of `/` (regardless of int/float/Tensor) causes a compiler error if
  `from __future__ import division` is not imported in the file.
- The / operator is universally set to do "true" division for integers
- Added a `prim::FloorDiv` operator because it is used in loop unrolling.

The error if users use '/' in python 2 without importing from __future__
occurs when building the JIT AST.
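
A minimal sketch of the resulting semantics (the annotation syntax for scalar arguments is an assumption, not code from this PR):

```python
from __future__ import division  # required in Python 2 before '/' appears in scripted code

import torch

@torch.jit.script
def ratio(a: int, b: int) -> float:
    # '/' performs "true" division even on integers inside script
    return a / b

print(ratio(3, 2))  # 1.5
```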

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11016

Differential Revision: D9613527

Pulled By: zou3519

fbshipit-source-id: 0cebf44d5b8c92e203167733692ad33c4ec9dac6
2018-09-05 14:57:38 -07:00
9a0effb92c Update send/recv tests to reflect intended use (#11275)
Summary:
The existing tests had every rank run send to every other rank and only
then switch to recv mode. This only works if the send operations are
non-blocking and the passed tensors are immediately copied to some kind
of send buffer. Instead, every send must be matched with a recv on the
other side, because from the API perspective they may block.

E.g. imagine a 1GB tensor being sent to every other rank. It can only go
through if there is a recv on the other side, or it will deadlock.

This change reflects this in the send/recv unit tests.
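For illustration, a sketch of the matched pattern the updated tests follow (assumes an initialized process group; not code from this PR):

```python
import torch
import torch.distributed as dist

rank = dist.get_rank()
world = dist.get_world_size()
payload = torch.full((1024,), float(rank))

for peer in range(world):
    if peer == rank:
        continue
    buf = torch.empty(1024)
    # order each blocking send/recv pair so the two sides always match up
    if rank < peer:
        dist.send(payload, dst=peer)
        dist.recv(buf, src=peer)
    else:
        dist.recv(buf, src=peer)
        dist.send(payload, dst=peer)
```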
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11275

Differential Revision: D9658197

Pulled By: pietern

fbshipit-source-id: fb6a3fc03b42343a9dfeed0def30d94914e76974
2018-09-05 14:40:04 -07:00
8da081f7a5 Add cost inference to ConvGradient and WeightedSum operators (#10744)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10744

As title

Reviewed By: jspark1105

Differential Revision: D9436387

fbshipit-source-id: 578b7a6d98843d57e3f8f4c564727e9cadbedd78
2018-09-05 13:56:05 -07:00
4fe3356ee0 Move collapse dims into a single place (#11272)
Summary:
Deduplicates implementations and reduces sources of failure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11272

Differential Revision: D9659167

Pulled By: cpuhrsch

fbshipit-source-id: 759bfba4fd90795038afe684d9829f5f41f98109
2018-09-05 12:57:00 -07:00
5e2067ce30 Fix some more warnings (#11257)
Summary:
Found these when compiling the new master with gcc 7.3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11257

Differential Revision: D9656612

Pulled By: SsnL

fbshipit-source-id: 7acb19e13204c010238dab7bc6973cc97b96f9a4
2018-09-05 11:10:27 -07:00
f866574afc Fix the batchnorm onnx exporting when affine=False
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11249

Reviewed By: Ac2zoom

Differential Revision: D9652526

Pulled By: houseroad

fbshipit-source-id: 12a9038beddd227a2f9e2178edf4e8d623488c3e
2018-09-05 11:10:25 -07:00
55212507a2 Improve error message to include return types too (#11245)
Summary:
Fixes #11057.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11245

Differential Revision: D9652698

Pulled By: apaszke

fbshipit-source-id: 4c5006e32e599c35367aa5acfae45de3ab8ac176
2018-09-05 10:56:51 -07:00
e6d6aed12e Check doxygen output in travis (#11124)
Summary:
This PR adds a .travis.yml check for our C++ documentation. The goal is to avoid any documentation/comments in our C++ code that would break the doxygen output and possibly ruin the C++ documentation site (currently https://pytorch.org/cppdocs).

For this, we:
1. Run doxygen and record any warnings,
2. Filter out some known bogus warnings,
3. Count the remaining warnings,
4. Fail the check if (3) is non-zero.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11124

Differential Revision: D9651011

Pulled By: goldsborough

fbshipit-source-id: 30f776d23bb6d6c482c54db32828b4b99547e87b
2018-09-05 10:25:56 -07:00
267e1ec112 Accept more numpy scalars as doubles (#9659)
Summary:
Allows multiplication of e.g. numpy.float32 with tensors.

This came up with #9468

If you want this, I'll add tests once the other patch is done (doing it now would conflict, so I prefer to wait).
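A minimal sketch of what this enables:

```python
import numpy as np

import torch

t = torch.ones(3)
print(t * np.float32(2.0))  # tensor([2., 2., 2.])
print(t + np.float64(0.5))  # other numpy scalar types are accepted as doubles too
```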
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9659

Differential Revision: D8948078

Pulled By: weiyangfb

fbshipit-source-id: c7dcc57b63e2f100df837f70e1299395692f1a1b
2018-09-05 10:25:55 -07:00
8bd80a6b74 Fixed log message (#10874)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10874

Fixes the log message "WARNING:data_workers:Warning, data loading lagging behind: name=0", where the size of a queue was reported instead of the source name.

Reviewed By: panshen1, Novitial

Differential Revision: D9506606

fbshipit-source-id: 03717cfa9b991afb335ef877378afa3b52fd8f22
2018-09-05 09:55:52 -07:00
434e943b08 Fix to distribution.__repr__ with lazy attributes (#11263)
Summary:
`__repr__` currently fails for distributions with lazy attributes in PyTorch master, throwing a `KeyError`. This fixes the issue.

**Additionally:**
 - Added `logits` to `arg_constraints` for distributions that accept either `probs` or `logits`. This is both to have `__repr__` display the `logits` param when available, and to be able to do validation checks (e.g. NaN checks) when the logit parametrization is used. fritzo, alicanb - I think there were reasons why we had not done so in the first place, but I am unable to recall now. It passes all the tests, but let me know if there is something that I am missing at the moment.
 - There are certain distributions, e.g. `OneHotCategorical` which won't show any parameters because it uses a `categorical` instance under the hood and neither `logits` / `probs` in `arg_constraints` are present in the instance's `__dict__`. This isn't addressed in this PR.
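
For example, a minimal sketch of the resulting behavior (output format approximate):

```python
import torch
import torch.distributions as dist

d = dist.Bernoulli(logits=torch.zeros(3))
# __repr__ no longer raises on lazy attributes, and the logits
# parametrization is displayed because logits is in arg_constraints
print(d)  # Bernoulli(logits: tensor([0., 0., 0.]))
```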

cc. vishwakftw, fritzo, nadavbh12, apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11263

Differential Revision: D9654959

Pulled By: apaszke

fbshipit-source-id: 16f5b20243fe8e2c13e9c528050d4df0b8ea6e45
2018-09-05 09:55:51 -07:00
9fc22cb772 Add import export step to end to end tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10717

Differential Revision: D9562888

Pulled By: li-roy

fbshipit-source-id: 8f5d62fd0a44aca0a41dc10438e7bb91cc2a972a
2018-09-05 09:39:47 -07:00
1808e368e4 Add complex hooks for out of tree complex implementation. (#11216)
Summary:
This PR adds a hooks interface for registering types for complex
scalar types, and a sample implementation of the hook in
test_cpp_extensions.

The hook registration is patterned off of the existing CUDA hooks.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11216

Differential Revision: D9654840

Pulled By: ezyang

fbshipit-source-id: 7b97646280d584f8ed6e14ee10a4abcd04cf2987
2018-09-05 09:25:50 -07:00
aeb6094538 Unify opt flag for cmake codegen (#11227)
Summary:
Also enables debug for non-MSVC for kernel codegen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11227

Differential Revision: D9656506

Pulled By: cpuhrsch

fbshipit-source-id: 667195cb55de1a1a9042b6b1c4436e9c6c743333
2018-09-05 08:55:49 -07:00
d612855b91 nomnigraph - fix memory error in NN subgraph matchOp (#11127)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11127

It's invalid to capture `predicate` by reference since it's a local variable; capture it by value instead.

Differential Revision: D9600115

fbshipit-source-id: 92e0130d0a74908380b75ade5c3492df49e25941
2018-09-05 07:57:40 -07:00
6d6655e6be Port PackedSequences functions to C++ (#11224)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11224

Differential Revision: D9652703

Pulled By: apaszke

fbshipit-source-id: 558e39457e590cad07516e5bb2ecb12789564950
2018-09-05 06:35:15 -07:00
b7038f7c37 Treat numerical differences as warnings instead of errors when tracing (#11246)
Summary:
Also, make `torch.isclose` work with integral tensors and refactor `_check_trace` a bit.
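
A minimal sketch of the `torch.isclose` part:

```python
import torch

a = torch.tensor([100, 200, 300])
b = torch.tensor([100, 200, 301])
# elementwise closeness is now defined for integral tensors as well
print(torch.isclose(a, b))  # [True, True, False]
```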

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11246

Differential Revision: D9652701

Pulled By: apaszke

fbshipit-source-id: fb0bdbfd1952e45e153541e4d471b423a5659f25
2018-09-05 06:35:13 -07:00
b7cd4b692c add a Float16UniformFill (#11123)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11123

This adds an operator that fills a tensor with samples from uniform(min, max).
The implementation uses the fp32 generator and converts to fp16.

If performance becomes an issue, we could resort to intrinsics.

Reviewed By: jspark1105, chocjy

Differential Revision: D9598142

fbshipit-source-id: 5aeab99acf7c3596fa6c33611d9d2c484f7c1145
2018-09-04 23:28:22 -07:00
d4060d2d0e Implement torch.tensordot (#10025)
Summary:
Fixes: #8988
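A minimal usage sketch:

```python
import torch

a = torch.randn(3, 4, 5)
b = torch.randn(4, 5, 6)

# contract the last two dims of `a` against the first two of `b`
c = torch.tensordot(a, b, dims=2)
print(c.shape)  # torch.Size([3, 6])

# equivalent, naming the contracted axes explicitly
c2 = torch.tensordot(a, b, dims=([1, 2], [0, 1]))
```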
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10025

Reviewed By: ezyang

Differential Revision: D9540967

Pulled By: yf225

fbshipit-source-id: 6ba2a7777162983977db884b693e6f4543b31aeb
2018-09-04 21:10:07 -07:00
d1b920b44f keep net type info when generating model complete net (#11032)
Summary:
Keep net type info when generating the model's complete net. This preserves the performance optimization option.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11032

Reviewed By: wat3rBro

Differential Revision: D9564125

Pulled By: harouwu

fbshipit-source-id: c6546af9b1d4ff5eddf6124e24a5da1b8baf47df
2018-09-04 21:10:06 -07:00
56bdd87b40 Get rid of some uses of type() (#11215)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11215

I found these by deleting the implicit conversion of Type to
TensorOptions and then fixing sites.  This isn't a complete
refactor, because I ran out of steam after fixing this many
and decided to keep the implicit conversion.  Still, why
waste a perfectly good refactor?

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9634750

fbshipit-source-id: 4d8fb778e13e6e24b888b1314a02709b2cb00b62
2018-09-04 20:26:22 -07:00
9ca63c5e63 Reorganize methods in Type, add CPUTypeDefault/CUDATypeDefault (#11205)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11205

Our short term plan for supporting out of tree complex development requires an
external library to add a custom subclass of Type without access to the
code generation facilities in ATen.  This commit reorganizes Type so
as to minimize the amount of boilerplate you have to write when making
a subclass of Type.

In particular, it:
- Creates a new CPUTypeDefault/CUDATypeDefault class, which you are
  intended to inherit from, which provides layout/dtype-agnostic
  default implementations for CPU/CUDA.
- Adds new getCPUAllocator() and getCUDAAllocator() functions, as
  a more public API to get your hands on an Allocator
- Adds allocator() and getDeviceFromPtr(), abstracting the device
  specific parts of storage() methods; these methods are now
  implemented in base TypeDefault.
- Delete the static typeString() method, which is now dead.
- Move is_cuda/is_sparse/is_distributed to TypeDefault.

Reviewed By: SsnL

Differential Revision: D9631619

fbshipit-source-id: 40b600d99691230e36e03eb56434c351cbc2aa3a
2018-09-04 20:26:20 -07:00
f0d3fda064 Improve docs for torch::nn::Module (#11115)
Summary:
Added some documentation. Will rebuild docs to make sure it looks good. Can already accept approvals.

ebetica apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11115

Differential Revision: D9597880

Pulled By: goldsborough

fbshipit-source-id: 56b701da631702ba56e281a0de0f7ebe490f5c5a
2018-09-04 18:10:38 -07:00
7f74875304 Pull Context out of TensorMethods.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11241

Reviewed By: ezyang

Differential Revision: D9645514

Pulled By: gchanan

fbshipit-source-id: 43e65d1d2fa3183264ed7e4752c1512df5f69175
2018-09-04 18:10:37 -07:00
05cb40dc00 Move some includes from Tensor/Type to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11234

Reviewed By: ezyang

Differential Revision: D9642669

Pulled By: gchanan

fbshipit-source-id: 2c131bb46b54a0803c37b444ad48d861080056f1
2018-09-04 18:10:34 -07:00
c8672f0b42 Support environments with no libprotobuf (#11161)
Summary:
Just pulling this out of https://github.com/pytorch/pytorch/pull/10611

Make sure we can support environments where we don't have libprotobuf installed when we link protobuf locally.

cc goldsborough Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11161

Differential Revision: D9650282

Pulled By: orionr

fbshipit-source-id: 447b5e54cd2639973b4b10f58590d1c693a988d4
2018-09-04 17:27:54 -07:00
020501b7b0 Getting rid of USE_C10D for build (#11237)
Summary:
Will use USE_DISTRIBUTED for both c10d and THD
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11237

Differential Revision: D9647825

Pulled By: teng-li

fbshipit-source-id: 06e0ec9b5e2f8f38780fc88718f8499463e9e969
2018-09-04 17:27:53 -07:00
313e89d8db Fix dimension collapsing (#11226)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/11206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11226

Differential Revision: D9646638

Pulled By: cpuhrsch

fbshipit-source-id: 104f367f75a4478bb7580324ea3661de71b2c8b0
2018-09-04 17:27:52 -07:00
6219c4a28f Make Scalar::toTensor a free function, move Scalar to ATen/core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11125

Reviewed By: ezyang

Differential Revision: D9599798

Pulled By: gchanan

fbshipit-source-id: 2fec682c109013a82788dfba13f4d30b2945d3f4
2018-09-04 16:25:57 -07:00
033499cf56 Remove mention of USE_DISTRIBUTED_MW (#11240)
Summary:
This was lingering after #10731.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11240

Differential Revision: D9645437

Pulled By: pietern

fbshipit-source-id: d02c33354b094be3bb0872cf54a45721e20c4e7d
2018-09-04 16:10:20 -07:00
3f30c296d3 Export CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_* (#11233)
Summary:
This PR resolved the following compilation errors on devgpu:
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_Tan()'
/home/mingzhe0908/pytorch/build/lib/libcaffe2_gpud.so: undefined reference to `caffe2::CAFFE2_PLEASE_ADD_OPERATOR_SCHEMA_FOR_MaxPool3D()'
....

The same error had been happening with the caffe2 debug-mode build before build_caffe2 was removed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11233

Reviewed By: orionr

Differential Revision: D9645527

Pulled By: mingzhe09088

fbshipit-source-id: 68a45aa7fd815cac41b7fd64cfd9838b3226345a
2018-09-04 14:56:43 -07:00
7e0a052a5d Adding synthetic data generation to the filler.h file (#11060)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11060

Adding synthetic data generation to the filler.h file (the exact distribution to be replaced later on).

Reviewed By: highker

Differential Revision: D9417594

fbshipit-source-id: 5d66dfbcb254a5961c36b7d3a081332c7372dac7
2018-09-04 13:40:53 -07:00
1eed7d5f0b Report an error when trying to record a mutable operator when (#11129)
Summary:
there are multiple views of the tensor live.

Also adds recording for copy_ because this is the critical in-place
op where these views will cause LHS indexing to fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11129

Differential Revision: D9600195

Pulled By: zdevito

fbshipit-source-id: bfd8f5befa47377e36d704dbdb11023c608fe9a3
2018-09-04 13:40:51 -07:00
0e8088d6f6 Fix typo in data_parallel_model
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11086

Differential Revision: D9581297

fbshipit-source-id: b164177bdbb309f56ff3231c1ffc0973f6c5299b
2018-09-04 13:15:31 -07:00
ec6f0ed560 Additional Python Bindings
Summary:
Major change:
- Addition of pattern matching bindings

Minor change:
- OperatorDef instantiation
- Generic Graph API

Reviewed By: duc0

Differential Revision: D9546205

fbshipit-source-id: ab5274014be23a3e9e3fcf18ae1815c4f387b83c
2018-09-04 12:10:10 -07:00
750cd48980 update expect file for short circuiting (#11229)
Summary:
Fix failing test by updating expect file
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11229

Differential Revision: D9638587

Pulled By: eellison

fbshipit-source-id: e870ef3a4fbc7e07f299cc9413703d9f77e89895
2018-09-04 11:56:09 -07:00
684b55d762 In default, use third party eigen. Added new flag USE_SYSTEM_EIGEN_INSTALL to control. (#11020)
Summary:
TSIA. apaszke pointed out that it might be better to use the third-party folder by default, since system Eigen is often out of date and may not have the version we need to compile successfully.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11020

Differential Revision: D9562548

Pulled By: Yangqing

fbshipit-source-id: d8ab8a6ebe1f3d9eec638ef726cf5dc4dcf777b5
2018-09-04 10:56:22 -07:00
539579aa9a Logical short circuit (#11116)
Summary:
Adding short-circuit evaluation to AND and OR. The second expression of an AND or OR gets lifted into an if branch, which is conditionally evaluated.

BatchOps was using the expression `dims = dims1 or dims2`, where dims1 is often an empty tensor. This now throws an error, because dims1 gets cast to a boolean, and you can't convert an empty tensor to a scalar. It now matches the behavior of PyTorch in Python.
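
A minimal Python sketch of why that pattern now fails (the exact error message may differ by version):

```python
import torch

dims1 = torch.tensor([])       # empty tensor
dims2 = torch.tensor([0, 1])

try:
    dims = dims1 or dims2      # `or` must first evaluate bool(dims1)
except RuntimeError as e:
    # Converting an empty tensor to a scalar boolean is ambiguous,
    # so this raises instead of silently short-circuiting.
    print(e)
```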

One thing that came up: in Python, if the second expression in an and/or gets returned, it does not get coerced to a boolean.

`tensor == (False or tensor)`
`tensor == (True and tensor)`

We do not currently support this.

edit: wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11116

Differential Revision: D9618168

Pulled By: eellison

fbshipit-source-id: 93b202be2f222d41f85d38d9c95f04d1749e8343
2018-09-04 09:25:13 -07:00
b2217109ec Move TensorOptions to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11147

Reviewed By: gchanan

Differential Revision: D9614321

fbshipit-source-id: 618cb342eb7c52181425f6bb9c17b9ecdb87a394
2018-09-04 08:55:54 -07:00
0ff1bb0d8a Remove Type constructor from TensorOptions, add Type::options (#11189)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11189

Replaces it with an operator TensorOptions() method on
Type, reestablishing the implicit conversion.  I originally
wanted to get rid of the implicit conversion entirely, but
there were a *lot* of use-sites, so I added it back to avoid
a huge codemod.  In this patch, I only had to fix sites that
used the optional device_index API.

Reviewed By: cpuhrsch

Differential Revision: D9628281

fbshipit-source-id: 5fe2a68eefb77a3c9bb446f03a94ad723ef90210
2018-09-04 08:10:04 -07:00
0d5e4a2c66 Allow passing through arguments to unittest (#11209)
Summary:
Example:
```sh
python run_test.py -i sparse -- TestSparse.test_factory_size_check -f
```

With this, the `--verbose` option is redundant (one can call `python run_test.py -- -v` instead of `python run_test.py -v`). But since this is (probably) a frequently used flag, I didn't remove the existing easier-to-use option.

cc ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11209

Differential Revision: D9632215

Pulled By: SsnL

fbshipit-source-id: ff522802da11ef0a0714578be46e4a44f6343d44
2018-09-03 20:09:08 -07:00
050aa42e09 Fix some more compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11208

Differential Revision: D9632216

Pulled By: SsnL

fbshipit-source-id: b181f3ce114474e171146cd2ac5de150b0e23f75
2018-09-03 19:39:33 -07:00
cd4c32691d Add complex32, complex64 and complex128 dtypes (#11173)
Summary:
We don't generate a corresponding Type implementations for them,
so this doesn't do anything at the moment.

We don't plan on supporting complex32 in the near future, but
it is added to reserve the name and number in case we do at
some point in the future.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/11173

Reviewed By: SsnL

Differential Revision: D9627477

Pulled By: ezyang

fbshipit-source-id: f49a44ab1c92d8a33130c249ac7b234f210a65e6
2018-09-03 19:19:36 -07:00
c5b021cc88 State dict loading arguments were in the wrong order (#11200)
Summary:
In the state dict loading code, the error message referring to the shapes of the loaded parameters and of the parameters in the initialised model had its format arguments in the wrong order. Swapped them round to fix.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11200

Differential Revision: D9631160

Pulled By: SsnL

fbshipit-source-id: 03d9446303bd417fef67027b10d7a27de06486be
2018-09-03 15:42:30 -07:00
7e2136c2b5 remove allclose from test_doc skipped list
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11187

Differential Revision: D9628349

Pulled By: SsnL

fbshipit-source-id: 0ff94666542ca049a6d82091bd9fc79ec1699ac6
2018-09-03 09:39:56 -07:00
24eb5ad0c5 Fix unit tests on CI (#11191)
Summary:
Disables two of the unit tests in test_cuda, introduced after test_cuda was enabled, that fail on ROCm.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11191

Differential Revision: D9628702

Pulled By: ezyang

fbshipit-source-id: 4c298c728f42bb43d39b57967aa3e44385980265
2018-09-02 21:54:47 -07:00
0a8c8c1dbe Rename real to scalar_t. (#11163)
Summary:
This is necessary to allow us to use the complex header
which defines real (and is very sad if real is macro'ed).

We should also fix accreal, ureal, Real and REAL, but
only 'real' is the real blocker.

```
codemod -d aten/src/TH --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THC --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
codemod -d aten/src/THCUNN --extensions c,cc,cpp,cu,cuh,h,TARGETS,py,hpp '\breal\b' scalar_t
```

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11163

Reviewed By: SsnL

Differential Revision: D9619906

Pulled By: ezyang

fbshipit-source-id: 922cb3a763c0bffecbd81200c1cefc6b8ea70942
2018-09-02 15:26:01 -07:00
43fd6b234d Make Type a (mostly) pure virtual class; TypeDefault for impls (#11013) (#11013)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11013

Previously, the parent class Type also contained a large number
of implementations, for things like broadcasting and native
functions that didn't need dispatch.  We'd like to be able
to reference this interface from Tensor even when none
of these implementations are available.

To do this, we convert Type into a truly pure virtual interface,
and move all of the implementations to TypeDefault.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11181

Differential Revision: D9561478

Pulled By: ezyang

fbshipit-source-id: 13c49d80bc547551adf524b1cf1d691bfe311133
2018-09-02 15:25:59 -07:00
e1a17d5a42 Should not use CAFFE2_API when definition is already in header. (#11114)
Summary:
Remove or use CAFFE2_EXPORT.
Fix #11108
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11114

Differential Revision: D9628293

Pulled By: ezyang

fbshipit-source-id: dc3bb7dc5bc299e3b6cfd1cdd640f618c206fb5a
2018-09-02 14:39:38 -07:00
cf10efb8d4 Fixes unclear exception message for F.conv2d (#11053)
Summary:
Fixes #11033
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11053

Differential Revision: D9573606

Pulled By: soumith

fbshipit-source-id: 9729cbd6c8afcef0fd487bdd425b0d1f55189009
2018-09-02 13:39:34 -07:00
593d74061f Document torch.allclose (#11185)
Summary:
- Modify torch.autograd.gradcheck to use torch.allclose instead
- Expose doc strings

Closes #10355
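
For reference, the documented check is elementwise |input - other| <= atol + rtol * |other|; a quick sketch:

```python
import torch

a = torch.tensor([1.0, 1.0001])
b = torch.tensor([1.0, 1.0])

print(torch.allclose(a, b))             # False with default rtol=1e-5, atol=1e-8
print(torch.allclose(a, b, rtol=1e-3))  # True once the tolerance is loosened
```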
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11185

Differential Revision: D9628016

Pulled By: soumith

fbshipit-source-id: 22a30622b9fe52e41b5b3540406137b59d8c5a75
2018-09-02 09:26:07 -07:00
33c7cc13ca improve docker packages, fix bugs, enable tests, enable FFT (#10893)
Summary:
* improve docker packages (install OpenBLAS to have at-compile-time LAPACK functionality w/ optimizations for both Intel and AMD CPUs)
* integrate rocFFT (i.e., enable Fourier functionality)
* fix bugs in ROCm caused by wrong warp size
* enable more test sets, skip the tests that don't work on ROCm yet
* don't disable asserts any longer in hipification
* small improvements
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10893

Differential Revision: D9615053

Pulled By: ezyang

fbshipit-source-id: 864b4d27bf089421f7dfd8065e5017f9ea2f7b3b
2018-09-02 08:54:42 -07:00
abe8b3391d LowRankMultivariateNormal cleanup
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11179

Differential Revision: D9627502

Pulled By: soumith

fbshipit-source-id: c7a4aa8be24bd8c688a7c655ff25ca901ed19704
2018-09-02 07:54:56 -07:00
4d28b65fb8 fix serialization of nn.Parameter with dill (#10296)
Summary:
Should resolve #9981.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10296

Differential Revision: D9196353

Pulled By: soumith

fbshipit-source-id: 109b6da42b7240cdbc7a0586745c735bce5e1279
2018-09-01 23:55:40 -07:00
1350f76b62 Fix max and min with inf on CUDA (#11091)
Summary:
Fixes #10237 #11084

cc vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11091

Differential Revision: D9582859

Pulled By: SsnL

fbshipit-source-id: 3991c0a2af65ba82fa815b82f9e6b2107912fd10
2018-09-01 23:09:23 -07:00
7eba9849c1 Pool constants during script compilation. (#10231)
Summary:
This places all constants in the entry block of the graph, and de-duplicates them.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10231

Differential Revision: D9601501

Pulled By: resistor

fbshipit-source-id: daa10ed8c99e9894830d6f3e5d65c8d3ab5ea899
2018-09-01 22:40:50 -07:00
7af6f9515f Move TensorAccessor to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11014

Reviewed By: cpuhrsch

Differential Revision: D9561802

fbshipit-source-id: d3dbe6d7e76e2419ead81fb448711f101daee19f
2018-09-01 21:41:26 -07:00
011f615945 Fix compile warnings
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11177

Reviewed By: soumith

Differential Revision: D9626443

Pulled By: SsnL

fbshipit-source-id: e75d893e1e91e49d3e7b021892434489d8df7987
2018-09-01 21:41:25 -07:00
1506547771 Disable -Werror on macOS test build (#11090)
Summary:
cc goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11090

Reviewed By: soumith

Differential Revision: D9582525

Pulled By: apaszke

fbshipit-source-id: 5d2c6e930e7b09f0ed5a35fbf4fe36b8845a2580
2018-09-01 21:09:49 -07:00
f60a2b682e allow spaces in filename for jit-compiled cpp_extensions (#11146)
Summary:
Now, folders with spaces in their names will not error out for `torch.utils.cpp_extension.load(name="xxx", sources=["xxx.cpp"], verbose=True)` calls.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11146

Differential Revision: D9618838

Pulled By: soumith

fbshipit-source-id: 63fb49bfddc0998dccd8a33a6935543b1a6c2def
2018-09-01 20:39:51 -07:00
43e73f85ad Dont optimize slicing dispatch when we are tracing (#11156)
Summary:
Previously when we had a slicing expression like `x[0:5, 0]`, where the sliced tensor was of size `5` in dimension 0, we would skip dispatching the actual slice call as an optimization.

This caused incorrect behavior under tracing, as we would not record the slice op and thus if we encountered an input with a different shape while running the trace, we would get incorrect results.
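
A small sketch of the failure mode this fixes, assuming the current `torch.jit.trace(fn, example_inputs)` call form:

```python
import torch

def f(x):
    # For a 5-row input this slice is a no-op in dim 0, which used to be
    # optimized away and therefore never recorded in the trace.
    return x[0:5, 0]

traced = torch.jit.trace(f, torch.randn(5, 4))
out = traced(torch.randn(8, 4))  # with the fix, still returns the first 5 rows
print(out.shape)
```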
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11156

Differential Revision: D9622252

Pulled By: jamesr66a

fbshipit-source-id: 822f2e8f01504e131f53bd9ef51c171c7913a7cc
2018-09-01 17:13:03 -07:00
b3d559cdd1 Optimize WeightedSumOp for two inputs (#11049)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11049

Optimize WeightedSumOp for two inputs

Reviewed By: houseroad

Differential Revision: D9566692

fbshipit-source-id: 9aab1f02251d386b6f7d0699ae11eeb2ea2b5b4f
2018-09-01 11:54:55 -07:00
b834d9107e Revert D9566744: [New Checkpoint] Kill the dummy TaskOutput when task.get_step() (#11164)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11164

Revert D9566744

Reviewed By: enosair

Differential Revision: D9620272

fbshipit-source-id: 6a78c46929f66bd11969840cb6b107f734be0c02
2018-08-31 22:25:57 -07:00
1b7172a2b9 fix the slice onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11117

Reviewed By: MisterTea

Differential Revision: D9597870

Pulled By: houseroad

fbshipit-source-id: 3a2a307ee327397939bedb9150f780682e18a89a
2018-08-31 17:40:03 -07:00
03c06ec93d Traceable detach (#11038)
Summary:
This makes it so `detach` and `detach_` are traceable and also adds a pass to erase them before ONNX export
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11038

Differential Revision: D9588038

Pulled By: jamesr66a

fbshipit-source-id: 263dd3147e24fcb0c716743f37fdb9f84c0015e7
2018-08-31 16:40:42 -07:00
861e1c430c Move StorageImpl and Storage to core (#11154)
Summary:
Will need to be accessible by caffe2

This also removes a bunch of unnecessary includes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11154

Reviewed By: ezyang

Differential Revision: D9618681

Pulled By: cpuhrsch

fbshipit-source-id: 838a87b75d9c3959e145fd5fca13b63bc5de7bd3
2018-08-31 15:55:26 -07:00
4abddad1a0 use py::str to remove deprecation warnings (#11107)
Summary:
```
In file included from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/cast.h:13:0,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/attr.h:13,
                 from third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pybind11.h:43,
                 from caffe2/torch/csrc/utils/pybind.h:6,
                 from caffe2/torch/csrc/jit/pybind.h:5,
                 from caffe2/torch/csrc/jit/script/init.h:3,
                 from caffe2/torch/csrc/jit/script/init.cpp:1:
third-party-buck/gcc-5-glibc-2.23/build/pybind11/889256a/include/pybind11/pytypes.h:118:19: note: declared here
In file included from caffe2/torch/csrc/jit/pybind.h:12:0,
                 from caffe2/torch/csrc/jit/python_ir.cpp:4:
caffe2/torch/csrc/jit/pybind_utils.h: In function 'torch::jit::IValue torch::jit::argumentToIValue(const torch::jit::FunctionSchema&, size_t, pybind11::handle)':
caffe2/torch/csrc/jit/pybind_utils.h:138:226: warning: 'pybind11::str pybind11::detail::object_api<Derived>::str() const [with Derived = pybind11::detail::accessor<pybind11::detail::accessor_policies::str_attr>]' is deprecated: Use py::str(obj) instead [-Wdeprecated-declarations]
```

apaszke zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11107

Differential Revision: D9598040

Pulled By: goldsborough

fbshipit-source-id: 4a055353ac08d54a2bbca49573ff099310de3666
2018-08-31 15:25:04 -07:00
c48bf3a77e Automatic update of fbcode/onnx to 1b09eb14c2c781fae078fa6b1c0390ba6fc0898c (#11153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11153

Previous import was bae6333e149a59a3faa9c4d9c44974373dcf5256

Included changes:
- **[1b09eb1](https://github.com/onnx/onnx/commit/1b09eb1)**: Fix the shape inference for concat (#1361) <Lu Fang>
- **[7b9b3ee](https://github.com/onnx/onnx/commit/7b9b3ee)**: ONNX v1.3.0 release (#1359) <bddppq>

Reviewed By: Ac2zoom

Differential Revision: D9615844

fbshipit-source-id: f1d4e2d6ef72a269d6ab3c1c347b272b5bdc4f2a
2018-08-31 14:55:15 -07:00
5987b44dda Remove aten doc/ folder (#11158)
Summary:
ATen's doc/ folder is manually maintained and can thus cause confusion with the generated documentation. We now have proper online documentation for ATen, which is superior to ATen doc/. Let's delete ATen/doc.

ezyang apaszke soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11158

Differential Revision: D9618782

Pulled By: goldsborough

fbshipit-source-id: 0ef14f84947601a0589aa4a41e5c8619783426fe
2018-08-31 14:55:13 -07:00
3081c8ea1d Lower trivial differentiable subgraphs (#11110)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11110

Differential Revision: D9616408

Pulled By: apaszke

fbshipit-source-id: f1ae77d698bf0ada32f2c1c3f587e46a4f57a867
2018-08-31 14:55:10 -07:00
c87d082d26 Use ->data<real>() instead of THTensor_(data) and c10::raw::intrusive_ptr::decref instead of _free (#11039)
Summary:
Codemod used for this

```
grep -rnw "THTensor_(free)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(free)(\([^)]*\))/c10::raw::intrusive_ptr::decref(\1)/g" {}
```

```
grep -rnw "THTensor_(data)" aten | grep -v Binary | cut -f 1 -d ":" | xargs -I {} sed -i "s/THTensor_(data)(\([^)]*\))/\1->data<real>()/g" {}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11039

Reviewed By: ezyang

Differential Revision: D9617265

Pulled By: cpuhrsch

fbshipit-source-id: d9e7581867a335703f82f4556cead2b32b97bd83
2018-08-31 14:27:09 -07:00
adeebed549 Delete TensorImpl::toString() (#11035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11035

Instead, inline its definition into Tensor.  We need
to do this so we can avoid needing to getType() from
TensorImpl.

Reviewed By: cpuhrsch

Differential Revision: D9564516

fbshipit-source-id: 19fdaa2b93419e21572b9916714aee4165cb3390
2018-08-31 14:27:08 -07:00
5286925d4a Add getMaybeVariableType(const TensorImpl*) (#11031)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11031

The eventual plan is to get rid of TensorImpl::type()
entirely; but first we need a function to call.

Reviewed By: cpuhrsch

Differential Revision: D9564206

fbshipit-source-id: b59a9ccfaed44199f185eff392835cec89ccda8e
2018-08-31 14:27:06 -07:00
2c5ae8c4bf Get rid of type() method on TensorOptions; use at::getType instead (#11023)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11023

I'd like TensorOptions to not know anything about Context, so I can
move it to ATen/core without pulling in Context.  To do this, the
type() method has to go, since it consults the context to get a Type.

Reviewed By: cpuhrsch

Differential Revision: D9562467

fbshipit-source-id: 61a18a76eb042a5e70b64b963501e9d68c25d4f0
2018-08-31 14:27:05 -07:00
fd110411b7 Don't convert TensorOptions to type before printing.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11145

Reviewed By: cpuhrsch

Differential Revision: D9613897

fbshipit-source-id: eaa28b24992e8202cecb5ab97fa541fcf49a205f
2018-08-31 14:27:03 -07:00
48c2f3cf0f Move TensorOptions Tensor methods to TensorMethods.h (#11144)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11144

We can move them now that TensorMethods no longer references Tensor.

Reviewed By: cpuhrsch

Differential Revision: D9613800

fbshipit-source-id: 99ad1dd7d77eb319000769230b7016294cf1980f
2018-08-31 14:27:02 -07:00
780d2792c5 Warn about non-traceable behavior when tracing (#11088)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11088

Differential Revision: D9585527

Pulled By: apaszke

fbshipit-source-id: 29a03cb152d83b626f748fff4501ac9e139994c2
2018-08-31 14:27:00 -07:00
c31ebccd01 Clean up TupleType and SchemaParser (#11007)
Summary:
Some fixes to address your comments zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11007

Differential Revision: D9597750

Pulled By: goldsborough

fbshipit-source-id: f35f4801707dff2367e9dfc7d4e968357bc2b832
2018-08-31 14:26:59 -07:00
f4b2961af9 Simplify assignment operators (#11027)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11027

Using swap() as a primitive, copy and move assignment become much easier.

Reviewed By: ezyang

Differential Revision: D9563753

fbshipit-source-id: e74faf39b596f097de758bfe038639565807040a
2018-08-31 13:43:41 -07:00
6508db7421 Remove BUILD_CAFFE2 and build everything (#8338)
Summary:
This completely removes BUILD_CAFFE2 from CMake. There is still a little bit of "full build" stuff in setup.py that enables USE_CUDNN and BUILD_PYTHON, but otherwise everything should be enabled for PyTorch as well as Caffe2. This gets us a lot closer to full unification.

cc mingzhe09088, pjh5, ezyang, smessmer, Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8338

Reviewed By: mingzhe09088

Differential Revision: D9600513

Pulled By: orionr

fbshipit-source-id: 9f6ca49df35b920d3439dcec56e7b26ad4768b7d
2018-08-31 13:10:24 -07:00
a2a584f347 Proper recompilation tracking for more files in tools/autograd (#11143)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11143

Differential Revision: D9613758

Pulled By: ezyang

fbshipit-source-id: 08ed143739438435e0e8219dff3a738ab424c3e1
2018-08-31 13:10:21 -07:00
3791bd12c8 PT1 Release Milestone No.2 MPI Group Support with all tests passed (#11128)
Summary:
Added MPI group support.
This makes all previous MPI group test cases pass.

Also, release the MPI thread level support by serializing different PG's MPI ops. This is required.

The build is fixed too
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11128

Differential Revision: D9602188

Pulled By: teng-li

fbshipit-source-id: 1d618925ae5fb7b47259b23051cc181535aa7497
2018-08-31 12:39:56 -07:00
d95e68c8cc Delete Tensor constructor from TensorOptions. (#11101)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11101

I'd like to invert the dependency between Tensor and TensorOptions
(such that Tensor includes TensorOptions); to do this, I'd prefer
there to not be a Tensor constructor.  Eventually, all references
of Tensor will disappear from TensorOptions.h

Reviewed By: cpuhrsch

Differential Revision: D9585627

fbshipit-source-id: dd4a28b2c06b1e55f629762915f03c2b6c34d840
2018-08-31 09:55:01 -07:00
a585158c9e Some usage examples for TensorOptions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11081

Reviewed By: goldsborough

Differential Revision: D9579371

fbshipit-source-id: 329a07fc2e58f57384c8a840bcdebc2c6d4f7bb1
2018-08-31 09:40:30 -07:00
e2bdd35cf0 fixes to device.cc (#11122)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11122

These changes add fixes to device.cc that are appropriate for creating the intra-device copies for OpenCL.

Reviewed By: bwasti

Differential Revision: D9553292

fbshipit-source-id: e59f17916b5df30a504adee0718f9cecfe28f35a
2018-08-31 09:25:26 -07:00
f30fd7fb5c Get rid of the runtime type in TensorOptions (#11021)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11021

We can now store a boolean saying if we want a Variable or not,
and context can use VariableHooks to get a VariableType if we
request one.

Reviewed By: cpuhrsch

Differential Revision: D9562312

fbshipit-source-id: 84653cd789622764132252406a5ea1a83eee3360
2018-08-31 09:10:52 -07:00
1db5a7d8f0 Move variable getType lookup support to Context
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11017

Reviewed By: cpuhrsch

Differential Revision: D9562197

fbshipit-source-id: dd00c79592d6c59f2e21c9d62fea3a2c093b609b
2018-08-31 09:10:51 -07:00
9fac0a5093 Rename at::getType to at::getNonVariableType (#11096)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11096

To discourage willy-nilly use, and make it clearer that it
is not a Variable

Reviewed By: cpuhrsch

Differential Revision: D9583699

fbshipit-source-id: 4fbde0c01ae3deb2c7ef8c125a9028f089b203ae
2018-08-31 09:10:49 -07:00
0961c923c0 Unbreak the build

fbshipit-source-id: 861021dbe88f84d1a8bd80e04dd684527384629f
2018-08-31 08:13:12 -07:00
3073051a18 Revert D9554375: Support lr adaption for SparseAdam and RowWiseSparseAdam
Differential Revision:
D9554375

Original commit changeset: b88768f470ef

fbshipit-source-id: 2c103c616c8680684892c7d9085fd7bb8289d2f1
2018-08-31 07:54:31 -07:00
82aeebb3d9 Fix a bug in addmm fusion in the JIT (#11100)
Summary:
Fixes #10839.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11100

Differential Revision: D9585533

Pulled By: apaszke

fbshipit-source-id: 19e2710c8fc113f577faf14c080d8c89afbe23c4
2018-08-31 07:24:34 -07:00
0555768e0f Support lr adaption for SparseAdam and RowWiseSparseAdam (#10993)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10993

as title

Reviewed By: chocjy

Differential Revision: D9554375

fbshipit-source-id: b88768f470ef7d023dd481c6a97b91594892f422
2018-08-31 00:55:39 -07:00
f1bfe6750f Back out "[caffe2] Update blackbox predictor with new constructor" (#11105)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11105

Reverts: D9516972

See this discussion for context: https://fburl.com/w45hb1oc

Reviewed By: highker

Differential Revision: D9587931

fbshipit-source-id: 715247929d819dfa88e1d051021e51c5bf0c4835
2018-08-31 00:55:36 -07:00
9fae8fcdff framework for committed serialized tests (#10594)
Summary:
Generate serialized test inputs/outputs/backward graphs for tests inside `caffe2/python/operator_test` that call assertSerializedOperatorCheck(). Tests should be decorated with serialized_test.collect_tests.given_and_seeded to run both hypothesis tests that are actually random and a single fixed-seed hypothesis test.

To use:
1. Refactor your test to be a SerializedTestCase
1a. Decorate it with given_and_seeded
1b. Call testWithArgs in main
2. Run your test with -g to generate the output. Check it in.
3. Subsequent runs of the test without generating the output will check against the checked in test case.

Details:
Run your test with `python caffe2/python/operator_test/[your_test].py -g`
Outputs are in `caffe2/python/serialized_test/data`. The operator test outputs are in a further subdirectory, `operator_test`, to allow for other tests in the future (model zoo tests?)

Currently, we've only refactored weighted_sum_test to use this, but in the next diff, we'll refactor as many as possible. The directory structure may also change as usually there are multiple tests in a single file, so we may create more structure to account for that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10594

Reviewed By: ezyang

Differential Revision: D9370359

Pulled By: ajyu

fbshipit-source-id: 2ce77389cd8bcc0255d3bccd61569833e545ede8
2018-08-30 22:41:46 -07:00
00df09b65d Change specialization rules in GraphExecutors (#10977)
Summary:
**Review last commit only.** Stacked on top of #10949.

This commit fixes a number of issues connected to caching
differentiability status of graphs inside graph executors,
and changes the rules for optimization of differentiable subgraphs.
Previously every one of those was instantiated as a separate graph
executor, but now they are simply heavier-optimized graph regions,
and graph executors are only instantiated for their backward.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10977

Differential Revision: D9600626

Pulled By: apaszke

fbshipit-source-id: dad09a0f586e396afbd5406319c1cd54fbb8a3d3
2018-08-30 22:11:01 -07:00
a320e5cbd3 Move static_context outside of class (#11097)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11097

att

Reviewed By: ezyang

Differential Revision: D9549702

fbshipit-source-id: 058b942311b00be20a0b557ba97eb3451ea55e33
2018-08-30 22:10:58 -07:00
750ede7215 Rename getType to getVariableTypeFromBaseType / getVariableType (#11095)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11095

We used getType to mean a lot of things.

- getVariableTypeFromBaseType: given a base Type (non-Variable type)
  compute the Variable Type which corresponds to it.

- getVariableType: like at::getType, but return the Variable type
  rather than the plain type.

This rename makes it clearer at the use-site what things are what,
and will make a subsequent rename of at::getType easier.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9583630

fbshipit-source-id: 2667ec98e7607bc466920c7415a8c651fd56dfca
2018-08-30 20:11:25 -07:00
c836a04dc8 Delete a bunch of uses of getType in favor of TensorOptions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11087

Reviewed By: cpuhrsch

Differential Revision: D9581560

fbshipit-source-id: ebe3c4c0956da8a7215ada287bf6526dbcb2b07d
2018-08-30 20:11:24 -07:00
34a0604d51 Eliminate use of getType from DLConvertor (#11080)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11080

- Add a new TensorOptions(Device, ScalarType) constructor,
  which serves roughly the same role as getType used to.
  We shouldn't get too wild with these constructors, but
  since this particular one was widely used by getType,
  it seems worth adding.
- Change DLPack DeviceType conversion to at::DeviceType,
  rather than at::Backend.  While I'm at it, add a few more
  conversions that at::DeviceType understands.
- Add a new overload of from_blob which understands strides.

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578734

fbshipit-source-id: 28288ec053aae8765e23925ab91023398d632d6b
2018-08-30 20:11:23 -07:00
c283acce72 Rename getTypeRaw to getNonVariableTypeRaw (#11078)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11078

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeRaw getNonVariableTypeRaw
```

Reviewed By: gchanan, cpuhrsch

Differential Revision: D9578399

fbshipit-source-id: 00a86ae8fb00d14116762ce39d15858da9a1671e
2018-08-30 20:11:21 -07:00
66c4d7e060 Rename getTypeOpt to getNonVariableTypeOpt (#11077)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11077

getType now supports retrieving variable types, so make it clearer
when a getType function does NOT give you a variable type.

```
codemod -d . --extensions cc,cpp,cu,cuh,h getTypeOpt getNonVariableTypeOpt
```

Reviewed By: gchanan

Differential Revision: D9578398

fbshipit-source-id: 3ee502ac5c714849917f11ddc71de8eacfdaa9d3
2018-08-30 20:11:20 -07:00
f3c3127c67 Don't flatten output lists in the JIT IR (#10949)
Summary:
Operators like aten::chunk used to return a number of tensors, but
now return a list. To make it easier to do shape prop through
aten::chunk and fuse it, I've also introduced prim::ConstantChunk,
which behaves like the previous implementation (has a variable length
output list).

The downside of this PR is that the introduction of more lists to the IR causes the LSTM and MiLSTM graphs to be considered non-differentiable by the graph executor. I verified that they are still optimized correctly, and my next patch (that changes how the specializations/differentiation works) will restore those.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10949

Reviewed By: zdevito

Differential Revision: D9556823

Pulled By: apaszke

fbshipit-source-id: 33e63b17fc7247cac6cfc05eb7eb9bf069b499ee
2018-08-30 19:54:39 -07:00
c8c21fa2b4 Allow same flags when glog is used or not (#11034)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11034

Reviewed By: mingzhe09088

Differential Revision: D9582801

Pulled By: orionr

fbshipit-source-id: b41ca1bebf6cf62fff2a2b8caf4c94af3e43db00
2018-08-30 19:24:51 -07:00
26409a4300 Caffe2 flags needs to be used after the GlobalInit function is called
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11120

Reviewed By: llyfacebook

Differential Revision: D9598430

Pulled By: sf-wind

fbshipit-source-id: 468f0ed7880339c9c4467d1cef29f5bc9fc80a2a
2018-08-30 19:10:39 -07:00
a6cb41486d update documentation for observers
Summary:
Update to the latest observer usage syntax.
Add an example of HistogramObservers.

Reviewed By: jspark1105

Differential Revision: D6878439

fbshipit-source-id: c9521f2daecfc7f0c17de6a944dce58e568e3dbe
2018-08-30 18:11:48 -07:00
15314c7b8e GCC-7 doesn't like the original syntax. (#10665)
Summary:
Replace with "this->template f<T>()".

Fix #7881
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10665

Differential Revision: D9597187

Pulled By: ezyang

fbshipit-source-id: 8af4e7efd98edadabb97e2523a58bd21bc116d1a
2018-08-30 16:41:16 -07:00
684bd1b7bd size_ -> numel_ (#11112)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11112

att

Reviewed By: ezyang

Differential Revision: D9474018

fbshipit-source-id: d9267e52e2d50dac7524a456a44f2e28b6c0b693
2018-08-30 16:41:13 -07:00
7ddc6f84c4 NULL -> nullptr (#11047)
Summary:
How did we get so many uses of `NULL` again?

ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11047

Differential Revision: D9566799

Pulled By: goldsborough

fbshipit-source-id: 83469f352ac69aa65bdaf1a1a21f922d892e0db3
2018-08-30 16:25:42 -07:00
302e9cb815 Update onnx submodule to onnx/onnx@bae6333 (#10961)
Summary:
ONNX v1.3.0 release

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10961

Reviewed By: houseroad

Differential Revision: D9543998

Pulled By: bddppq

fbshipit-source-id: b7f0a0553d832d609d3b7613a608f7bf4a2582ef
2018-08-30 15:25:57 -07:00
56c737a9b7 Inject GetEmptyStringAlreadyInited once for static proto (#11045)
Summary:
I've been seeing a lot of warnings about multiple declarations of this. Hopefully this fixes it.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11045

Reviewed By: mingzhe09088

Differential Revision: D9582756

Pulled By: orionr

fbshipit-source-id: 6171485609a2f2f357d6e1c44e26b4ecfcdb4ce6
2018-08-30 14:59:54 -07:00
a136d29fd1 Use intrusive_ptr in Storage (#10907)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10907

Replace shared_ptr with intrusive_ptr in Storage.

Reviewed By: ezyang

Differential Revision: D9414388

fbshipit-source-id: d413549ffde24959166d2dff2042b99f0c5018af
2018-08-30 14:59:52 -07:00
f0142faab0 Expose arbitrary cpp autograd functions to Python (#11082)
Summary:
This is needed because the JIT declares some custom autograd functions.

colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11082

Differential Revision: D9580456

Pulled By: apaszke

fbshipit-source-id: 6bf00c1188a20b2ee6ecf60e5a0099f8263ad55a
2018-08-30 14:25:59 -07:00
93bd291e55 Change torch.jit.trace to no longer be a decorator (#11069)
Summary:
This was done because it is surprising for a decorator to run a function
rather than wrap it, and it did not simplify the syntax for tracing modules.
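
A before/after sketch of the call-site change, assuming the post-change `trace(fn, example_inputs)` form:

```python
import torch

def f(x):
    return x * 2 + 1

# Before: @torch.jit.trace(example_input) wrapped *and ran* f at decoration time.
# After: trace is an ordinary function call that returns the traced callable.
traced = torch.jit.trace(f, torch.randn(3))
print(traced(torch.randn(3)))
```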
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11069

Reviewed By: jamesr66a

Differential Revision: D9583192

Pulled By: zdevito

fbshipit-source-id: b914b7ab4c73c255086465a6576eef3a22de1e13
2018-08-30 13:56:05 -07:00
ebe9d204fa Add test cases to intrusive_ptr (#11026)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11026

ezyang fixed a bug with moving or copying an intrusive_ptr into itself.
This diff adds test cases for it.

Reviewed By: ezyang

Differential Revision: D9563464

fbshipit-source-id: 3a3b3f681124730d2500b276c0135c3bba7875ae
2018-08-30 13:25:33 -07:00
e85f3fccb3 Fix relying on UB in test_data_parallel_nested_output (#11092)
Summary:
We shouldn't rely on plain `dict` ordering. Example failure: https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-linux-xenial-cuda8-cudnn6-py3-test1/8417/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11092

Reviewed By: ezyang

Differential Revision: D9583274

Pulled By: SsnL

fbshipit-source-id: ba80b96648c98c24c2ec5fa6fd9aa566c095cce7
2018-08-30 13:10:25 -07:00
9d4360c060 Creates stream pool (#9938)
Summary:
This PR creates a stream pool per issue #9646. When a new stream is requested, the device it's requested on lazily creates two pools, one low priority and one high priority, of 32 streams each. Streams are returned from these pools round-robin. That is, stream 0 is returned, then stream 1... then stream 31, then stream 0... (see the sketch after the change notes below). This PR also takes the opportunity to clean up the stream API, reducing its complexity and verbosity.

Change notes:

- There are now 3 sets of streams per device, the default stream, the low priority streams, and the high priority streams. These streams live in lazily initialized pools and are destroyed on shutdown.
- All stream refcounting has been removed (the pools pattern replaces it).
- Setting a stream now sets it on its device. Streams are associated with a device and the previous
requirement to specify that device was unnecessary.
- There is no exposure for setting the flags on a stream. This may also seem like a regression but the flag was always set to cudaStreamNonBlocking.
- Streams are now low or high priority whereas previously the priority could be set with an integer. In practice, however, the range for priorities is -1 to 0 on the latest hardware. -1 is high priority, 0 is low priority (aka default priority). Low vs. high actually clarifies this behavior if people were trying finer separations. (E.g., if someone tried streams with priorities 0, 1, and 2, they would actually all have priority 0, historically, and the intended behavior would not be respected.)
- Unused THCStream and THCState stream-related functions were removed.
- A new test of pooling behavior was added in stream_test.
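
A toy Python sketch of the round-robin pool pattern described above (the real implementation is in C++; only the pool size of 32 and the round-robin order come from this PR):

```python
class StreamPool:
    """Round-robin pool: request 0 -> stream 0, ..., 31 -> stream 31, 32 -> stream 0."""
    def __init__(self, size=32):
        self.streams = [object() for _ in range(size)]  # stand-ins for CUDA streams
        self.next = 0

    def get(self):
        s = self.streams[self.next % len(self.streams)]
        self.next += 1
        return s

pool = StreamPool()
first = pool.get()
for _ in range(31):
    pool.get()
assert pool.get() is first  # the 33rd request wraps back to the first stream
```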

fyi: colesbury, apaszke, goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9938

Reviewed By: SsnL

Differential Revision: D9569036

Pulled By: ezyang

fbshipit-source-id: 12ed673fe373170d0cf4d65cb570de016c53ee7d
2018-08-30 12:40:23 -07:00
23b0c90e71 caffe2: fix gcc8 warnings
Summary:
The warnings are erroneous as far as I can see,
so tweak things to avoid them. The (unsigned int) cast is
to avoid passing -1 to a size_t type.  This was triggered
in gcc8's LTO build only, giving:

  caffe2/aten/src/TH/generic/THTensor.cpp: In function ‘THFloatTensor_squeeze1d’:
  lto1: error: ‘__builtin_memset’ specified size 18446744073709551608
  exceeds maximum object size 9223372036854775807 [-Werror=stringop-overflow=]
  In function ‘newImpl’,
    inlined from ‘operator new’ at common/memory/OperatorOverride.cpp:86:23,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/ext/new_allocator.h:111:0,
    inlined from ‘allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/alloc_traits.h:436:0,
    inlined from ‘_M_allocate’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:172:0,
    inlined from ‘_M_default_append’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/vector.tcc:571:0,
    inlined from ‘resize’ at third-party-buck/platform007/build/libgcc/include/c++/7.3.0/bits/stl_vector.h:671:0,
    inlined from ‘THTensor_resizeDim’ at caffe2/aten/src/TH/THTensor.hpp:123:0,
    inlined from ‘THFloatTensor_squeeze1d.part.198’ at caffe2/aten/src/TH/generic/THTensor.cpp:429:0,
    inlined from ‘THFloatTensor_squeeze1d’:
  common/memory/OperatorOverride.cpp:86:23: error:
  argument 1 value ‘18446744073709551608’ exceeds maximum object size 9223372036854775807 [-Werror=alloc-size-larger-than=]
   void* ptr = malloc(size);

Reviewed By: soumith

Differential Revision: D9568621

fbshipit-source-id: 4569a4be897d669caa3f283f4b84ec829e8d77ad
2018-08-30 11:55:29 -07:00
611a608517 Add ATen pdist CPU kernel (#10782)
Summary:
Also add single grad whitelist to the jit test
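
A minimal usage sketch through the Python binding (the `torch.nn.functional.pdist` exposure is assumed; the summary doesn't name it):

```python
import torch
import torch.nn.functional as F

x = torch.randn(4, 3)
# Condensed pairwise distances: N*(N-1)/2 entries for N rows.
d = F.pdist(x, p=2)
print(d.shape)  # torch.Size([6])
```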
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10782

Reviewed By: ezyang

Differential Revision: D9583378

Pulled By: erikbrinkman

fbshipit-source-id: 069e5ae68ea7f3524dec39cf1d5fe9cd53941944
2018-08-30 11:55:27 -07:00
029082e87c Add entry for torch/lib/pythonX.Y in .gitignore (#11083)
Summary:
I've had `torch/lib/python3.6` show up as part of the build for some time now. It's not ignored, which means I need to be extra careful about checking in files, or I end up with a thousand of them in my index.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11083

Differential Revision: D9580453

Pulled By: apaszke

fbshipit-source-id: 369e4fe87962696532d111b24f2a4a99b9572bf2
2018-08-30 11:40:25 -07:00
40227671e9 Add strides to caffe2::Tensor (#10826)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10826

Add strides, and make sure the strides are consistent with sizes and is_contiguous, for all the Caffe2 functions.

is_contiguous means strides_[dim-1] = 1 and strides_[i] = strides_[i+1] * max(size_[i+1], 1);
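
A small Python sketch of that invariant, computing contiguous strides from sizes:

```python
def contiguous_strides(sizes):
    # strides[dim-1] = 1; strides[i] = strides[i+1] * max(sizes[i+1], 1)
    strides = [0] * len(sizes)
    for i in range(len(sizes) - 1, -1, -1):
        strides[i] = 1 if i == len(sizes) - 1 else strides[i + 1] * max(sizes[i + 1], 1)
    return strides

print(contiguous_strides([2, 3, 4]))  # [12, 4, 1]
```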

Reviewed By: ezyang

Differential Revision: D9354480

fbshipit-source-id: 3643871b70f1111b7ffdd9fdd9fe9bec82635963
2018-08-30 11:25:58 -07:00
535633bddc Export MPI functions (#11037)
Summary:
Potential fix for https://github.com/caffe2/caffe2/issues/2551#issuecomment-417124872

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11037

Reviewed By: mingzhe09088

Differential Revision: D9580937

Pulled By: orionr

fbshipit-source-id: 5e1fbf718728271a5b5af526d8e67cc5b48f0575
2018-08-30 10:42:02 -07:00
e7195431e0 Add benchmarking functionality to the benchmark app (#10976)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10976

The app can run in Xcode with the benchmark metrics collected.
It can also run when built with buck.

Reviewed By: llyfacebook

Differential Revision: D9546755

fbshipit-source-id: 60ad0112946f8cf57138417f6838a58ed6d2c90f
2018-08-30 09:54:55 -07:00
a8af7fe46a Support import of nn.RNNCellBase in __all__
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10992

Differential Revision: D9572005

Pulled By: soumith

fbshipit-source-id: 26b546830b6a25a4f7ba6f825cd888d678233a97
2018-08-30 08:25:21 -07:00
dbc0004f99 Remove use_count() == 1 in Tensor::Extend (#11046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11046

As suggested by jerryzh168, a temporary fix for the new constraint added in D9350686 is to remove this assert. Long term, jerryzh168 is going to work out a better way of handling this.

Reviewed By: jerryzh168

Differential Revision: D9566323

fbshipit-source-id: e4630c7cbe0cc68a084974ea7048654811fae01f
2018-08-29 23:55:28 -07:00
23af7deea7 Add has_lapack flag (#11024)
Summary:
Currently our `skipIfLapack` uses a try-catch block and regex-matches the error message. It is highly unreliable. This PR adds `hasLAPACK` and `hasMAGMA` on the ATen context, and exposes the flags to Python.

Also fixes a refcounting bug with `PyModule_AddObject`. The method steals a reference, but in some places we didn't `Py_INCREF` before calling it with `Py_True` or `Py_False`.
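
A sketch of the intended use in test code (the `torch._C.has_lapack` attribute name is assumed from the current bindings):

```python
import unittest
import torch

# Skip deterministically on the exposed flag instead of regex-matching
# an error message at runtime.
@unittest.skipIf(not torch._C.has_lapack, "PyTorch compiled without LAPACK")
def test_needs_lapack():
    torch.randn(3, 3).inverse()
```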
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11024

Differential Revision: D9564898

Pulled By: SsnL

fbshipit-source-id: f46862ec3558d7e0058ef48991cd9c720cb317e2
2018-08-29 22:41:16 -07:00
ad1670cf54 Kill the dummy TaskOutput when task.get_step() (#11048)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11048

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9566744

fbshipit-source-id: 18292dd64a6d48192c34034200a7c9811d2172af
2018-08-29 20:11:29 -07:00
16b8e0a787 at::StorageImpl: Rename size_ to numel_ and elementSize() to itemsize()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11011

Reviewed By: ezyang

Differential Revision: D9561898

Pulled By: cpuhrsch

fbshipit-source-id: 0cf5cdc3e7acd397f7e2d66097856aaad0581147
2018-08-29 20:11:27 -07:00
394bdcd49a Fix the build of aten tests when FULL_CAFFE2=1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11019

Reviewed By: orionr

Differential Revision: D9562691

Pulled By: houseroad

fbshipit-source-id: 95a8dee580e5f4dc9af3a2e1f68ec6c62a0e4e04
2018-08-29 18:09:54 -07:00
e550eab3e2 Remove MetaNetDef test case in Predictor (#11052)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11052

Delete the test case for constructing a Predictor from a MetaNetDef, since that constructor
has actually been deprecated. The broken PR is for constructing a predictor from a DB instance.

Reviewed By: highker

Differential Revision: D9566935

fbshipit-source-id: 5511883953a2d3f6eb0a4f1c5518a1bc4b3ffbdc
2018-08-29 17:55:21 -07:00
91ecbf8b1d Remove TensorBase (#11036)
Summary:
Not subclassed except by Tensor. Also required to align further with
caffe2.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11036

Reviewed By: ezyang

Differential Revision: D9565640

Pulled By: cpuhrsch

fbshipit-source-id: ff7203a2c95d3f3956282b4f2d8dda6c2b93f4a6
2018-08-29 17:27:19 -07:00
ae635b16f7 Record tensor factory functions in trace (#10935)
Summary:
Things like torch.zeros now appear in traces rather than as constants.

To continue to support our current level of ONNX export, we run
constant prop to turn these back into constants where possible before
export.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10935

Differential Revision: D9527427

Pulled By: zdevito

fbshipit-source-id: 552a8bcc01b911251dab7d7026faafdd7a3c758a
2018-08-29 17:10:24 -07:00
c4e1adf29d Remove THHalf type
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11010

Reviewed By: ezyang

Differential Revision: D9561325

Pulled By: li-roy

fbshipit-source-id: 053cf2925ec1fc458db31e92bd31ffd23389f3e8
2018-08-29 16:44:45 -07:00
2cc98d8df7 Adds dim argument to torch.unique (#10423)
Summary:
Initial version of `unique` supporting a `dim` argument.

As discussed in [this issue](https://github.com/pytorch/pytorch/issues/9997), I added the `dim` argument to `torch.unique` with the same behavior as [numpy](https://docs.scipy.org/doc/numpy-1.14.0/reference/generated/numpy.unique.html).

Since the implementation is based on `std/thrust::unique`, the `tensor` always needs to be sorted. The `sorted` argument in `torch.unique` therefore has no effect, just as in the CUDA version of the plain `torch.unique`.

To check the performance and equal behavior between `torch.unique` and `np.unique`, I've used [this gist](https://gist.github.com/ptrblck/ac0dc862f4e1766f0e1036c252cdb105).

Currently we achieve the following timings for an input of `x = torch.randint(2, (1000, 1000))`:
(The values are calculated by taking the average of the times for both dimensions.)

| Device | PyTorch (return_inverse=False) | Numpy (return_inverse=False) | PyTorch (return_inverse=True) | Numpy (return_inverse=True) |
| --- | --- | --- | --- | --- |
| CPU | ~0.007331s | ~0.022452s | ~0.011139s | ~0.044800s |
| GPU | ~0.006154s | - | ~0.105373s | - |

Many thanks to colesbury for the awesome mentoring and the valuable advice on the general implementation and performance issues!
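
A quick usage sketch of the new argument:

```python
import torch

x = torch.tensor([[1, 2], [1, 2], [3, 4]])

# Treat each row as one element, like numpy.unique(x, axis=0).
out, inverse = torch.unique(x, dim=0, return_inverse=True)
print(out)      # tensor([[1, 2], [3, 4]])
print(inverse)  # tensor([0, 0, 1])
```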
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10423

Differential Revision: D9517289

Pulled By: soumith

fbshipit-source-id: a4754f805223589c2847c98b8e4e39d8c3ddb7b5
2018-08-29 16:26:09 -07:00
98d85b1790 Debugging help + test
Summary: When conversion fails, dump more information to help fix up the netdef

Reviewed By: hyuen, yinghai

Differential Revision: D9558667

fbshipit-source-id: 8917cc61c9be6285697e4f8395a9dbc7135f618e
2018-08-29 16:26:07 -07:00
ef7fc2a3e1 Remove at::StorageImpl::finalizer_ (#11022)
Summary:
Unused member variable
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11022

Reviewed By: ezyang

Differential Revision: D9562520

Pulled By: cpuhrsch

fbshipit-source-id: af190b3ba06d33d65fa0fabffb34a0df769f38d0
2018-08-29 16:09:47 -07:00
6b87198245 Devirtualize StorageImpl deconstructor (#11018)
Summary:
Further align at::StorageImpl with caffe2::StorageImpl
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11018

Reviewed By: ezyang

Differential Revision: D9562256

Pulled By: cpuhrsch

fbshipit-source-id: d929317f6226a1e2550b78034b723afbae343aaa
2018-08-29 15:39:54 -07:00
d9b74f6540 Make it possible to disable JIT using env variables (#10867)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10867

Differential Revision: D9556882

Pulled By: apaszke

fbshipit-source-id: 04c0ca875d15d37dd9ac05ac7b515cd899ddb7e4
2018-08-29 15:11:05 -07:00
c755616e00 Enable Detectron model inference for CPU and MKL-DNN paths (#10157)
Summary:
1. Support the ops needed for inference of Faster-RCNN/Mask-RCNN in Detectron, mostly as direct fallbacks.
2. Use the CPU device to hold 0-dim tensors and integer tensors in both the fallback op and the blob feeder, as needed by Detectron models.
3. Ignore 0-dim tensors in the MKL-DNN concat operator.
4. Generate a dynamic library of the Detectron module for the CPU device.

This PR obsoletes #9164.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10157

Differential Revision: D9276837

Pulled By: yinghai

fbshipit-source-id: dc364932ae4a2e7fcefdee70b5fce3c0cee91b6f
2018-08-29 15:11:01 -07:00
89834dfe64 Add GPU version of HardSigmoid Op to Caffe2 (#10955)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10955

Add GPU version of HardSigmoid Op to Caffe2. Updated test file to
include GPU tests.

Reviewed By: enosair

Differential Revision: D9499353

fbshipit-source-id: fcb51902063d0c3e4b10354533a8a42cf827c545
2018-08-29 14:55:29 -07:00
22e3b2c9c3 Revert D9413150: [New Checkpoint] Kill the dummy TaskOutput when task.get_step()
Differential Revision:
D9413150

Original commit changeset: 51aaf3201e26

fbshipit-source-id: ac7c4c0960db03f344fe3eb2ad7f0e034db2371a
2018-08-29 14:39:49 -07:00
6a8bc3804a Add flush to logging messages higher than INFO. (#10983)
Summary:
This probably fixes the logging test error that orionr is encountering; I haven't tested locally, but wanted to send out a PR to kick off CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10983

Reviewed By: ezyang

Differential Revision: D9552607

Pulled By: Yangqing

fbshipit-source-id: 9ac019031ffd9c03972144df04a836e5dcdafe02
2018-08-29 14:39:48 -07:00
0b1de74732 Documentation improvement in caffe2/core/tensor.h (#11006)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11006

Reviewed By: smessmer

Differential Revision: D9558383

Pulled By: ezyang

fbshipit-source-id: 7d36fb69a6e8a7d064da2c8796dc263a9fd4e094
2018-08-29 14:25:38 -07:00
e9eed8edb4 Add doc for Tensor.digamma_? (#11008)
Summary:
Follow-up for #10967.

zou3519 vishwakftw
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11008

Differential Revision: D9559889

Pulled By: SsnL

fbshipit-source-id: a05d8fbad92a54bcdb93de6e62a7f94180da1d99
2018-08-29 14:11:16 -07:00
f687ff5a59 Delete unnecessary includes from TensorImpl.h (#11005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11005

Reviewed By: smessmer

Differential Revision: D9558300

Pulled By: ezyang

fbshipit-source-id: ebebb3c6d3a1a2f7cc3da9fe9d3c56310ead46e1
2018-08-29 14:11:14 -07:00
b644d5e74a Delete context and get_context from Type.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/11001

Reviewed By: cpuhrsch

Differential Revision: D9557315

fbshipit-source-id: b9862b8dda49194298bb1a4fbc214d466f3c8350
2018-08-29 13:55:45 -07:00
cd9416317d Minor copy-edit on setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10933

Reviewed By: cpuhrsch

Differential Revision: D9526650

fbshipit-source-id: 8ad1c989bee7009b3f95a2641189f55cf6c1979f
2018-08-29 13:41:04 -07:00
c99a143eea Update blackbox predictor with new constructor (#10920)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10920

Update the black box predictor and the related code to use the
constructor with PredictorConfig.

Reviewed By: highker

Differential Revision: D9516972

fbshipit-source-id: fbd7ece934d527e17dc6bcc740b4e67e778afa1d
2018-08-29 13:31:45 -07:00
56539f5fe1 PT1 Distributed Release MileStone No.1 - Completed Distributed Package and CI tests (#10871)
Summary:
The PR includes:
(1) torch.distributed.c10d, which now includes the complete backward compatible frontend API for `torch.distributed`
(2) `env://` init method functionality (see the sketch below)
(3) Minor change to `test_distributed.py`, which is now a test for `torch.distributed.c10d`.
(4) The old `test_distributed.py` is now moved to `test_distributed_thd`
(5) Miscellaneous bug fixes.
(6) The DDP CPU test is removed since c10d doesn't have this support yet, but it will be a very easy test to add after moving DDP CPU's dependency to torch.distributed.c10d.
(7) CI config to test MPI, NCCL, and Gloo backend of c10d

**Now all the distributed tests, including c10d DDP, can pass with the c10d frontend API**
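
For reference, a minimal sketch of the `env://` init method (the placeholder values are assumptions; in practice a launcher sets these variables):

```
import os
import torch.distributed as dist

# env:// reads the rendezvous info from the environment.
os.environ.setdefault('MASTER_ADDR', '127.0.0.1')
os.environ.setdefault('MASTER_PORT', '29500')
os.environ.setdefault('RANK', '0')
os.environ.setdefault('WORLD_SIZE', '1')
dist.init_process_group(backend='gloo', init_method='env://')
```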

TODO: (in a separate PR)
MPI subgroup support; once this is added, the CI group test will be enabled.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10871

Differential Revision: D9554514

Pulled By: teng-li

fbshipit-source-id: fb686ad42258526c8b4372148e82969fac4f42dd
2018-08-29 12:55:57 -07:00
fa7c81c640 nomnigraph - nit - code style update (#10987)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10987

some code style update to make it consistent with fb cpp style

Reviewed By: yinghai

Differential Revision: D9550130

fbshipit-source-id: 6aef9878676c08e7d384383c95e7ba8c5c9a1bce
2018-08-29 12:55:55 -07:00
ec519e8a4a Reduce number of elements within test_abs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10997

Differential Revision: D9556861

Pulled By: cpuhrsch

fbshipit-source-id: 986ef275e94fcffcc04a5c1103b8b7bfb4ae3ba5
2018-08-29 12:55:54 -07:00
dbce1c840f exposing net_transformer_fun before add grad (#11003)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11003

We need an interface to rewrite the graph after the net is built and after gradient ops are added.

Reviewed By: aazzolini, harouwu

Differential Revision: D9557827

fbshipit-source-id: 2e082f0321c0776e488a29e18047d950948e7c37
2018-08-29 12:55:52 -07:00
bed9d41abd Generate Type::registerCPU as we do register_cuda_types. (#10947)
Summary:
The goal here is to separate out the base Type into core; as things were done previously, we needed all derived Types to be defined when we compiled the base Type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10947

Reviewed By: gchanan

Differential Revision: D9540025

Pulled By: ezyang

fbshipit-source-id: 49f0b5acb3c378348ef3a55780abb73e4ae27edd
2018-08-29 12:39:47 -07:00
4e446b85fb Make profiler.build_table() O(n) rather than O(n^2) (#10969)
Summary:
Fixes #10851

Speeds up profiling results dramatically.

For the following script:
```
import torch
import time

ITER = 2000

x = torch.randn(1, 1, requires_grad=True)

with torch.autograd.profiler.profile() as prof:
    y = x
    for i in range(ITER):
        y = 3 * y - 2 * y
    y.backward()

start = time.time()
print("Done running. Preparing prof")
x = str(prof)
print("Done preparing prof results")
end = time.time()
print("Elapsed: {}".format(end - start))
```

I get 7s before / 0.13s after these changes.

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10969

Differential Revision: D9556129

Pulled By: zou3519

fbshipit-source-id: 26b421686f8a42cdaace6382567d403e6385dc12
2018-08-29 12:25:51 -07:00
396dec0e37 s/spaerse/sparse (#10968)
Summary:
cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10968

Differential Revision: D9546746

Pulled By: zou3519

fbshipit-source-id: a6a4bb8bb04eccf89c3d90a90259070beb484500
2018-08-29 12:13:04 -07:00
525548fb64 Move SparseTensorRef to core, change some includes to core.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10964

Differential Revision: D9545021

Pulled By: gchanan

fbshipit-source-id: 8ba7e5e3a7bdf24e5aeb4bbc91957c1a6f14d7f0
2018-08-29 11:55:29 -07:00
e0dbb91060 Windows raw string fix (#10998)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

mingzhe09088's fix of the docstrings for Windows builds. Unfortunately, some versions of Windows seem to try to parse the `#` inside the string as a pre-processor declaration. We might need to change this to something else later, but we want to get this landed first.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10998

Reviewed By: mingzhe09088

Differential Revision: D9557480

Pulled By: orionr

fbshipit-source-id: c6a6237c27b7cf35c81133fd9faefead675a9f59
2018-08-29 11:40:08 -07:00
206d52d0e3 Disable smart_tensor_printer_test without glog (#10999)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

This test fails once we start building with `-DUSE_GLOG=OFF` since the non-glog logging case doesn't support flushing or streaming to the right location. For now, we just disable this test in that case.

cc Yangqing mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10999

Reviewed By: mingzhe09088

Differential Revision: D9557488

Pulled By: orionr

fbshipit-source-id: 8b306f210411dfc8ccc404bdccf77ddcd36a4830
2018-08-29 11:10:23 -07:00
562fc7631f Add test cases for ONNX unsqueeze (#10924)
Summary:
PyTorch exporting tests and end-to-end cases.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10924

Reviewed By: Ac2zoom

Differential Revision: D9548210

Pulled By: houseroad

fbshipit-source-id: 2381d1ad92a4e07f97060eb65c9fd09f60ad3de6
2018-08-29 11:10:21 -07:00
1b0d5e60ab Get rid of some unnecessary includes of Context. (#10951)
Summary:
This is part of splitting Context from what needs to go in ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10951

Differential Revision: D9540369

Pulled By: gchanan

fbshipit-source-id: 73b0e8c4493785fbab368a989f46137c51f6ea0b
2018-08-29 11:10:20 -07:00
a9469c9c8a Fill eigenvector with zeros if not required (#10645)
Summary:
Fix #10345, which only happens in CUDA case.

* Instead of returning some random buffer, we fill it with zeros (see the sketch below).

* Update the torch.symeig doc.
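
A minimal sketch of the fixed behavior (assuming a CUDA device is available, since the bug only affected the CUDA path):

```
import torch

a = torch.randn(3, 3, device='cuda')
a = a + a.t()              # symmetrize the input
e, v = torch.symeig(a)     # eigenvectors=False by default
# v is now deterministically zero-filled instead of an uninitialized buffer
```
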
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10645

Reviewed By: soumith

Differential Revision: D9395762

Pulled By: ailzhang

fbshipit-source-id: 0f3ed9bb6a919a9c1a4b8eb45188f65a68bfa9ba
2018-08-29 10:55:22 -07:00
b41988c71e Cleanup BUILD_DOCS cmake section (#11000)
Summary:
Breaking out of https://github.com/pytorch/pytorch/pull/8338

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/11000

Differential Revision: D9557474

Pulled By: orionr

fbshipit-source-id: 7d84914b67ff37bdb7738f9b7846dfeb5b975c00
2018-08-29 10:09:52 -07:00
7169906249 torch.digamma (#10967)
Summary:
Fixes #10307

cc SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10967

Differential Revision: D9546748

Pulled By: zou3519

fbshipit-source-id: 764e27b1cc8dd487270b3ffa653b806c86f717dd
2018-08-29 09:43:19 -07:00
a5d7abedae Enable fusing aten::expand on GT, LT, EQ (#10845)
Summary:
GT, LT, and EQ all support numpy broadcasting, so we just enable the fusion.
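
For reference, a small sketch of the broadcasting these comparison ops already support:

```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
mask = torch.gt(a, b)   # broadcasts to shape (3, 4), same as a > b
```
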
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10845

Reviewed By: bddppq

Differential Revision: D9494089

Pulled By: houseroad

fbshipit-source-id: 7c65ca06c54dbd476ac7d07b47a413faaed3dd5e
2018-08-28 23:56:50 -07:00
db0abe1890 Fix bugs in handling of negative slice + gather indices (#10973)
Summary:
This fixes multiple bugs in the handling of negative indices in both slicing and gather operations. These were uncovered by Elias Ellison's diff D9493614, which made it so that we actually emit negative indices when we see them in PyTorch code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10973

Reviewed By: jhcross

Differential Revision: D9546183

Pulled By: jamesr66a

fbshipit-source-id: 6cb0e84e8ad399e47e24a96c44025f644c17b375
2018-08-28 23:40:40 -07:00
6ca28984c7 Kill the dummy TaskOutput when task.get_step() (#10739)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10739

I wanted to assert that the blobs in the workspace of the new session after loading a checkpoint are exactly the same as the blobs in the workspace of the old session before saving to a checkpoint.

But I found that when calling `task.get_step()`, a dummy task output blob, `task:output/ConstIntFill:0`, is added. A dummy net `task:output` is also added along with it. See https://fburl.com/937lf2yk

This makes it hard to assert "Equal", forcing me to assert "LessThan" or "GreaterThan".

Adding a dummy TaskOutput when the user specifies no TaskOutput is a hack.
The reason for this is that a ZMQ socket can't send an empty blob list.
As a result, if the Task on the Worker had no output,
the master would never stop waiting and would hang forever. See https://fburl.com/rd7fhy6p and imagine `socket.recv(net, 0)`.

TaskOutput is at the user layer. The hack shouldn't be exposed to the user layer, polluting user workspaces.

Instead, we should move the creation of the dummy blob to some deeper layer,
and remove the dummy blob from the workspace afterwards to avoid polluting user workspaces.
After this change, the workaround becomes totally transparent, with no side effects for users.

Reviewed By: mraway

Differential Revision: D9413150

fbshipit-source-id: 51aaf3201e26570b4fcf5738e9b9aa17c58777ac
2018-08-28 20:41:46 -07:00
beeec47041 Sanity checks for tracing (#10841)
Summary:
TODO: integrate into torch.onnx.export -- separate PR

*Problem:* We have a facility to trace PyTorch operations in Python code, but there are several failure modes where the trace is not representative of the actual underlying computation:

* The tracer encountered dynamic control flow
* Some computation escaped the tracer, and appeared as a Constant tensor node in the graph
* Some stateful function was traced, e.g. someone did an optimization in Python by memoizing function outputs

*Objective*: In an ideal world, this whole process would be automated and the user could trust that the system will magically capture the intended semantics from the program. Realistically speaking, we will likely have to settle for a human-in-the-loop error reporting system, allowing the user to identify problems and modify the source code to allow for tracing.

*Stage 1* (this PR): Output-level checking & graph diff. torch.jit.trace gains a kwarg 'check_inputs', which is a list of tuples of input arguments. We will iterate through the list and trace the function again for each set of check inputs. We'll also interpret the original trace with these inputs and compare output values and graphs, printing a diff of the graph if there is a difference.

Examples:

```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 5),)])
def foo(x):
    y = torch.arange(0, x.shape[0]).float()
    return x + y.unsqueeze(1)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                              ^
		+   %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                                +++              ^
		    %2 : int = prim::Constant[value=0]()
		    %3 : Dynamic = aten::_cast_Float(%1, %2)
		    %4 : int = prim::Constant[value=1]()
		    %5 : Dynamic = aten::unsqueeze(%3, %4)
		    %6 : int = prim::Constant[value=1]()
		    %7 : Dynamic = aten::add(%0, %5, %6)
		    return (%7);
		  }
	Node diff:
		- %1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
		?                                                            ^
		+ %1 : Dynamic = prim::Constant[value= 0  1  2  3 [ CPULongType{4} ]]()
		?                                              +++              ^
	Trace source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Check source location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		dank.py(3): <module>
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value= 0  1  2 [ CPULongType{3} ]]()
	Source Location:
		dank.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(shapes (3,), (4,) mismatch)
		 x: array([0, 1, 2])
		 y: array([0, 1, 2, 3])

```
==

```
@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    y = x.data
    return x + y
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Tensor-valued Constant nodes differed in value across invocations. This often indicates that the tracer has encountered untraceable code.
	Node:
		%1 : Dynamic = prim::Constant[value=<Tensor>]()
	Source Location:
		dank.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		dank.py(3): <module>
	Comparison exception:
		Not equal to tolerance rtol=1e-07, atol=0

		(mismatch 100.0%)
		 x: array([0.397137, 0.956105, 0.169478, 0.560292, 0.392568, 0.108441,
		       0.97645 , 0.34412 , 0.951246, 0.793061, 0.557595, 0.770245],
		      dtype=float32)
		 y: array([0.243178, 0.315964, 0.972041, 0.0215  , 0.927751, 0.457512,
		       0.951092, 0.97883 , 0.048688, 0.118066, 0.779345, 0.271272],
		      dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(4, 4),)])
def foo(x):
    for _ in range(x.size(0)):
        x = torch.neg(x)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		    %1 : Dynamic = aten::neg(%0)
		    %2 : Dynamic = aten::neg(%1)
		    %3 : Dynamic = aten::neg(%2)
		+   %4 : Dynamic = aten::neg(%3)
		-   return (%3);
		?            ^
		+   return (%4);
		?            ^
		  }
```

==

```
import torch

def foo(x):
    if not hasattr(foo, 'cache'):
        foo.cache = torch.neg(x)
    return x + foo.cache

traced = torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])(foo)
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
ERROR: Graphs differed across invocations!
	Graph diff:
		  graph(%0 : Dynamic) {
		-   %1 : Dynamic = aten::neg(%0)
		+   %1 : Dynamic = prim::Constant[value=<Tensor>]()
		    %2 : int = prim::Constant[value=1]()
		    %3 : Dynamic = aten::add(%0, %1, %2)
		    return (%3);
		  }
	Node diff:
		- %1 : Dynamic = aten::neg(%0)
		+ %1 : Dynamic = prim::Constant[value=<Tensor>]()
	Trace source location:
		test.py(5): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(402): wrapper
		test.py(8): <module>
	Check source location:
		test.py(6): foo
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(281): check_trace
		/Users/jamesreed/onnx-fairseq/pytorch/torch/jit/__init__.py(408): wrapper
		test.py(8): <module>
```

The following two examples show instances where program semantics are lost in the Python -> trace transformation, and repeated invocation does not give us useful debug information. Further design in underway for catching these scenarios.

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(3, 4),)])
def foo(x):
    for i in range(3):
        x[i, :] = torch.zeros(4)
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.830221, 0.915481, 0.940281, 0.555241], dtype=float32)
 y: array([0., 0., 0., 0.], dtype=float32)
```

==

```
import torch

@torch.jit.trace(torch.rand(3, 4), check_inputs=[(torch.rand(5, 6),)])
def foo(x):
    x.view(-1).add_(-x.view(-1))
    return x
```

```
torch.jit.TracingCheckError: Tracing failed sanity checks!
ERROR: Traced function outputs do not match the Python function outputs.
Exception:
Not equal to tolerance rtol=1e-07, atol=0

(mismatch 100.0%)
 x: array([0.734441, 0.445327, 0.640592, 0.30076 , 0.891674, 0.124771],
      dtype=float32)
 y: array([0., 0., 0., 0., 0., 0.], dtype=float32)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10841

Differential Revision: D9499945

Pulled By: jamesr66a

fbshipit-source-id: 1f842a32d0b0645259cc43b29700b86d99c59a45
2018-08-28 20:25:26 -07:00
fe15aedacc Store schema in serialized modules and check arguments in function call (#10872)
Summary:
This PR adds argument checking for script method invocation from C++. For this I had to:
1. The schema of a method is currently not serialized in script modules, so we now store the function schema in the `doc_string` field of the ONNX proto. Upon loading of a serialized script module, we parse the schema into the structured C++ form and assign it to the loaded method,
2. Inside `Method::operator()`, we now verify the number and types of arguments.

CC zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10872

Differential Revision: D9521219

Pulled By: goldsborough

fbshipit-source-id: 5cb3d710af6f500e7579dad176652c9b11a0487d
2018-08-28 20:11:39 -07:00
ba71547e93 Add clip op to IR
Summary: Self-explanatory.

Reviewed By: highker

Differential Revision: D9551065

fbshipit-source-id: 14b3807af5337654c360a23816cffd7dd346bad5
2018-08-28 19:25:02 -07:00
90eb0b6031 Cleanup accidental logging
Summary: cleanup

Reviewed By: duc0

Differential Revision: D9549449

fbshipit-source-id: 9154b36a39936566fc2711a6e7bd33049681d1c8
2018-08-28 18:55:29 -07:00
72a84127b1 Add Workspace methods ws.feed_blob(name, arr) ws.remove_blob(name) (#10929)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10929

Workspace class methods were missing on the Python side.

These methods make it possible to write the new checkpoint framework with more control of the workspace and a cleaner implementation.

Added

- ws.feed_blob(name, arr)

- ws.remove_blob(name)

Reviewed By: mraway

Differential Revision: D9486867

fbshipit-source-id: ea02d2e3a39d716a5a3da0482f57d4ac4c893763
2018-08-28 17:54:34 -07:00
8e5b8490bf Add relevant code for adding caffe2 pybind extensions registry to rocm (#10975)
Summary:
cfa5dbadfc
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10975

Differential Revision: D9546838

Pulled By: bddppq

fbshipit-source-id: 3bd6dc0a4eee582bb92fc33ed27fc40eb3ab1200
2018-08-28 15:40:37 -07:00
4cb968fb77 Default hidden visibility (#10752)
Summary:
Flipping to hidden visibility one more time. Let's see what fails.

cc mingzhe09088 pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10752

Reviewed By: ezyang

Differential Revision: D9526343

Pulled By: orionr

fbshipit-source-id: c0e9c29270e95e1b2e21c598095f720c199e1e52
2018-08-28 15:25:43 -07:00
92ff070b83 Add CPU version of hard sigmoid operator to caffe2 (#10837)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10837

Add CPU version of hard sigmoid operator to caffe2. The definition of
this operator can be found here:
https://github.com/onnx/onnx/blob/master/docs/Operators.md#HardSigmoid.

Reviewed By: BIT-silence

Differential Revision: D9489536

fbshipit-source-id: 67b3171ed96d5ebcc8d500d93e7827a4a9705a81
2018-08-28 14:55:49 -07:00
efd2aeac9e Set -Wno-stringop-overflow only with GCC >=7 (#10954)
Summary:
The `stringop-overflow` warning was added in GCC 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10954

Differential Revision: D9546084

Pulled By: SsnL

fbshipit-source-id: e6e68f993f1dbaa879ca66dc43bbcff9c49890ff
2018-08-28 14:25:29 -07:00
b3601a0425 nomnigraph - add documentation for new ReplaceSubgraph api to README.md (#10802)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10802

add documentation for new ReplaceSubgraph api to README.md

Reviewed By: yinghai

Differential Revision: D9473282

fbshipit-source-id: 144c895564af83cc8727a0370e894c2f0b7eadf5
2018-08-28 12:55:25 -07:00
cfa5dbadfc Add nomnigraph bindings
Summary: Adds basic nomnigraph python bindings for quickly playing with the graphs.

Reviewed By: duc0

Differential Revision: D9441936

fbshipit-source-id: fd70f8ea279b28c766e40f124008800acd94bddd
2018-08-28 12:40:16 -07:00
a88463cd9a Working async version of AllGather, test fix and compiler warnings, and CI (#10932)
Summary:
The previous NCCL allgather didn't work as expected. This is a fully working async version, tested on both the C++ and Python frontends.

Multi-node:
```
tengli@learnfair042:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=0 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 0
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful

tengli@learnfair117:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ TMPFILE="/private/home/tengli/temp/tengli-test" RANK=1 WORLD_SIZE=2 ./ProcessGroupNCCLTest
Multi-node world size: 2 rank: 1
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```

CI test:
```
test_set_get (__main__.FileStoreTest) ... ok
test_set_get (__main__.PrefixFileStoreTest) ... ok
test_set_get (__main__.PrefixTCPStoreTest) ... ok
test_allreduce_ops (__main__.ProcessGroupGlooTest) ... ok
test_broadcast_ops (__main__.ProcessGroupGlooTest) ... ok
test_allgather_ops (__main__.ProcessGroupNCCLTest) ... ok
test_allreduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_broadcast_ops (__main__.ProcessGroupNCCLTest) ... ok
test_reduce_ops (__main__.ProcessGroupNCCLTest) ... ok
test_common_errors (__main__.RendezvousFileTest) ... ok
test_nominal (__main__.RendezvousFileTest) ... ok
test_common_errors (__main__.RendezvousTCPTest) ... ok
test_nominal (__main__.RendezvousTCPTest) ... ok
test_unknown_handler (__main__.RendezvousTest) ... ok
test_set_get (__main__.TCPStoreTest) ... ok
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10932

Differential Revision: D9542067

Pulled By: teng-li

fbshipit-source-id: 25513eddcc3119fd736875d69dfb631b10f4ac86
2018-08-28 12:40:14 -07:00
579bc43a14 Future-proofing embedding.py against heuristic changes (#10959)
Summary:
- rebase of https://github.com/pytorch/pytorch/pull/9851
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10959

Differential Revision: D9542292

Pulled By: weiyangfb

fbshipit-source-id: ce51864d203c8ed89da3817f1da020a0ee932960
2018-08-28 12:40:12 -07:00
3b891d9d49 Support direct access of nn.RNNCellBase
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10944

Differential Revision: D9541085

Pulled By: soumith

fbshipit-source-id: 59077f3b226d04c68a93cd6864894e8f6c594aba
2018-08-28 12:25:12 -07:00
5c58cda8ca Add subname to console output for assertExpected (#10559)
Summary:
Running `--accept` on a test doesn't tell you explicitly which sub-test is being updated; this PR fixes that.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10559

Differential Revision: D9353977

Pulled By: driazati

fbshipit-source-id: a9d4014386ff0fe388a092f3dcf50f157e460f04
2018-08-28 12:13:03 -07:00
91797c0672 Replace direct include of caffe2.pb.h with an intermediary header caffe2_pb.h (#10946)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10946

```
codemod -d . --extensions cc,cpp,cu,cuh,h caffe2/proto/caffe2.pb.h caffe2/proto/caffe2_pb.h
```

Reviewed By: houseroad

Differential Revision: D9539945

fbshipit-source-id: 497d04720e8e7e61c05ffe1b23733d0cb774de7e
2018-08-28 11:57:08 -07:00
5ed62ea6fa Add Upsample example for torch onnx exporting
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10550

Reviewed By: orionr

Differential Revision: D9541932

Pulled By: houseroad

fbshipit-source-id: 4d179d189c176482ae919e5cc74607b9d315ed26
2018-08-28 11:39:55 -07:00
22c9bc3117 Resolve builtins using a dict rather than by name (#10927)
Summary:
Changes the approach for resolving builtin ops so that the following works

```
add = torch.add
script
def foo(x):
  return add(x, x)
```

This handles cases when people alias torch and torch.nn.functional to
shorter names.

This works by building a table of id -> builtin name for the known builtin
ops in torch and torch.nn.functional, and for any user-defined
op created by accessing torch.ops.foo.bar

This allows us to clean up many SugaredValue types in the compiler.

Notes:
* we now consider any attributes on python modules to be constants
(e.g. math.pi and torch.double).
* fixes a bug where we incorrectly allowed attribute lookup on arbitrary
python objects. It is now restricted to modules only.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10927

Differential Revision: D9527522

Pulled By: zdevito

fbshipit-source-id: 0280422af08b4b0f48f302766d5a9c0deee47660
2018-08-28 11:25:11 -07:00
c9d337f436 Split IsEmptyOp (#10918)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10918

att

Differential Revision: D9515040

fbshipit-source-id: 53c05c160ba5dda92104aadc2e40801519a2cd28
2018-08-28 10:52:28 -07:00
7de830b879 proper sharing in ShareExternalPointer (#10804)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10804

Make ShareData and ShareExternalPointer create a new storage when the old one is used by multiple tensors.
When we need to modify a field of the storage, we'll create a new storage instead.

Reviewed By: ezyang

Differential Revision: D9350686

fbshipit-source-id: 68d2b6b886b0367b0fc4fabfd55b9a480e7388ca
2018-08-28 10:52:26 -07:00
7f9fd1cc26 allow RandomSampler to sample with replacement (#9911)
Summary:
fixes #7908
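
A minimal usage sketch (the `num_samples` value is an arbitrary assumption):

```
import torch
from torch.utils.data import RandomSampler

data = torch.arange(10)
sampler = RandomSampler(data, replacement=True, num_samples=20)
indices = list(sampler)   # 20 indices drawn with replacement from range(10)
```
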
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9911

Reviewed By: yf225

Differential Revision: D9023223

Pulled By: weiyangfb

fbshipit-source-id: 68b199bef3940b7205d0fdad75e7c46e6fe65ba7
2018-08-28 10:52:25 -07:00
504d705d0f Support for CUDNN_HOME/CUDNN_PATH in C++ extensions (#10922)
Summary:
Currently we assume to find cudnn includes and libraries in the `CUDA_HOME` root. But this is not always true. So we now support a `CUDNN_HOME`/`CUDNN_PATH` environment variable that can have its own `/include` and `/lib64` folder.

This means cudnn extensions now also get support on the FAIR cluster.

soumith fmassa
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10922

Differential Revision: D9526856

Pulled By: goldsborough

fbshipit-source-id: 5c64a5ff7cd428eb736381c24736006b21f8b6db
2018-08-28 09:40:29 -07:00
1421a9d704 added num_directions explanation to docstrings (#10786)
Summary:
Resolving [https://github.com/pytorch/pytorch/issues/10741](https://github.com/pytorch/pytorch/issues/10741). The current docs use `num_directions` quite a bit, without any explanation of it. `num_directions` is set to 2 if the RNN is bidirectional, or 1 otherwise. This change simply adds that to the docs.
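
A small sketch illustrating the convention (the layer sizes are arbitrary):

```
import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=8, hidden_size=16, num_layers=2, bidirectional=True)
num_directions = 2 if rnn.bidirectional else 1
x = torch.randn(5, 3, 8)   # (seq_len, batch, input_size)
out, (h, c) = rnn(x)
# out: (seq_len, batch, num_directions * hidden_size) -> (5, 3, 32)
# h:   (num_layers * num_directions, batch, hidden_size) -> (4, 3, 16)
```
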
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10786

Differential Revision: D9480235

Pulled By: zou3519

fbshipit-source-id: f61d1b0d2b943f84d5b7ff83df6fe0965a508a5e
2018-08-28 09:26:06 -07:00
bee779bc83 StorageImpl scalar_type_ to data_type_
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10915

Reviewed By: ezyang

Differential Revision: D9526416

Pulled By: cpuhrsch

fbshipit-source-id: 68e43121d72b1b951c73df5bf7b598854fb0e291
2018-08-28 09:26:04 -07:00
82bb9fbedd Remove Scalar.local(). (#10917)
Summary:
It's a no-op now that Scalars don't store tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10917

Differential Revision: D9520267

Pulled By: gchanan

fbshipit-source-id: 5388ff9a4fbb8fc9b9e1ce92208246bf6f08eb92
2018-08-28 07:41:36 -07:00
7c7a2ccb58 Update onnx.rst for v0.4 (#10810)
Summary:
Since we don't need `torch.autograd.Variable` anymore, I removed `torch.autograd.Variable` from `onnx.rst`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10810

Differential Revision: D9500960

Pulled By: zou3519

fbshipit-source-id: 1bc820734c96a8c7cb5d804e6d51a95018db8e7f
2018-08-28 07:26:01 -07:00
de099564e3 Minor copy-edit on README
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10931

Reviewed By: cpuhrsch

Differential Revision: D9526248

fbshipit-source-id: 2401a0c1cd8c5e680c6d2b885298fa067d08f2c3
2018-08-27 21:09:36 -07:00
de9cc98e66 Stop copying tensor memory when importing IR
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10487

Differential Revision: D9370084

Pulled By: li-roy

fbshipit-source-id: ecff1d5d7d006fd60e4f6238ee86c56ad168bfc8
2018-08-27 19:25:42 -07:00
2c342e50e1 Fix a bug in constant prop (#10923)
Summary:
More support for tuples has uncovered a bug in constant prop, where
it assumed it could create constant nodes from tuples, even though we
cannot easily create a single prim::Constant to represent a tuple.
This fix checks when we cannot represent an IValue as a prim::Constant
and stops propagating the node in that case.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10923

Reviewed By: orionr

Differential Revision: D9523417

Pulled By: zdevito

fbshipit-source-id: 745058c4388d9a5e0fc1553eaa2731e31bc03205
2018-08-27 18:10:17 -07:00
157fb46ffc Add -rdynamic only to linker flags to avoid compiler warnings (#10789)
Summary:
`clang: warning: argument unused during compilation: '-rdynamic'`
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10789

Reviewed By: houseroad

Differential Revision: D9467385

Pulled By: bddppq

fbshipit-source-id: 610550a8f34cfa66b9dfa183752eb129dae21eaa
2018-08-27 17:56:21 -07:00
f7b02b3a68 Change Tensor/TensorImpl to use c10::intrusive_ptr (#10824)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10824

API additions:
- Tensor(c10::intrusive_ptr<TensorImpl,UndefinedTensor>&&)
- Tensor(const c10::intrusive_ptr<TensorImpl,UndefinedTensor>&)
- Tensor::operator=(Tensor&&) && (for completeness sake)
- TensorBase::unsafeGetTensorImpl()
- TensorBase::unsafeReleaseTensorImpl()
- TensorBase::getIntrusivePtr()
- TensorImpl::type_id()
- Tensor::set_data()
- Tensor::is_same(Tensor)
- Tensor::use_count()
- Tensor::type_id()
- Tensor::scalar_type()
- WeakTensor::is_same(WeakTensor)
- intrusive_ptr::weak_use_count()
- weak_intrusive_ptr::weak_use_count()
- c10::raw::intrusive_ptr::{incref,decref,make_weak}
- c10::raw::weak_intrusive_ptr::{incref,decref,lock}

API changes:
- Tensor::pImpl is no longer public (and now named tensor_impl_)
    - Most methods accessed this way are now accessible on Tensor
      maybe_zero_dim() and set_wrapped_number() being prominent exceptions
      (they are now accessed through unsafeGetTensorImpl())
- Type is no longer friend of Tensor
- TensorBase::reset(TensorImpl*) is deleted
- TensorBase::reset(TensorImpl*, bool should_retain) is deleted
- TensorBase::swap(TensorBaseImpl&) is deleted; use std::swap instead
- TensorBase::get() is deleted; use unsafeGetTensorImpl() instead
- TensorBase::detach() is deleted; use unsafeReleaseTensorImpl() instead
- TensorBase::retain() is deleted; use _raw_incref() instead
- TensorBase::release() is deleted; use _raw_decref() instead
- WeakTensor lost most of its methods (it no longer inherits from
  TensorBase)
- TensorImpl::storage() is now a const method
- Tensor(TensorBase) constructor removed, instead
  we go through getIntrusivePtr().  I'm not sure about
  this change; I happened to have accidentally removed the
  TensorBase constructor and decided to fix call sites,
  but I could go the other way.
- detail::set_data() is deleted; use Tensor::set_data() instead
- c10::raw_intrusive_ptr_target removed; use the functions in c10::raw instead.
  (The reason for this change, is that it is invalid to cast an intrusive_ptr_target*
  to a raw_intrusive_ptr_target* to take advantage of the methods. But there is
  no reason the incref/decref methods shouldn't also work on intrusive_ptr_target;
  it is primarily an API consideration. We can be more standards compliant by
  keeping them as functions, which are universally applicable.)
- intrusive_ptr::reclaim() and weak_intrusive_ptr::reclaim() now work on
  pointers of the NullType. (This counts as a bug fix, because the documentation
  specified that pointers produced by release() are valid to reclaim(), and
  a release() on a null intrusive_ptr produces the NullType::singleton())

Bug fixes:
- Dispatch code for mutable references incorrectly returned
  a reference to a value argument (which would immediately
  go out of scope).  They now correctly return a tensor by
  value.
- intrusive_ptr copy/move assignment did not work correctly when
  an object was assigned to itself. We now check for this case and
  no-op if so. (This bug manifested itself as a Tensor mysteriously
  becoming an UndefinedTensor after lines of code like
  'x = x.mul_(y)')

Other changes:
- The checked cast functions in Utils.h have now been
  renamed and detemplatized into checked unwrap functions.
- Added type_id() and scalar_type() methods to Tensor
- pImpl is no longer public
- Documented what the && overloads are doing
- All occurrences of 'new TensorImpl' (and similar spellings, like 'new THTensor')
  have been expunged. This is NO LONGER a valid way to create a new
  tensor, and if you do this, upon your first incref, you will catch an ASSERT
  failure saying that only tensors created by intrusive_ptr::release() are valid
  to reclaim(). Use c10::make_intrusive instead in this situation.
- IValue is adjusted to use intrusive_ptr instead of Retainable, and all
  other sub-classes of Retainable were modified to use intrusive_ptr.
  When doing this, I had to make the constructors of sub-classes like
  ConstantList public, so that c10::make_intrusive could invoke them.  Fortunately,
  if you incorrectly stack allocate a ConstantList, and then try to get an
  intrusive_ptr to it, it will fail, as stack allocated ConstantLists have refcount 0.
- IValue very narrowly sidesteps the problem of handling NullType, as it
  considers intrusive_ptr<TensorImpl> identical to intrusive_ptr<TensorImpl, UndefinedTensor>
  which is not always true. This was always the case, but there's now a comment
  explaining what's going on.

Some MSVC bugs were uncovered during the preparation of this patch.
They are documented as comments in the code.

Reviewed By: gchanan

Differential Revision: D9481140

fbshipit-source-id: 14a8ea0c231ed88b5715fb86d92730926f9f92fc
2018-08-27 16:11:01 -07:00
f2bb9f0bb5 speed up kl div loss (#10336)
Summary:
Moved kl div loss to aten.
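
The user-facing API is unchanged; a usage sketch matching the benchmarked input size:

```
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(1000, 100), dim=1)
target = F.softmax(torch.randn(1000, 100), dim=1)
loss = F.kl_div(log_probs, target)   # the input is expected in log-space
```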

Benchmarks for 5000 iterations on input size (1000, 100):

New
```
cuda:
forward [0.9736350309103727, 0.9922929517924786, 0.9694818360731006]
input requires_grad=True:
backward [0.5595634011551738, 0.558339926879853, 0.5546616851352155]
double backward [1.2445648494176567, 1.2245905152522027, 1.2349751549772918]
target requires_grad=True:
backward (new C++) [0.9489959231577814, 0.9553070571273565, 0.9556351029314101]
double backward (new C++) [1.8184774098917842, 1.8164670099504292, 1.845708406995982]

cpu:
forward (new C++) [7.892430987209082, 8.3068826389499, 7.985283812973648]
input requires_grad=True:
backward (new C++) [4.328460982069373, 4.45323242014274, 4.27946363389492]
double backward (new C++) [5.153504415880889, 4.629372010007501, 4.712803596165031]
target requires_grad=True:
backward (new C++) [3.4181493939831853, 3.3771288259886205, 3.7086612950079143]
double backward (new C++) [0.21922698011621833, 0.1858532396145165, 0.19477044604718685]
```

Old
```
cuda:
forward [3.101281268056482, 3.068499860819429, 3.0527669726870954]
input requires_grad=True:
backward [0.5650290949270129, 0.5730433077551425, 0.5588279226794839]
double backward [1.1287697306834161, 1.13834543293342, 1.1298578432761133]
target requires_grad=True:
backward [0.9470391101203859, 0.9560198178514838, 0.9750375030562282]
double backward [1.85760727385059, 1.7989214668050408, 1.788982989732176]

cpu:
forward (new C++) [12.474591840058565, 12.511441555805504, 12.666544185951352]
input requires_grad=True:
backward (new C++) [7.660991386976093, 7.449987292289734, 7.513917901087552]
double backward (new C++) [4.073225498665124, 4.264980792999268, 4.429787891916931]
target requires_grad=True:
backward (new C++) [3.448499082121998, 3.9072313378565013, 3.2433970272541046]
double backward (new C++) [2.126378359273076, 1.9045450473204255, 1.7932004742324352]
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10336

Differential Revision: D9213636

Pulled By: li-roy

fbshipit-source-id: 27cc530f6276f58d35dc7a1d56dfc758a0fc4a7b
2018-08-27 16:10:59 -07:00
f5910c8a36 Add MIOPEN recurrent operator (#10840)
Summary:
The goal of this PR is to enable the MIOPEN engine (for HIP devices) for the recurrent operator and also enable the corresponding unit test.
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10840

Differential Revision: D9518980

Pulled By: bddppq

fbshipit-source-id: 214661e79a47c5dc6b712ef0fba986bd99db051f
2018-08-27 15:39:56 -07:00
8e33451e2e Make torch.cuda.* take device objects; Update distributed docs (#10833)
Summary:
Commits:

1. Make `torch.cuda.*` take device objects (see the sketch after this list)
2. Update `torch.distributed` docs to emphasize calling `torch.cuda.set_device` before `init_process_group`
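
A minimal sketch of the first change:

```
import torch

dev = torch.device('cuda:0')
if torch.cuda.is_available():
    torch.cuda.set_device(dev)               # previously required the bare index 0
    name = torch.cuda.get_device_name(dev)   # device objects work here too
```
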
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10833

Differential Revision: D9514241

Pulled By: SsnL

fbshipit-source-id: 2497464305fb1e63d6c495291a5744aaa7e2696e
2018-08-27 15:24:42 -07:00
58b145f515 Fix negative indices in tracer (#10560)
Summary:
Previously, when tracing slicing and select, negative indices would get normalized, fixing the index to the size of the traced tensor. This change makes the behavior the same as script, so aten::select with negative indices is emitted.
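
A small sketch using the decorator-style tracing API of this era:

```
import torch

# The traced graph now records aten::select with index -1 rather than
# the normalized positive index derived from the example input's size.
@torch.jit.trace(torch.randn(3, 4))
def last_row(x):
    return x[-1]
```
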
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10560

Differential Revision: D9493614

Pulled By: eellison

fbshipit-source-id: ce7a8bae59863723247208d86b9f2948051ccc6c
2018-08-27 15:19:41 -07:00
9aa92bc261 Change the default value of DeviceOption.numa_node_id from -1 to 0 (#10877)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10877

Change the default value of DeviceOption.numa_node_id to 0 and use has_numa_node_id() to check for existence.

Reviewed By: ilia-cher

Differential Revision: D9473891

fbshipit-source-id: 91ac6a152f445644691023110c93d20a3ce80d43
2018-08-27 14:55:46 -07:00
7842b6d0f7 Fix at::optional compile problems on Windows CUDA.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10909

Differential Revision: D9516837

Pulled By: gchanan

fbshipit-source-id: fad7e3284e74c599b873ebaae2dcdf5013505855
2018-08-27 14:40:41 -07:00
6ce799edd6 Tuples/Lists can now be inputs/outputs to script and other simple fixes. (#10812)
Summary:
* Fix the necessary pathways so that tuples and lists can be inputs to script (see the sketch after this list).

* Prevent linear algebra functions from being run in shape prop, because
they will frequently error out on nonsense data.

* Favor schema-driven python input conversion where possible.
The remaining cases where we directly create Stacks without a schema are
only for debugging.

* Make the error messages when calling script/trace functions more pythonic.

* Simplify FlattenTuples -- now that tuples are supported, we can choose to only flatten tuples when needed. This may have to be revisited pending onnx test results, but it is necessary for making tuple I/O work.
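
A minimal sketch of a tuple crossing the script boundary (the Python 3 annotation style shown is an assumption borrowed from later releases):

```
import torch
from typing import Tuple

@torch.jit.script
def unpack_add(pair: Tuple[torch.Tensor, torch.Tensor]) -> torch.Tensor:
    a, b = pair
    return a + b

out = unpack_add((torch.ones(2), torch.ones(2)))
```
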
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10812

Differential Revision: D9477982

Pulled By: zdevito

fbshipit-source-id: ed06fc426e6ef6deb404602a26c435a7fc40ea0c
2018-08-27 14:40:40 -07:00
f64f6eed3a move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10859

Reviewed By: newstzpz

Differential Revision: D9498312

fbshipit-source-id: 08b8a596f774c9102286019f286ca0b74d1f5304
2018-08-27 12:56:46 -07:00
35beecfe17 fix xfails involving literals (#10905)
Summary:
I missed these in #10900

cc apaszke jamesr66a zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10905

Differential Revision: D9516748

Pulled By: zou3519

fbshipit-source-id: a5c3e3b65a33c339d5c4e9fc160462c3d35705f3
2018-08-27 12:41:06 -07:00
f940af6293 Bag of Distributions doc fixes (#10894)
Summary:
- Added `__repr__` for Constraints and Transforms.
- Arguments passed to the constructor are now rendered with :attr:

Closes https://github.com/pytorch/pytorch/issues/10884
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10894

Differential Revision: D9514161

Pulled By: apaszke

fbshipit-source-id: 4abf60335d876449f2b6477eb9655afed9d5b80b
2018-08-27 09:55:27 -07:00
67f6f930a8 Remove FIXME_zerol() from test_jit.py (#10900)
Summary:
The scalar situation has gotten a lot better and now we can
remove all instances of FIXME_zerol().

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10900

Differential Revision: D9514206

Pulled By: zou3519

fbshipit-source-id: e4e522f324126c5454cd6de14b832d2d1f6cb0ce
2018-08-27 08:55:08 -07:00
841d779598 Increase BC for PackedSequence ctor (#9864)
Summary:
PackedSequence is never supposed to be created by the user, but unfortunately some community repos are already doing this (e.g., [here](7c191048ce/torchmoji/model_def.py (L218-L229))). Some changes we made broke the calling pattern `PackedSequence(data=x, batch_sizes=y)`. This patch adds back support for that.
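
A minimal sketch of the restored (but still discouraged) calling pattern:

```
import torch
from torch.nn.utils.rnn import PackedSequence, pack_padded_sequence

packed = pack_padded_sequence(torch.randn(4, 2, 8), lengths=[4, 3])
# Community code constructs PackedSequence directly with keyword arguments;
# this patch keeps that working:
clone = PackedSequence(data=packed.data, batch_sizes=packed.batch_sizes)
```
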
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9864

Differential Revision: D9011739

Pulled By: SsnL

fbshipit-source-id: 0e2012655d7f4863ec54803550df30874ec35d75
2018-08-27 08:25:23 -07:00
c3271b53e4 Remove ability of Scalars to hold Tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10889

Differential Revision: D9512589

Pulled By: gchanan

fbshipit-source-id: 8b2b26c9f3a4da31a46f684793ab237e9ef9a323
2018-08-27 07:26:14 -07:00
3aaad3ecb1 Begin a bestiary of MSVC/NVCC bugs. (#10883)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10883

Differential Revision: D9513997

Pulled By: ezyang

fbshipit-source-id: 37db956e57d86471323d284869bb844f5a4753ac
2018-08-27 07:09:47 -07:00
c8b246abf3 Prevent JIT from overspecializing to every single size configuration (#10844)
Summary:
Please review the expects carefully to make sure there are no regressions. I tried to go over them one by one when they changed, but it's sometimes easy to miss finer details.

Summary of changes:

- Renamed `TensorType` to `CompleteTensorType`. Added a new `TensorType` which records only the scalar type, number of dimensions, and device of a value. The argument behind the rename is to encourage people to use `CompleteTensorType` less, as most passes will only have limited information available. To make the transition easier, `complete_type->cast<TensorType>()` works, and makes our passes work with both kinds of specialization if they don't need the extra detail.
- Renamed `ArgumentSpec` to `CompleteArgumentSpec`. Added a new `ArgumentSpec`, which matches argument only at the level of the new `TensorType`.
- Shape analysis can process graphs with both `CompleteTensorType` and `TensorType`.
- The fuser was a part that heavily relied on full shape information being available. Now, we simply try to fuse the largest possible graphs, and have to do run-time checks to make sure they match the code we generate. If they don't, we fall back to regular interpretation. The shape checks are implemented using an optimized method exploiting algebraic properties of shapes with broadcasting, and the relations of broadcasting with pointwise ops. A full written proof of correctness of the shape checking algorithm is included in a comment in `graph_fuser.cpp`.

zdevito ezyang mruberry ngimel csarofeen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10844

Differential Revision: D9498705

Pulled By: apaszke

fbshipit-source-id: 0c53c2fcebd871cc2a29c260f8d012276479cc61
2018-08-26 09:54:48 -07:00
9679fc5fcd Handling failing test on ROCm.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10854

Reviewed By: ezyang

Differential Revision: D9498721

Pulled By: Jorghi12

fbshipit-source-id: 4018383fea5a2a6baff7183b0c0197a4b7a09f20
2018-08-26 07:55:33 -07:00
ddc37d7487 Update mobile predictor caller's interface
Summary: Update all the caller for the new interface

Reviewed By: highker

Differential Revision: D9323167

fbshipit-source-id: a39335ceb402db0719f5f2314085ba9a81380308
2018-08-24 23:40:05 -07:00
d632ccd2c1 Cache isContiguous and numel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10696

Differential Revision: D9437963

Pulled By: cpuhrsch

fbshipit-source-id: 7217682f5e4b69c73d943411d738e4892bb465f5
2018-08-24 22:40:39 -07:00
17dac3e17f Create class constant for string literal 'blob_names'
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10827

Reviewed By: boryiingsu

Differential Revision: D9484567

fbshipit-source-id: 275eddc9406b5f427d72c0ab9b0da481b5e59ece
2018-08-24 22:11:43 -07:00
8253cfaa72 Conv BN fusion for 3D conv (#10239)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10239

Make Conv + BN fusion also work for 3D convolutions

Reviewed By: duc0

Differential Revision: D9176314

fbshipit-source-id: 6604aa569c5c3afdb4480a5810890bc617e449c4
2018-08-24 21:24:36 -07:00
542aadd9a7 Stop using symbolic override for tracing RNNs (#10638)
Summary:
This disables the symbolic override hacks and makes tracing emit the recently added ATen ops for RNNs (`aten::lstm`, `aten::gru`, ...). I managed to reuse pretty much all of the translation code for their symbolics.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10638

Differential Revision: D9385830

Pulled By: apaszke

fbshipit-source-id: ff06ef7b1ae7c3b7774825e0991bc3887e1ff59b
2018-08-24 20:25:58 -07:00
f2f6e6c0e8 Add registry to pybind_state (#10759)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10759

Adding a basic registry pattern to pybind_state so that we can have separate 'cc' files register module updates. This is substantially cleaner than using multiple pybind modules (which have been known to cause bugs).

Reviewed By: bddppq

Differential Revision: D9441878

fbshipit-source-id: af9e9e98385e92b58ca50e935678328c62684d8e
2018-08-24 17:25:02 -07:00
c172ffb632 Remove the nanopb submodule
Summary:
After making changes internally, really remove the nanopb submodule.

Finalizes https://github.com/pytorch/pytorch/pull/10772

Reviewed By: yns88

Differential Revision: D9504582

fbshipit-source-id: 4517607e5c8054a255c3984b8265f48fede2935b
2018-08-24 16:24:57 -07:00
148ea2a653 Create at::linear (#10799)
Summary:
Resubmission of https://github.com/pytorch/pytorch/pull/10755 with fix for ONNX

ezyang jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10799

Differential Revision: D9482168

Pulled By: goldsborough

fbshipit-source-id: 85d4bdfcf0d451f2e7a1c83c5f5415cdd6caacdc
2018-08-24 16:02:08 -07:00
1fbabff76a Refactor THCNumerics and add common math functions for at::Half (#10301)
Summary:
**Summary**: This PR is a follow-up to mruberry's https://github.com/pytorch/pytorch/pull/9318/. It tries to achieve the following:
- Specializing std common math functions for `at::Half` type.
- Create `CUDANumerics.cuh` to contain necessary parts from `THCNumerics.cuh`.
- Update `THCNumerics.cuh` with new usage and comments to demonstrate the best practice for developers, hence making way for its deprecation.
- Remove legacy/redundant code path.
- Remove unused CUDA HALF macros (see separate PR https://github.com/pytorch/pytorch/pull/10147)

**Comments**: `CUDANumerics.cuh` contains mathematical functions that are either not in the std namespace or are specialized for compilation with CUDA NVCC or CUDA NVRTC. This header is derived from the legacy `THCNumerics.cuh`. Following are some rationale behind why some functions were kept while others were removed:
- All arithmetic can now be done in ATen using binary CUDA kernels or CUDA tensor pointwise apply (check https://github.com/pytorch/pytorch/pull/8919 and `CUDAApplyUtils`). `at::Half` comparisons rely on implicit conversion to float.
- Functions that are c/c++ standard compliant, have been specialized for user defined types, for instance, the std namespace has been opened up for `at::Half`, that defines math function definitions for `at::Half`. Check `Half-inl.h`
- Some standard compliant functions are specialized here for performance reasons. For instance, `powi` is used for `pow` calculation on integral types. Moreover, `abs`, `isinf`, `isnan` are specialized to save one API call vs when used with std. Although this is subject to change, depending on if we really care about saving one API call.
- Numeric limits such as `max/min` are removed since they call standard defines. Moreover, numeric limits for
`at::Half` are present in `Half-inl.h`. I understood that HIP has some issue with `std::numeric_limits`, and this is the related github issue I found: https://github.com/ROCm-Developer-Tools/HIP/issues/374. AlexVlx mentions that the issue can be avoided by launching `std::numeric_limits` in `__device__`. Since we are launching lambdas with device contexts, I don't see a reason why `std::numeric_limits` won't compile on HIP if launched with device context within a kernel, unless I am not aware of the real reason why max/min was there in THCNumerics in the first place. (I haven't ever tried a build with HIP.)

Here are some reference PRs that was handy in refactoring TH into ATen:
- https://github.com/pytorch/pytorch/pull/6786
- https://github.com/pytorch/pytorch/pull/5475
- https://github.com/pytorch/pytorch/pull/9401
- https://github.com/pytorch/pytorch/pull/8689
- https://github.com/pytorch/pytorch/pull/8919
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10301

Differential Revision: D9204758

Pulled By: soumith

fbshipit-source-id: 09f489c1656458c02367b6cd31c3eeeca5acdc8a
2018-08-24 16:02:06 -07:00
87a7840fa6 Remove Tensor constructor of Scalar. (#10852)
Summary:
This is a step along the way of removing Tensor as a member of the tagged union in Scalar. This simplifies ordering dependencies, because currently Scalar and Tensor both depend on each other (so we introduce a TensorBase). Also, this API isn't particularly useful publicly: we can't autograd through Scalars, so you still need a Tensor overload basically everywhere anyway.

I'm undecided what the final API should be here. We could keep a Tensor constructor on Scalar, but have it generate a local scalar; this is convenient, but given that this API used to be non-synchronizing, it may not be the best choice.

For now, I'm just using _local_scalar, which is clear, although we should get rid of the prefix _ if that's the API we intend to promote.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10852

Reviewed By: ezyang

Differential Revision: D9496766

Pulled By: gchanan

fbshipit-source-id: 16f39b57536b9707132a5a4d915650c381bb57db
2018-08-24 16:02:05 -07:00
0d5584d8d7 Revert D9492561: [pytorch][PR] Moving the operator argument to the front for kernelPointwiseApply.
Differential Revision:
D9492561

Original commit changeset: d0f0e2ab7180

fbshipit-source-id: fc822e63b11866195ff7883f360338a41e25d9e2
2018-08-24 16:02:04 -07:00
0ef5cfd28c fix ivalue printing for lists (#10777)
Summary:
Fixing the printing of IValue lists, which didn't work previously.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10777

Differential Revision: D9474264

Pulled By: eellison

fbshipit-source-id: 0c7d6e7ecaa3f7908b131ac9f1036f19ac4f8b4f
2018-08-24 16:02:03 -07:00
983e0f2413 Remove Node::invalidateSchema (#10822)
Summary:
The schema_ field is a private and internal cache for nodes, and no
methods meant to manipulate it should be publicly visible. This call
wasn't even necessary at its call site, since removeInput will reset the
schema by itself.

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10822

Reviewed By: zdevito

Differential Revision: D9498683

Pulled By: apaszke

fbshipit-source-id: 42e1743e3737cb7d81f88e556204487d328c0e47
2018-08-24 16:02:01 -07:00
74e6a666b3 If none of the schema match, add ImplicitTensorToNum conversions where needed. (#10180)
Summary:
When matching schema, first try to match without adding TensorToNum conversions. Then make another pass where TensorToNum conversions are allowed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10180

Differential Revision: D9438153

Pulled By: eellison

fbshipit-source-id: 80541b5abd06e9d4187e89dda751f44dab6f58c5
2018-08-24 16:02:00 -07:00
474684cf03 Re-sync with internal repository (#10868) 2018-08-24 15:48:03 -07:00
8044dc4eb8 Support new Reshape semantics (#10848)
Summary:
Since ONNX opset version >5, Reshape changed semantics to take a shape tensor as input instead of relying on the `shape` attribute to decide what shape to reshape to. The ONNXIFI op has been postponing this change because some of the backends, such as TensorRT, were not ready. Now that the backends have adopted these semantics, we can remove the legacy mode and output opset version 7 ONNX models.

This change also flushes out some bugs and new requirements:
- Convert shape info into an int64 tensor
- Fix a bug where we output the shape tensor in the mapped workspace instead of the original workspace
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10848

Reviewed By: houseroad

Differential Revision: D9495121

Pulled By: yinghai

fbshipit-source-id: a6f44a89274c35b33fae9a429813ebf21d9a3d1a
2018-08-24 11:46:41 -07:00
8130b1a950 Ignore stack frames coming from python3 object file (#10627)
Summary:
goldsborough
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10627

Reviewed By: ezyang

Differential Revision: D9384411

Pulled By: apaszke

fbshipit-source-id: ce4f6edb9ffbd0c7e320b9347da10399de472150
2018-08-24 11:26:21 -07:00
6e2f6dc6e6 Move Allocator and Device to ATen/core
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10798

Reviewed By: ezyang

Differential Revision: D9466602

fbshipit-source-id: f5bda17045076d8c81be9fa5a0749c97bf274b5f
2018-08-24 11:26:19 -07:00
f1df85d799 bug-fix in normal_( ) (#10846)
Summary:
- fixes #10642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10846

Differential Revision: D9495014

Pulled By: weiyangfb

fbshipit-source-id: 35a9fc349f9f0c21a24141f29c62853ab6a68dae
2018-08-24 11:26:18 -07:00
313139d14e Moving the operator argument to the front for kernelPointwiseApply. (#10829)
Summary:
Currently on PyTorch AMD, memory accesses on the TensorInfo struct contained in the operators passed into the kernelPointwiseApply kernel lead to hangs on the HCC runtime. Permuting the argument order so that the operator comes first alleviates this issue, and the kernel hangs disappear.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10829

Reviewed By: ezyang

Differential Revision: D9492561

Pulled By: Jorghi12

fbshipit-source-id: d0f0e2ab7180e55846db909f2744b8c8b110205e
2018-08-24 11:10:43 -07:00
e3d12d7afb Automatic update of fbcode/onnx to 6146a85d371481222c10ede4430ad5476e60de87 (#10831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10831

Previous import was 7848f1e0414ba3b2e263609d93d46fd60790b2e9

Included changes:
- **[6146a85](https://github.com/onnx/onnx/commit/6146a85)**: Check pybind version (#1315) <Changming Sun>
- **[2cbf740](https://github.com/onnx/onnx/commit/2cbf740)**: Domain exists in GraphProto but not in Node (#1310) <Ryan Hill>
- **[9b874e9](https://github.com/onnx/onnx/commit/9b874e9)**: [Title] Add optimization pass eliminating nop Pad (#1307) <Tingfan Wu>

Reviewed By: yinghai

Differential Revision: D9485475

fbshipit-source-id: 3adb4e6e182278fd2abe5068a9d4569763e0ff0c
2018-08-24 10:54:40 -07:00
3c9775fff8 Remove nanopb since we've switched to protobuf (#10772)
Summary:
We no longer use nanopb in PyTorch (or Caffe2), so we are removing it. All protobuf manipulation should go through standard protobuf, which is statically linked inside libcaffe2.so by default.

cc zdevito pjh5 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10772

Reviewed By: pjh5

Differential Revision: D9465894

Pulled By: orionr

fbshipit-source-id: 8cdf9f1d3953b7a48478d381814d7107df447201
2018-08-24 10:54:38 -07:00
8c13971f57 Remove protobuf require and use requirements.txt (#10771)
Summary:
In preparation for making FULL_CAFFE2 the default, users shouldn't be required to have protobuf installed.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10771

Reviewed By: pjh5

Differential Revision: D9474458

Pulled By: orionr

fbshipit-source-id: 3e28f5ce64d125a0a0418ce083f9ec73aec62492
2018-08-24 10:39:40 -07:00
474bd60bad Provide a tensor overload to mul_out_sparse_scalar. (#10828)
Summary:
This is a small part of the effort to remove Tensor as a tagged member in Scalar because it is inconsistent with how we normally do overloads.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10828

Differential Revision: D9485049

Pulled By: gchanan

fbshipit-source-id: 103f5cc03bb7775cd2d3a0a5c0c5924838055f03
2018-08-24 09:39:26 -07:00
e146518e46 Fix AT_CUDA_CHECK and AT_CUDNN_CHECK macros (#10834)
Summary:
Previously, the macros evaluated the expression multiple times on error.

For example:

```
AT_CUDA_CHECK(cudaStreamWaitEvent(ptr->stream, event, 0));
```

would previously expand to

```
if (cudaStreamWaitEvent(ptr->stream, event, 0) != cudaSuccess) {
    AT_ERROR("CUDA error: ", cudaGetErrorString(cudaStreamWaitEvent(ptr->stream, event, 0)));
}
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10834

Differential Revision: D9493257

Pulled By: colesbury

fbshipit-source-id: d2473020fd83a25aa421171d19c8dfe559155a9b
2018-08-24 09:09:18 -07:00
ca567862b2 Support multidimensional indexing (#10787)
Summary:
Part of #10774.

This PR does the following:
- Support ast.ExtSlice in the frontend. This is done by returning a
  list of ast.Index and ast.Slice.
- Support multidimensional indexing with ints and slices

The general approach is to desugar multidimensional indexing into
at::slice, at::select operations. This is exactly how normal pytorch
does indexing (by desugaring it into at::slice, at::select, and other ops).

I used [this code](https://github.com/pytorch/pytorch/blob/master/torch/csrc/autograd/python_variable_indexing.cpp) as reference.
We should be able to copy the rest of this to implement the missing
indexing features in script (indexing with ellipses, tensors, sequences, etc).
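
As a hedged illustration (using ATen's public C++ API directly, not the script compiler's actual output), `t[1, 0:2]` desugars roughly like this:

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor t = at::rand({2, 3, 4});
  // Python: t[1, 0:2] -- select index 1 along dim 0, then slice the result
  // along its leading dim from 0 to 2. Both ops return views, not copies.
  at::Tensor r = at::slice(at::select(t, /*dim=*/0, /*index=*/1),
                           /*dim=*/0, /*start=*/0, /*end=*/2, /*step=*/1);
  // r has shape {2, 4}.
  return 0;
}
```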

After I'm done implementing the missing indexing features in future prs, I can try to
templatize python_variable_indexing.cpp so that it can work with both JIT
script and normal pytorch indexing, but right now I'm not sure if that's
a good idea or not.

cc zdevito jamesr66a apaszke wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10787

Differential Revision: D9481402

Pulled By: zou3519

fbshipit-source-id: 78c9fa42771a037d157879e23e20b87401cf1837
2018-08-24 08:10:32 -07:00
6993e4a9f7 Caffe2 Functional enforcing inplace output (#10797)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10797

A few operators enforces in-place output (e.g., running mean/var for SpatialBN). Functional right now doesn't follow the inplace_enforced_ rules in OpSchema and therefore, the RunNetOnce() will fail on OpSchema->Verify(). Edit the output_names in Functional following the rules to pass check.

Reviewed By: jerryzh168

Differential Revision: D9470582

fbshipit-source-id: 168efeccecc32184bd1d02f3fefe8e61faa4e0f4
2018-08-23 22:42:47 -07:00
8da4167129 Fix performance regression (#10835)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10835

The last constructor diff caused a performance regression in cold runs.
This one tries to fix it.

Reviewed By: highker

Differential Revision: D9489617

fbshipit-source-id: a77c2e2c903a73e2ad9806b4f9c209cdb751442f
2018-08-23 19:55:23 -07:00
df2d48b42c Added PrefixStore, pybind, test for group backward compatibility (#10762)
Summary:
Added PrefixStore support.

This will make groups backward compatible.

Tests are covered too.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./FileStoreTest
Using temporary file: /tmp/testoglRl4
Using temporary file: /tmp/testepZIpB
Test succeeded
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ ./TCPStoreTest
Test succeeded
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10762

Differential Revision: D9484032

Pulled By: teng-li

fbshipit-source-id: 85754af91fe3f5605087c4a2f79ae930a9fd1387
2018-08-23 18:10:37 -07:00
61b34d42e7 nomnigraph - isSubgraphMatch returns the matched Subgraph & map from MatchNodes to graph nodes (#10605)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10605

Make isSubgraphMatch returns a subgraph and map from MatchNodes to graph nodes in the result, which makes it easier to write graph fusion logic. Also include some more helper methods for NN subgraph matcher.

Reviewed By: bwasti

Differential Revision: D9374931

fbshipit-source-id: 3a273295eec81a43027ec3a9e835d27f00853df9
2018-08-23 16:40:19 -07:00
ee022a476a Added this-consts to all methods on SymbolicVariable (#10805)
Summary:
Self explanatory. See https://github.com/pytorch/pytorch/issues/9109 or T32954812 for more details
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10805

Reviewed By: ezyang

Differential Revision: D9477686

Pulled By: hakobyant

fbshipit-source-id: 73dd84e5295e4c749bd6416ce2f6eb7590f05cbc
2018-08-23 16:25:27 -07:00
9403e0cac0 Use ATen implementation of RNNs (#10761)
Summary:
apaszke recently ported RNNs from Python into ATen, which means we can replace our implementation in the C++ API (written by ebetica) with the ATen implementation, which cleans up a lot of code (+99, -323). Thanks apaszke!

I also added the `bidirectional` and `batch_first` options to the C++ API RNN options, just because why not.

apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10761

Differential Revision: D9443885

Pulled By: goldsborough

fbshipit-source-id: b6ef7566b9ced2b2f0b2e1f46c295b6f250c65a8
2018-08-23 16:12:14 -07:00
a4c59a9dab MIOpen integration, more tests enabled, bug fixes (#10612)
Summary:
* first integration of MIOpen for batch norm and conv on ROCm
* workaround a ROCm compiler bug exposed by elementwise_kernel through explicit capture of variables in the densest packing
* workaround a ROCm compiler bug exposed by having `extern "C" __host__` as a definition and just `__host__` in the implementation through the hipify script
* use fabs() in accordance with C++11 for double absolute, not ::abs() which is integer-only on ROCm
* enable test_sparse set on CI, skip tests that don't work currently on ROCm
* enable more tests in test_optim after the elementwise_bug got fixed
* enable more tests in test_dataloader
* improvements to hipification and ROCm build

With this, resnet18 on CIFAR data trains without hang or crash in our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10612

Reviewed By: bddppq

Differential Revision: D9423872

Pulled By: ezyang

fbshipit-source-id: 22c0c985217d65c593f35762b3eb16969ad96bdd
2018-08-23 15:24:47 -07:00
3d43a82440 Add support for vararg style functions. (#10250)
Summary:
Things like `zeros(1,2,3, dtype=torch.int)` are now supported in the script by altering tryMatchSchema to auto-construct the list `[1,2,3]` when it sees inlined members of the list as the last positional arguments.

I suggest reading the commits individually, since the first two incrementally change how we do tryMatchSchema to get it ready for adding vararg list conversion, while the third actually does the modification.

closes #10632
closes #8516
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10250

Differential Revision: D9478235

Pulled By: zdevito

fbshipit-source-id: 0c48caf7a6184e463d9293d97015e9884758ef9c
2018-08-23 15:10:36 -07:00
9dbcc9cebd Move _raw_* intrusive pointer manipulations to raw_intrusive_ptr_target (#10779)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10779

The idea is to let classes opt-in to providing these methods
by default.

Reviewed By: jerryzh168

Differential Revision: D9466076

fbshipit-source-id: b6beee084cc71d53ce446cdc171d798eeb48dc12
2018-08-23 14:32:24 -07:00
dec3ed7b49 Increase the limit for Proto size (#10745)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10745

ParseProtoFromLargeString hits the limit when using recurring v2. To unblock the warmup project, we can increase the limit temporarily. More details in this post -- https://fb.facebook.com/groups/264913123977784/permalink/463566404112454/

Differential Revision: D9436368

fbshipit-source-id: 54488f27ef941cab679843cb0c502095dd056c1b
2018-08-23 13:55:50 -07:00
432b3adffc Print blob sizes on fatal signal (#10766)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10766

Added a `Workspace::ForEach(...)` API for accessing the global set of
existing Workspace instances. This is used in the signal handler to print blob
info on the thread receiving a fatal signal.

Reviewed By: mraway

Differential Revision: D9147768

fbshipit-source-id: a94d0b5e6c88390a969ef259ecb8790173af01a4
2018-08-23 13:39:55 -07:00
82ddeb7f2b Using shared implementation in Tensor (#10619)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10619
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9047

Reviewed By: jerryzh168

Differential Revision: D8417101

fbshipit-source-id: 98e0a3275864283c2f06d28f4c9b859b5827ed4d
2018-08-23 13:39:53 -07:00
23a366be33 Use ATen native functions for THCTensor_cadd/cmul/cdiv/csub (#10707)
Summary:
This seems to save a few percent in binary size in libcaffe2_gpu.so, but
the effect may not be real. In fact, deleting some functions can cause
the binary size to increase (perhaps due to alignment issues).

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10707

Differential Revision: D9409009

Pulled By: colesbury

fbshipit-source-id: 282931e562e84e316a33ac6da4788c04c2984f08
2018-08-23 13:31:03 -07:00
0f5c8edfd3 Removes unused THCState code paths (#9735)
Summary:
To prepare THCState for refactoring into ATen, this PR removes unused THCState code paths. In particular, it:

- Removes the UVA Allocator
- Removes the THDefaultDeviceAllocator
- Respects the 1 BLAS and 1 sparse handle per device reality
- Removes kernel p2p access
- Removes setting p2p access
- Removes the GCHandler code path
- Removes many unused THCState_... functions
- Removes THCThreadLocal.h/.cpp

It does not change the preexisting external behavior of any remaining function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9735

Differential Revision: D9438558

Pulled By: SsnL

fbshipit-source-id: dde9acbec237a18bb6b75683e0526f7ff1c9a6ea
2018-08-23 13:10:05 -07:00
ab9e7ae23e Add CUDA implementation of LARS --caffe2 (#10509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10509

This diff enables CUDA implementation of LARS operator in caffe2.

Reviewed By: enosair

Differential Revision: D9318356

fbshipit-source-id: 365b9f01e3afd4d9d3ba49155e72e728119f40c5
2018-08-23 12:55:57 -07:00
b14f2e899c Preserve sparse tensor shape and dim invariants, and add scalar tensor support (#9279)
Summary:
When 0-sized dimension support is added, we expect an empty sparse tensor to be a 1-dimensional tensor of size `[0]`, with `sparseDims == 1` and `denseDims == 0`. Also, we expect the following invariants to be preserved at all times:

```
_sparseDims + _denseDims = len(shape)
_indices.shape: dimensionality: 2,  shape: (_sparseDims, nnz)
_values.shape:  dimensionality: 1 + _denseDims.  shape: (nnz, shape[_sparseDims:])
```

This PR fixes various places where the invariants are not strictly enforced when 0-sized dimension support is enabled.

Tested and `test_sparse.py` passes locally on both CPU and CUDA with the `USE_TH_SIZE_ZERO_DIM` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9279

Differential Revision: D8936683

Pulled By: yf225

fbshipit-source-id: 12f5cd7f52233d3b26af6edc20b4cdee045bcb5e
2018-08-23 10:10:24 -07:00
0eb2c83006 Fix link in THNN/README.md
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10821

Differential Revision: D9481118

Pulled By: soumith

fbshipit-source-id: 0a416202eb4db025ec7d395e70344cbbf626fec0
2018-08-23 09:25:16 -07:00
fcfb1c1979 Make more distributions jittable
Summary:
This uses zou3519's new `torch.broadcast_tensors()` (#10075) to make `Categorical.log_prob()` and the `*Normal.__init__()` methods jittable. Previously `.log_prob()` was failing due to calls to `torch._C._infer_size()` with errors like
```
    def log_prob(self, value):
        if self._validate_args:
            self._validate_sample(value)
>       value_shape = torch._C._infer_size(value.size(), self.batch_shape) if self.batch_shape else value.size()
E       RuntimeError: expected int at position 0, but got: Tensor
```
After this change I'm able to jit many more of Pyro's tests.
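
For reference, a minimal sketch of the same call at the ATen level, assuming `at::broadcast_tensors` (the C++ counterpart of the Python API):

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor value = at::rand({3, 1});
  at::Tensor batch = at::rand({1, 4});
  // Unlike torch._C._infer_size, this takes tensors directly rather than
  // sizes, which is what makes the call traceable.
  auto broadcasted = at::broadcast_tensors({value, batch});
  // broadcasted[0] and broadcasted[1] both have shape {3, 4}.
  return 0;
}
```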

Reviewed By: ezyang

Differential Revision: D9477487

Pulled By: apaszke

fbshipit-source-id: 5f39b29c6b8fa606ad30b02fefe2dfb618e883d6
2018-08-23 08:09:49 -07:00
529fc68df2 Update docs with clean (#10819)
Summary:
Add tip about cleaning if installing ninja after a build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10819

Reviewed By: soumith

Differential Revision: D9480095

Pulled By: erikbrinkman

fbshipit-source-id: 96ae1387038afe6964a1bd1e2186468f6a5ea12f
2018-08-23 07:25:19 -07:00
deda05e59f Revert D9395814: move HeatmapMaxKeypointOp unittest to oss
Differential Revision:
D9395814

Original commit changeset: 25073eb6b143

fbshipit-source-id: 56f2b7b57e3c6361e2d78e5ba7850ea3b89e98fb
2018-08-23 06:54:29 -07:00
b885dea300 parallize the dense part in event models
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10768

Reviewed By: Wakeupbuddy

Differential Revision: D9445750

fbshipit-source-id: b8c2ddfe3ccb9278506de15a5e43bada016408f7
2018-08-22 22:40:07 -07:00
5c0eece2fd Force types on values returned from if blocks to be equivalent (#10281)
Summary:
When emitting if branches, check that the types of each value returned are equivalent. As with reassignment of values, tensors are not forced to be the same shape or subtype.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10281

Differential Revision: D9466566

Pulled By: eellison

fbshipit-source-id: 746abdeb34a0f68806b8e73726ad5003b536911c
2018-08-22 19:55:38 -07:00
9a43fc5eaa move HeatmapMaxKeypointOp unittest to oss
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10674

Reviewed By: newstzpz

Differential Revision: D9395814

fbshipit-source-id: 25073eb6b143fc1e7cbf5f887545d2b7df15c9a9
2018-08-22 19:11:10 -07:00
4aa5075cae update the constructor to accept the PredictorConfg only to set up the predictor (#9483)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9483

The interface is updated to accept the config to construct the predictor.

Reviewed By: highker

Differential Revision: D8872999

fbshipit-source-id: 3ca54d644970823fc33c0ade9a005e12f52e2b24
2018-08-22 19:11:09 -07:00
f0ec3bfa56 Changes for Python3 compatibility (#10524)
Summary:
Review by tomdz volkhin anshulverma
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10524

Reviewed By: ezyang

Differential Revision: D9328001

Pulled By: huitseeker

fbshipit-source-id: 144721c4fd9a1ea6cf6673793416f20cb448aa93
2018-08-22 18:55:01 -07:00
44b47fd7f3 Working pybind version of MPI process group and abort() pybind (#10606)
Summary:
This makes the pybind version of the MPI process group work. The issue is that the tensor list goes out of scope before the MPI worker thread uses it, so we pass the vector by value instead.

Also added a recv_anysource pybind to make it work. The front-end API will wrap this function one level up with an int, so taking a tensor should be the easiest way for now.

Also added an abort pybind and fixed the flaky test.
```
tengli@devfair033:~/new_pytorch/pytorch/torch/lib/build/c10d/test$ mpirun -np 8 ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10606

Differential Revision: D9474393

Pulled By: teng-li

fbshipit-source-id: cca236c333656431e87d0d3573eeae9232c598b0
2018-08-22 18:26:04 -07:00
6c75fc0aa3 Integrating stochastic quantization to easgd to reduce communication + supporting quantization on both sides (split from D8849770) (#10644)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10644

Depends on D8493264

Reviewed By: chocjy, boryiingsu

Differential Revision: D9347706

fbshipit-source-id: 6fdcc5b61098bf47ec9391b1f009b0e6a0615842
2018-08-22 17:10:03 -07:00
f72e813c2f Allow tracing functions that take tuples of tensors as inputs (#10637)
Summary:
And return tuples.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10637

Reviewed By: eellison

Differential Revision: D9385892

Pulled By: apaszke

fbshipit-source-id: 542f4444d909fb246d7f1d88d6fb98345de2d431
2018-08-22 15:37:10 -07:00
043a2e36e5 Removing setup_caffe2.py (#10734)
Summary:
FULL_CAFFE2=1 python setup.py (install | build_deps develop) should be all anyone needs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10734

Reviewed By: orionr

Differential Revision: D9439354

Pulled By: pjh5

fbshipit-source-id: 0169afcda4f8f38c57498ba2151f7654ecce6070
2018-08-22 15:37:07 -07:00
6c84f7fea0 Relax RHS type assert for augassign (#10730)
Summary:
Augassign (e.g., `x += 1`) gets desugared to an assignment of a binop (`x = x + 1`).
Right now we assert that the RHS of the binop is a tensor,
but it really doesn't have to be, because we support scalar/scalar ops and also
list-list ops (e.g., `[1, 2] + [2, 3]`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10730

Differential Revision: D9465110

Pulled By: zou3519

fbshipit-source-id: 7b118622701f09ce356aca81b8db743d9611097b
2018-08-22 15:10:33 -07:00
d40a598777 Back out "[pytorch][PR] Create at::linear" (#10785)
Summary:
Multiple failing external and internal CI signals were ignored when this commit
was landed. goldsborough please fix the test failures and resubmit this change as a
new PR.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10785

Reviewed By: ezyang

Differential Revision: D9466791

Pulled By: jamesr66a

fbshipit-source-id: b260e93bac95d05fd627c64e620b6aefb5045949
2018-08-22 14:39:59 -07:00
6fcac354c5 Erase ListConstruct nodes for ONNX export (#10713)
Summary:
ONNX doesn't support this. Instead, flatten the inputs to the ListConstruct op and inline them into the subsequent usage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10713

Differential Revision: D9458508

Pulled By: jamesr66a

fbshipit-source-id: 0b41e69320e694bb2f304c6221864a39121e4694
2018-08-22 14:39:58 -07:00
de11a5fb28 Resubmit #8322 with scipy version check
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10775

Differential Revision: D9458207

Pulled By: SsnL

fbshipit-source-id: f2b0dbf2d236134afded9b15d8bf55ff98f50e7b
2018-08-22 13:39:49 -07:00
ee3e48d34b Move Backend, Layout, ATenGeneral, Deprecated, Generator to ATen/core. (#10740)
Summary:
I included "legacy" includes in the old spots for Backend, Generator, Layout; it seemed unlikely that the other ones had direct user includes.

This is another step on the path to move Type/Tensor to ATen/core.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10740

Reviewed By: ezyang

Differential Revision: D9435888

Pulled By: gchanan

fbshipit-source-id: 89f4f0f445d4498a059d3a79069ba641b22bbcac
2018-08-22 13:39:46 -07:00
5ca2713a8b Fix performance of WeightedRandomSampler (#10636)
Summary:
Since https://github.com/pytorch/pytorch/pull/8958 was merged, the BatchSampler samples 0d tensors from WeightedRandomSampler instead of integers, which significantly reduces performance. This PR fixes it the same way as https://github.com/pytorch/pytorch/pull/10361 fixed DistributedSampler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10636

Differential Revision: D9423869

Pulled By: zou3519

fbshipit-source-id: f94da2d4cccf70e63beea6cfc3d1230b5610ae44
2018-08-22 13:15:48 -07:00
0e30fa6f3c Faster random number generation in fused_rowwise_random_quantization_ops (#10634)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10634

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=2, random_=True, data_shape_=array([1024, 1224]), gc=, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.9944190979003908 ms
Quantizing time: 2.080512046813965 ms (1.0431669296609765X)
De-quantizing time: 0.7375001907348633 ms (0.36978195380863577X)
```

```
Trying example: test_speed_of_rand_quantization(self=<caffe2.caffe2.python.operator_test.rand_quantization_op_speed_test.TestSpeedFloatToFusedRandRowwiseQuantized testMethod=test_speed_of_rand_quantization>, bitwidth_=1, random_=True, data_shape_=array([1024, 1224]), gc=device_type: 1, dc=[, device_type: 1])
Sub+Scale+Sum time: 1.6691923141479492 ms
Quantizing time: 7.500243186950684 ms (4.493336761366071X)
De-quantizing time: 1.1209726333618164 ms (0.6715658967876477X)
```

Reviewed By: jspark1105

Differential Revision: D8849770

fbshipit-source-id: 2bb2bac7e633f647f38e419ce980b8958f3bcae2
2018-08-22 13:15:46 -07:00
754ec9e386 Reduce rocm link time with ThinLTO
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10758

Differential Revision: D9467554

Pulled By: bddppq

fbshipit-source-id: 6853ccd96ac3209e062c110913ea37d6840c8134
2018-08-22 13:15:45 -07:00
9767951ca8 Remove regex matching from undefined_tensor_test, fixes #10013 (#10702)
Summary:
Don't regex against strings that may have come from the backtrace.
Better to just not regex at all.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10702

Reviewed By: ezyang

Differential Revision: D9406154

Pulled By: jsrmath

fbshipit-source-id: 9b17abee2a6e737a32c05f1e3963aef4b6638a47
2018-08-22 12:39:57 -07:00
b0ad8105d2 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have:
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
After this diff we'll have Tensor -> Storage -> StorageImpl.

Reviewed By: ezyang

Differential Revision: D9384781

fbshipit-source-id: 40ded2437715a3a2cc888ef28cbca9a25b1d5350
2018-08-22 11:55:02 -07:00
5fb9b31ed5 Add matrix_rank (#10338)
Summary:
- Similar functionality as NumPy
- Added doc string
- Added tests

Differential Revision: D9240850

Pulled By: SsnL

fbshipit-source-id: 1d04cfadb076e99e03bdf699bc41b8fac06831bf
2018-08-22 09:58:38 -07:00
fbd7189949 add explicit flag to build static libtorch (#10754)
Summary:
I've tested locally that this works to build static and non-static binaries with and without CUDA.

In terms of ongoing testing, I am working on incorporating this into the release package generation.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10754

Differential Revision: D9457423

Pulled By: anderspapitto

fbshipit-source-id: aa1dcb17c67c0f0c493a9cf93aca4a6e06b21666
2018-08-22 09:26:07 -07:00
227635142f Delete THD master_worker (#10731)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10731

Differential Revision: D9423675

Pulled By: ezyang

fbshipit-source-id: 37221e11d84cc3672b944af598ea229a1d4c38cc
2018-08-22 08:54:36 -07:00
2fe5fa78fa Use FinishDeviceComputation instead of adding events in Operator::SyncDevice
Summary: The code in Operator::SyncDevice had some duplicate logic and using FinishDeviceComputation sufficed in this case.

Reviewed By: yinghai

Differential Revision: D9348288

fbshipit-source-id: d8d874bab491e6d448fcd5fa561a8b99d502753b
2018-08-22 01:09:53 -07:00
22446a3619 Productionize CRF layer in PyText (#10362)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10362

This diff implements a manual export from PyText's CRF module to the caffe2 CRF layer.
Note that most of the changes in caffe2/python/crf.py are just formatting changes; the only relevant change is the new class CRFUtils.

Reviewed By: hikushalhere

Differential Revision: D9234126

fbshipit-source-id: 1a67d709034660e8b3d5ac840560b56de63e3f69
2018-08-22 00:25:26 -07:00
19031c68dc Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage (#10488)
Summary:
```
Use intrusive_ptr in Storage; replace unique_ptr<Storage> with Storage

This patch does two major changes:

- It replaces the use of Retainable in Storage with a new implementation
  based on intrusive_ptr.  This will be necessary because Caffe2 will
  be using this class to implement intrusive_ptrs, and we need to
  line these up for the merge.  One good thing about the new implementation is
  that the default copy/move constructors/assignment operators and destructor
  work automatically, instead of needing to be hardcoded into Storage/Tensor.

- It replaces all places where we returned std::unique_ptr<Storage> with
  Storage, collapsing an unnecessary double indirection that is no longer
  necessary now that we have correctly working copy/move constructors.

I didn't initially want to do step (2), but it was very important to
eliminate all bare uses of new Storage and new StorageImpl, and this making
the API change was the most straightforward way to do this.

HOW TO FIX YOUR CODE IN THE NEW API

- You no longer need to dereference the result of tensor.storage() to pass
  it to set.  So, instead of:

      x.set_(*y.storage());

  just write:

      x.set_(y.storage());

- If you were accessing methods on StorageImpl via the pImpl() method, you
  must use the dot operator to run pImpl().  Even better; just drop pImpl,
  we now have method forwarding.  So, instead of:

      storage->pImpl()->data();

  just do:

      storage->data();
      // storage.pImpl()->data() works too but is not as recommended

- storage->getDevice() is no more; instead use storage->device().index()

MISC CODE UPDATES

- retain, release, weak_retain, weak_release and weak_lock are now
  reimplemented using the "blessed API", and renamed to make it
  clearer that their use is discouraged.

- nvcc OS X and general OS X portability improvements to intrusive_ptr

- A new comment in intrusive_ptr describing how stack allocated
  intrusive_ptr_targets work differently than heap allocated ones
  from c10::make_intrusive

CAVEAT EMPTOR

- THStorage_weakRetain used to work on strong pointers, but it NO LONGER
  works with intrusive_ptr.  You must reclaim the strong pointer into a
  real strong pointer, construct a weak pointer from it, and then release
  the strong and weak pointers.  See StorageSharing.cpp for an example.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10488

Reviewed By: gchanan

Differential Revision: D9306134

Pulled By: ezyang

fbshipit-source-id: 02d58ef62dab8e4da6131e1a24834a65c21048e2
2018-08-21 21:39:55 -07:00
abb209ef25 Fixes *fft docs (#10760)
Summary:
cc cranmer

fixes #10751
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10760

Differential Revision: D9444473

Pulled By: SsnL

fbshipit-source-id: a4036773a93981801c1283d69f86e30cb0fe3d6d
2018-08-21 21:09:04 -07:00
e5e2514f4e fix debug_info arg in createOperator and improve reroute_tensor (#10736)
Summary:
- Fixed C2 core.CreateOperator debug info assignment
- Improved core.Net.reroute_tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10736

Differential Revision: D9426659

Pulled By: harouwu

fbshipit-source-id: 90caf848c88854e17e568d5f6910dc6c81fd000a
2018-08-21 19:40:16 -07:00
1068ba667c Create at::linear (#10755)
Summary:
The optimized code for `linear()`, which uses `addmm` when a bias is given, was duplicated three times across ATen and the C++ API. Let's just have `at::linear` and use that everywhere.
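
A minimal sketch of the consolidated function, assuming this shape (the actual `at::linear` may differ in detail):

```cpp
#include <ATen/ATen.h>

// Computes bias + input @ weight^T, fusing into a single addmm call when
// the fast path applies.
at::Tensor linear(const at::Tensor& input, const at::Tensor& weight,
                  const at::Tensor& bias) {
  if (input.dim() == 2 && bias.defined()) {
    // Fused path: addmm computes bias + input.mm(weight.t()) in one kernel.
    return at::addmm(bias, input, weight.t());
  }
  at::Tensor output = at::matmul(input, weight.t());
  if (bias.defined()) {
    output += bias;
  }
  return output;
}
```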

apaszke ezyang (who mentioned this in #10481)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10755

Differential Revision: D9443881

Pulled By: goldsborough

fbshipit-source-id: a64862d1649b5961043d58401625ec267d97d9f3
2018-08-21 19:40:15 -07:00
a2ca634e04 Add enforce back to converter.cc
Summary: hotfix for B*8

Differential Revision: D9444060

fbshipit-source-id: 368f8463e684c39ec0ac18bcb11a7b6132d9f874
2018-08-21 19:09:22 -07:00
ddf187c198 Dont assume serialized integral types were widened to int32 in raw_data (#10718)
Summary:
zdevito et al came to the conclusion that the ONNX spec does not mandate the widening conversion of integral types when serializing tensor data into raw_data, as opposed to serializing the data into int32_data. PyTorch recently made this change in the export code, which caused import in caffe2 to break because it did not match semantics. This fixes that
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10718

Differential Revision: D9423712

Pulled By: jamesr66a

fbshipit-source-id: 479fbae67b028bf4f9c1ca1812c2c7b0c6cccd12
2018-08-21 18:41:31 -07:00
6325e5aa48 fix typo in error message (#9827)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9827

changed unitilized to uninitialized

Reviewed By: jerryzh168

Differential Revision: D8995509

fbshipit-source-id: 94518d5542a7bff49fcb9a4505c0c7a959746f78
2018-08-21 18:41:29 -07:00
44f996f82c Py3 fixes for layer_model_helper.py (#10525)
Summary:
Fixes `__getattr__` to adhere to its Python API contract, and wraps `range()` call in a list since it does not return one anymore in Python 3.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10525

Reviewed By: ezyang

Differential Revision: D9441360

Pulled By: tomdz

fbshipit-source-id: d489c0e7cefecc4699ca866fd55ddbfa629688d4
2018-08-21 18:41:28 -07:00
71ddd837d7 Support custom ops in ScriptModule and tidy up test files (#10610)
Summary:
This PR adds support for using custom ops in ScriptModules, the last step for our custom op strategy. You can now write

```
import torch

torch.ops.load_library('libcustom_ops.so')

class Model(torch.jit.ScriptModule):
    def __init__(self):
        super(Model, self).__init__()

    @torch.jit.script_method
    def forward(self, input):
        return torch.ops.custom.op(input) + 1

model = Model()
model.forward(torch.ones(5)) # Works
model.save("model.pt") # Works
model = torch.jit.load("model.pt") # Works
```

You can then load the `model.pt` in C++ and execute its `forward` method!

Missing for this was the fact that the script compiler didn't know to convert `ops.custom.op` into a `BuiltinFunction` which then emits a function call. For this I came up with the following strategy inside `torch/csrc/jit/script/init.cpp`:

1. When we access `torch.ops`, we return a `CustomOpValue` (subclass of `PythonValue`), whose purpose is only to return a `CustomOpNamespaceValue` (subclass of `PythonValue`) whenever something under it is accessed.
2. `CustomOpNamespaceValue` will then for each field accessed on it return a `BuiltinFunction`.

This doesn't reduce performance for any calls that are not to `torch.ops` (as opposed to inspecting every function call's name at the call site, for example).

I also had to fix `BuiltinFunction` to not assume the namespace is always `aten::`.

A lot of other changes are just tidying up the Python and C++ test harness before I integrate it in CI.

zdevito dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10610

Differential Revision: D9387832

Pulled By: goldsborough

fbshipit-source-id: c00f431db56c7502a66fe1f813fe78067f428ecb
2018-08-21 18:41:27 -07:00
e94ae99d24 Delete copy constructor/assignment of class Observable explicitly. (#10593)
Summary:
This should resolve "error C2280: 'std::unique_ptr<caffe2::ObserverBase<caffe2::OperatorBase>,std::default_delete<_Ty>> &std::unique_ptr<_Ty,std::default_delete<_Ty>>::operator =(const std::unique_ptr<_Ty,std::default_delete<_Ty>> &)': attempting to reference a deleted function" from Visual Studio.
It should also make the error message more human-readable in case something is really messed up.
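
A minimal sketch of the pattern (Observable's real members differ; the container below is illustrative):

```cpp
#include <memory>
#include <vector>

template <class T>
class Observable {
 public:
  Observable() = default;
  // Deleting these explicitly makes MSVC report the error at the offending
  // copy site instead of deep inside std::unique_ptr's deleted operator=.
  Observable(const Observable&) = delete;
  Observable& operator=(const Observable&) = delete;
  // Moves remain available.
  Observable(Observable&&) = default;
  Observable& operator=(Observable&&) = default;

 private:
  std::vector<std::unique_ptr<T>> observers_;  // illustrative member
};
```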
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10593

Reviewed By: orionr

Differential Revision: D9436397

Pulled By: mingzhe09088

fbshipit-source-id: 31711667297b4160196134a34365da734db1c61d
2018-08-21 16:56:04 -07:00
04b773ab87 Support Loading to GPU (#10710)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10710

Can't resume from checkpoint for workflows that use GPU.

The problem is just that we didn't leverage Caffe2's already-provided GPU deserialization.

`keep_device` arg of LoadOp. See https://fburl.com/y27ltaxw

How is a serialized BlobProto (containing a TensorProto) loaded into GPU memory?
- Load BlobProto from DB. https://fburl.com/pe1qaeyf
- Deserialize the BlobProto into a Blob instance. https://fburl.com/5dirjuuh and https://fburl.com/stoho0x1
- Call Blob->Deserialized. https://fburl.com/bnureu32
- Deserializer Registration. https://fburl.com/wbu95ry7 https://fburl.com/ycetud8u
- Create TensorCUDA Deserializer. https://fburl.com/2lirfuqj
- Create Tensor on GPU and get TensorProto of BlobProto. https://fburl.com/7dre82zg
- Copy TensorProto in CPU to Tensor on GPU. https://fburl.com/fr0qk2oe

Cloned the GPU workflows for testing in D9125520.

Reviewed By: mraway

Differential Revision: D9372950

fbshipit-source-id: 2bf70747bd71e8da16239197f7d2761d63f09ff8
2018-08-21 13:57:36 -07:00
edb34434ab More changes for hidden visibility (#10692)
Summary:
Let's run CI tests to see what fails given the changes that just landed in https://github.com/pytorch/pytorch/pull/10624

cc mingzhe09088 ezyang Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10692

Reviewed By: mingzhe09088

Differential Revision: D9423617

Pulled By: orionr

fbshipit-source-id: 3bda1f118d13f8dd8e823727c93167cae747d8cf
2018-08-21 13:39:57 -07:00
8a1739b05d Add arguments __repr__ in Distribution base class
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10373

Differential Revision: D9240316

Pulled By: ezyang

fbshipit-source-id: f35c500f61f86e6be405e8bd4040db5146224984
2018-08-21 12:10:23 -07:00
9c321a8779 Add util function from core type to dtype (#10716)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10716

title

Reviewed By: idning

Differential Revision: D9417357

fbshipit-source-id: 0f71805b1d64a46791d6ee4d8620763f878ffdb6
2018-08-21 10:55:19 -07:00
b23d59ce1a Make ONNX_ATEN_FALLBACK as internal default option
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10629

Reviewed By: bddppq

Differential Revision: D9381106

fbshipit-source-id: 03d42c95d17a70a68fe0f38dad68f1793996dfce
2018-08-21 10:10:50 -07:00
b0b5139149 Set the BUILD_ENVIRONMENT variable before installing sccache. (#10640)
Summary:
Set the build environment before installing sccache in order to make sure the docker images have the links set up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10640

Reviewed By: yf225

Differential Revision: D9399593

Pulled By: Jorghi12

fbshipit-source-id: a062fed8b7e83460fe9d50a7a27c0f20bcd766c4
2018-08-21 09:40:41 -07:00
30ad13faca Avoid shadowing i, j vars in GeneralProposals test (#10721)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10721

- Fix compilation warning "declaration of 'i' shadows a previous local [-Werror=shadow-compatible-local]"

Reviewed By: newstzpz

Differential Revision: D9419688

fbshipit-source-id: 76efc3688782ce4ead3c89e7069211736febfac2
2018-08-21 09:11:38 -07:00
f9d1b001e1 Move THNN Reduction to ATen/core. (#10703)
Summary:
This is part of moving the (base) Type to ATen/core; some Type methods have default arguments of type THNN Reduction.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10703

Differential Revision: D9406060

Pulled By: gchanan

fbshipit-source-id: 789bb3387c58bd083cd526a602649105274e1ef6
2018-08-21 08:54:35 -07:00
f0d8a36e70 Completely remove build_aten and use_aten (#10469)
Summary:
Breaking out of #8338 to completely remove build_aten and use_aten.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10469

Reviewed By: orionr

Differential Revision: D9413639

Pulled By: mingzhe09088

fbshipit-source-id: b7203aa4f5f2bb95c504c8dc187a3167f2570183
2018-08-20 20:26:42 -07:00
9e75ec11fb Make empty list literals construct empty Tensor[] (#10705)
Summary:
This will make the common case more natural (no need to call `_construct_empty_tensor_list()`).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10705

Differential Revision: D9411622

Pulled By: michaelsuo

fbshipit-source-id: 2d91fbc5787426748d6e1c8e7bbeee737544dc96
2018-08-20 18:28:28 -07:00
5c0d9a2493 Soumith's last few patches to v0.4.1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10646

Reviewed By: ml7

Differential Revision: D9400556

Pulled By: pjh5

fbshipit-source-id: 1c9d54d5306f93d103fa1b172fa189fb68e32490
2018-08-20 18:28:27 -07:00
e449a27646 Fix issues link in Caffe2 readme (#10711)
Summary:
Change to pytorch issues link

orionr pjh5 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10711

Reviewed By: orionr

Differential Revision: D9412870

Pulled By: duc0

fbshipit-source-id: 341e8504ade8eba614cead832e5b5fdca4b1c270
2018-08-20 16:55:11 -07:00
826550a32e Update the onnx Gemm op to FC/FCTransposed logic in caffe2 onnx backend (#10108)
Summary:
The broadcast is used by default when the opset version is greater than 6.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10108

Reviewed By: bddppq

Differential Revision: D9176934

Pulled By: houseroad

fbshipit-source-id: b737bd87b0ddc241c657d35856d1273c9950eeba
2018-08-20 16:09:22 -07:00
15d7f49205 Adding ATEN_NO_TEST option to root level cmake for propogation to aten
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10708

Reviewed By: ml7

Differential Revision: D9410916

Pulled By: pjh5

fbshipit-source-id: b216a9ff7be23ff8754f2fe0b8197b5d006aa08d
2018-08-20 15:40:27 -07:00
585e6b581f Allow method-style casts on tensors (#10641)
Summary:
Closes https://github.com/pytorch/pytorch/issues/10631
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10641

Differential Revision: D9407598

Pulled By: jamesr66a

fbshipit-source-id: a0331f4e9e55d92718cde7a1112fe8c705206b1f
2018-08-20 14:10:21 -07:00
39a3dcc999 Fix #10698 build failure (#10704)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10704

Differential Revision: D9406072

Pulled By: ezyang

fbshipit-source-id: 0d472ef84cddc3bf7600b06d04e5e02e94d59fa3
2018-08-20 14:10:19 -07:00
b4684db698 Add support for Log()
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10694

Reviewed By: houseroad

Differential Revision: D9405612

Pulled By: MisterTea

fbshipit-source-id: 6d83d3c2db933a3822076c7faf578ac0e92e60c6
2018-08-20 13:25:21 -07:00
7832e9d564 Add a bisect percentile operator (#10563)
Summary:
Add a bisect percentile operators with lower and upper bounds for interpolation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10563

Reviewed By: chocjy

Differential Revision: D7802182

Pulled By: olittle

fbshipit-source-id: 89ebfa8b3463adc2c89235fa3dfffa187a9d5417
2018-08-20 13:14:05 -07:00
3d0757430b Fix EnsureCPUOutputOp (#10651)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10651

EnsureCPUOutputOp will copy the input from another Context to CPU, but currently there is no guarantee that the Copy will be executed.

Differential Revision: D9390046

fbshipit-source-id: af3ff19cf46560264cb77d2ab8821f0cc5be74f6
2018-08-20 12:12:48 -07:00
2e563c417c Nomnigraph - rename some APIs that invole Subtree to Subgraph (#10551)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10551

Renaming "subtree" -> "subgraph" to improve the clarity of the subgraph matcher APIs, since the matcher now supports DAGs.

This is pure renaming; no functionality changes.

Reviewed By: bwasti

Differential Revision: D9348311

fbshipit-source-id: 4b9267845950f3029dfe385ce3257d3abb8bdad4
2018-08-20 10:55:21 -07:00
aa9f328fa3 Nomnigraph - DAG matching (#10549)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10549

Support DAG matching in nomnigraph. This is done by maintaining a map from nodes in the MatchGraph to nodes in the input graph, and additionally enforcing that the same node in the MatchGraph must match the same node in the input graph (with the exception of multiplicity, i.e. when count != 1 on the MatchGraph node).

In a follow up diff, I'll rename the API that refers to subtree as subgraph to improve clarity.

Reviewed By: bwasti

Differential Revision: D9347322

fbshipit-source-id: 171491b98c76852240a253279c2654e96dd12632
2018-08-20 10:55:19 -07:00
0cce4620fe Fix backend/device-type comparison with MKLDNN.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10689

Differential Revision: D9400450

Pulled By: gchanan

fbshipit-source-id: f75b042b886d5d525edb2c423173a9646c613a1b
2018-08-20 10:41:08 -07:00
db7b7f1359 fix typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10686

Differential Revision: D9399874

Pulled By: SsnL

fbshipit-source-id: 28130992d2416721552f72cfa835ff0358caeefa
2018-08-20 10:40:55 -07:00
d4832f1e7b More fixes for hidden visibility (#10624)
Summary:
Some more `ATEN_API` additions for hidden visibility.

Running CI tests to see what fails to link.

cc Yangqing mingzhe09088 ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10624

Reviewed By: mingzhe09088

Differential Revision: D9392728

Pulled By: orionr

fbshipit-source-id: e0f0861496b12c9a4e40c10b6e0c9e0df18e8726
2018-08-20 10:11:59 -07:00
9ad9191323 Fix cuDNN dropout state cache (#10662)
Summary:
Minor fix for the cuDNN cache. Previously, when an RNN function was called on GPU 0 and then called in eval mode on GPU 1, we would skip event re-initialization, which caused an incorrect resource handle error when trying to record the event.

soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10662

Reviewed By: soumith

Differential Revision: D9393629

Pulled By: apaszke

fbshipit-source-id: e64c1c1d2860e80f5a7ba727d0b01aeb5f762d90
2018-08-20 05:09:41 -07:00
c37fac4d50 Fixing stop condition on composite reader (#9888)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9888

Limiter cannot be shared or copied; just pass it to the first reader.

Reviewed By: xianjiec

Differential Revision: D9008871

fbshipit-source-id: e20cd785b26b1844e156efc3833ca77cfc3ffe82
2018-08-20 03:02:20 -07:00
83066e9b30 Add trigonometry functions for ONNX export (#7540)
Summary:
Trigonometry functions are newly added to ONNX in a recent PR https://github.com/onnx/onnx/pull/869

This PR makes pytorch support exporting graphs with trigonometry functions.

This PR might need to wait until it is ready to change
```python
_onnx_opset_version = 6
```
to
```python
_onnx_opset_version = 7
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7540

Differential Revision: D9395041

Pulled By: bddppq

fbshipit-source-id: bdf3e9d212b911c8c4eacf5a0753bb092e4748d2
2018-08-19 23:01:28 -07:00
3f603eeee8 some improvements on distributed docs
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10666

Differential Revision: D9395242

Pulled By: SsnL

fbshipit-source-id: 952326b9c5a1a974a1c33a0e12738e1e21ad9956
2018-08-19 17:40:28 -07:00
108b657159 Import DistributedSampler in utils/data/__init__ (#10671)
Summary:
There is no reason that a user should need an extra import to use DistributedSampler.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10671

Differential Revision: D9395189

Pulled By: SsnL

fbshipit-source-id: 8f41d93813c8fb52fe012f76980c6a261a8db9b2
2018-08-19 16:55:13 -07:00
6bdbad93b9 Refactor Device to not depend on Backend. (#10478)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10478

- Removed Backend constructor from Device, and fixed all
  use-sites to use DeviceType::CPU instead of kCPU, or
  use a new function backendToDeviceType to perform
  the conversion.
- New method device_type() on Type; it gives you the
  underlying device type, e.g., CPU for SparseCPU.
- We add backward compatibility for kCPU/kCUDA uses,
  by introducing a new special type which is implicitly
  convertible to both DeviceType and Backend.  As long as
  you don't define a function that's overloaded on both
  DeviceType and Backend (but not on BackendOrDeviceType),
  the implicit conversions will ensure that uses
  of at::Device(at::kCPU) keep working. We fixed use-sites in
  the library, but did NOT fix sites in the test code, so that
  we can exercise this BC code (a sketch of the special type follows).
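
A hedged sketch of the `BackendOrDeviceType` shim (member and constant names below are assumptions, not the exact ATen definitions):

```cpp
#include <ATen/ATen.h>

// Implicitly convertible to both Backend and DeviceType, so existing
// overloads on either type keep accepting kCPU-style constants.
struct BackendOrDeviceType {
  constexpr BackendOrDeviceType(at::Backend b, at::DeviceType d)
      : backend(b), device_type(d) {}
  constexpr operator at::Backend() const { return backend; }
  constexpr operator at::DeviceType() const { return device_type; }
  at::Backend backend;
  at::DeviceType device_type;
};

// Hypothetical constant usable wherever a Backend or a DeviceType is expected.
constexpr BackendOrDeviceType kCPUCompat{at::Backend::CPU, at::DeviceType::CPU};
```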

Reviewed By: Yangqing

Differential Revision: D9301861

fbshipit-source-id: 9a9d88620500715c7b37e655b4fd761f6dd72716
2018-08-18 17:39:14 -07:00
f1420adfe3 Move at::chunk into the graph fuser (#10178)
Summary:
... to avoid slow at::chunk (it is slow due to tensor initialization). Picking up from #10026

This is done through the following:

1) Absorb starting chunks into FusionGroup as a part of the graph fuser
pass.
2) When compiling a kernel, emit a `std::vector<ConcatDesc>` that describes if an input (of the original graph) will be chunked.
3) When launching a kernel, `use std::vector<ConcatDesc>` to chunk an
input tensor on the CPU. This chunk directly takes in an at::Tensor and creates
four TensorInfo structs in-place in the argument list, bypassing the creation of intermediate Tensors.

- Expect test and correctness test to see if a single chunk is fused
  by the graph fuser
- Correctness test for a variety of chunks (dimension = beginning,
  middle, end) and tensors (contiguous, non-contiguous, edge case
  (splitSize = 1) for both CPU/CUDA
- Expect test for multiple chunks fused into the same kernel and
  correctness test.

cc zdevito apaszke

LSTM forward pass, 1 layer, 512 hidden size and input size, 100 seq length, requires_grad=False on all inputs and weights.

After changes:
```
thnn    cudnn   jit
8.8468  6.5797  9.3470
```

Before changes:
```
thnn    cudnn   jit
9.9221  6.6539  11.2550
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10178

Differential Revision: D9382661

Pulled By: zou3519

fbshipit-source-id: 1f8a749208fbdd45559775ce98cf4eb9558448f8
2018-08-18 16:10:11 -07:00
d87b4e941b fix python interpreter can not be found without PYTHON_EXECUTABLE (#10659)
Summary:
Take 2 of #10543.
The problem was that, between commit and merge, one more entry point, `tools/build_libtorch.py`, was added.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10659

Differential Revision: D9393540

Pulled By: soumith

fbshipit-source-id: 8ebfed600fc735fd1cb0489b161ec80e3db062e0
2018-08-18 15:40:08 -07:00
152762a567 Fix warnings diagnosed in recent clang (#10647)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10647

Fix "missing std::move from the return value" warnings diagnosed by a recent clang compiler.

Reviewed By: soumith, DavidCallahan

Differential Revision: D9384692

fbshipit-source-id: 8ad951e47d605e6f98a9650f2dec2909ad0f3eb8
2018-08-17 21:32:58 -07:00
e29b5a1ea8 graph fuser inserts explicit expands where necessary (#10325)
Summary:
Fixes #10096

If the only thing preventing a simple mappable operator from being fused
into a fusion group is that its Tensor inputs are not of the same shape as the
output, then the graph fuser inserts explicit expand nodes for those
inputs.
This helps the graph fuser not miss out on any fusion opportunities
involving simple mappable operations that have Tensor inputs. This PR
doesn't do anything for the scalar case; that can be addressed later.
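
In ATen terms the inserted node corresponds to an explicit `expand` (a hedged illustration, not the fuser's literal output):

```cpp
#include <ATen/ATen.h>

int main() {
  at::Tensor a = at::rand({4, 1});
  at::Tensor b = at::rand({4, 5});
  // a + b would broadcast implicitly; the fuser rewrites it with the
  // broadcast made explicit so every input matches the output shape:
  at::Tensor a_exp = a.expand({4, 5});  // zero-copy broadcast view
  at::Tensor out = a_exp + b;
  return 0;
}
```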

Test Plan
- Simple expect test case
- Added expect tests for a raw LSTMCell. The expands help speed up the
  forwards pass by allowing more operations to be fused into the LSTMCell's single
  FusionGroup.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10325

Differential Revision: D9379308

Pulled By: zou3519

fbshipit-source-id: 86d2202eb97e9bb16e511667b7fe177aeaf88245
2018-08-17 16:03:46 -07:00
7c55d11ba5 Make sure we don't relocate the weight name buffer (#10630)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10630

`onnxTensorDescriptorV1.name` points into the string buffer. We use a vector of strings as the storage, which means we cannot let the vector reallocate, because that may invalidate the `onnxTensorDescriptorV1.name` pointers. The solution is to reserve a large enough vector so that it never reallocates.

Reviewed By: bddppq, houseroad

Differential Revision: D9381838

fbshipit-source-id: f49c5719aafcc0829c79f95a2a39a175bcad7bfe
2018-08-17 16:03:31 -07:00
65b9308128 Basic infrastructure for C++ documentation (#10569)
Summary:
Adds the folder structure, Doxyfile, sphinx setup and Makefile to build C++ docs.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10569

Differential Revision: D9386744

Pulled By: goldsborough

fbshipit-source-id: 0a7c581dcf0a5f7b01ba19d317b493cf95935134
2018-08-17 15:39:50 -07:00
b62b378022 Adding torch support for CMAKE_ARGS env
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10635

Reviewed By: ml7

Differential Revision: D9383845

Pulled By: pjh5

fbshipit-source-id: fb21bda12e88053eec738974e6e419388c5038d9
2018-08-17 14:54:43 -07:00
c5c1c051ca Fix dropout fused kernel applied in eval mode (#10621)
Summary:
fixes https://github.com/pytorch/pytorch/issues/10584

cc apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10621

Differential Revision: D9379397

Pulled By: SsnL

fbshipit-source-id: 5ff2939ba794af082ce597ef289a09ee757636dc
2018-08-17 14:54:42 -07:00
86c9856d9c Fuse tensor-scalar ops when scalar is constant (#10511)
Summary:
This is on the way to resolving #9940.

Fixes #10501

This PR modifies graph fuser to fuse operations that have constant
scalar arguments. These constant scalar arguments are directly inlined
into the kernel body.

The context for this is that LSTM backward (in particular, sigmoid
backward) has many add(x, 1.) operations. This PR should be sufficient for
LSTM backward to get fused by the graph fuser.
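
A hedged CPU-side sketch of what the inlining buys (the real fuser emits CUDA source; names here are illustrative):

```cpp
#include <cstdint>

// Fused body for sigmoid backward: grad * out * (1 - out). The constant
// from add(x, 1.) is baked into the expression rather than passed as a
// runtime kernel argument.
void fused_sigmoid_backward(const float* grad, const float* out,
                            float* res, int64_t n) {
  for (int64_t i = 0; i < n; ++i) {
    res[i] = grad[i] * out[i] * (1.0f - out[i]);
  }
}
```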

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10511

Differential Revision: D9378896

Pulled By: zou3519

fbshipit-source-id: 6a7a2987f5b6e8edaaf4b599cd200df33361650f
2018-08-17 14:10:23 -07:00
f3ac619764 Add fusion support for batchnorm and convolution without bias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10595

Reviewed By: bwasti

Differential Revision: D9110099

fbshipit-source-id: e1ed66c7d82b2f9987b7eb9c7f98877a6dbeb902
2018-08-17 12:11:44 -07:00
d35f365ad5 Remove all cuDNN specific inputs to RNN functions (#10581)
Summary:
This is still not the final PR, but it removes all blockers for actually using the RNN functions directly in the JIT. The next patch should be final; it will actually remove the symbolic_override code and change it to proper symbolics for those ATen functions. It turns out the symbolic code can also be cleaned up a bit, and I'll do that too.

zdevito ezyang
colesbury (for minor DispatchStub.h) changes

There was no way to handle those in the JIT for now, and they turned
out to be completely unnecessary. It should make the Python and C++
module code much simpler too, since all the logic is now centralized
in the native functions.

The downside is that RNN modules no longer own their dropout buffers,
which are shared per-device instead (with appropriate locking and
synchronization). This might appear as a perf regression at first, but
in reality it's highly unlikely that anyone will want to run cuDNN RNNs
on the same GPU in parallel.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10581

Reviewed By: colesbury

Differential Revision: D9365541

Pulled By: apaszke

fbshipit-source-id: 3ef8677ee5481bae60c74a9117a2508665b476b5
2018-08-17 11:09:51 -07:00
52058204d6 Add nn functional tests in JIT (#10409)
Summary:
This PR is the first step to integrate the torch.nn library with the JIT. It adds tests for the nn functional interfaces in trace/script mode, and tries to find the differences between the torch.nn.functional ops and the ATen ops, to see what work needs to be done to support the full set of nn functionals in script mode.

Some statistics in summary:

- In total, there are 84 useful functions in torch.nn.functional (the number does not include helper funcs and deprecated funcs in torch.nn.functional).

- 7 functions/ops do not support higher-order gradients, so they are excluded from the whole test.

- 36 functions differ from the ATen ops for various reasons. Among those 36 functions, a bunch of them (roughly 10-15) are just naming differences or simple transformations using other ops inside the function.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10409

Differential Revision: D9350694

Pulled By: wanchaol

fbshipit-source-id: 8fce6f30d8d25ace5a544a57b219fe61f5a092f8
2018-08-17 11:09:49 -07:00
b4e72ea811 Revert D9377394: [pytorch][PR] [Caffe2] Add AT_CORE_EXPORT and AT_CORE_IMPORT.
Differential Revision:
D9377394

Original commit changeset: 993062a461ff

fbshipit-source-id: af8ab92e9b88466602508981d9b3ea24ce393dfc
2018-08-17 10:39:27 -07:00
bd9ab650ae fix compile error in math_hip.cc from new Im2Col/Col2Im interface (#10623)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10623

Fix compile error in https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-build/10280//console

Reviewed By: ezyang

Differential Revision: D9379451

fbshipit-source-id: 67cc3964981edba1915b93c49643caa300d63c16
2018-08-17 10:24:25 -07:00
ff440b61f6 Revert D9378844: [pytorch][PR] fix python interpreter can not be found
Differential Revision:
D9378844

Original commit changeset: 022e20aab7e2

fbshipit-source-id: 962280707e84edff2a4f59b1ce2f4211a579a055
2018-08-17 10:09:27 -07:00
e190505e84 Adding support for inlining if branches (#10084)
Summary:
Inlining if branches which have constant inputs. If an if node gets inlined, the set of mutated variables returned by its ancestors may have changed. In the following example the block should return a mutated set of (a), not (a, b).

```
if cond:
  if True:
    a = a - 1
  else:
    b = b - 1
```
To calculate this we recursively update mutated variables in if branches from the leaf nodes up.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10084

Reviewed By: michaelsuo

Differential Revision: D9340429

Pulled By: eellison

fbshipit-source-id: b0dd638a5cace9fdec3130460428fca655ce4b98
2018-08-17 09:48:47 -07:00
31c7a32d1c Include aten_op by default in caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10603

Reviewed By: ahhegazy, dzhulgakov

Differential Revision: D9364309

fbshipit-source-id: e72d9f2b1e99cb0fb2186c737fcd925b14d42754
2018-08-17 08:39:46 -07:00
03982fb8d3 Fix subgraph cutting wrt recent external_input change in nomnigraph (#10598)
Summary:
https://github.com/pytorch/pytorch/pull/10100 recently started tracking external input/output in nomnigraph. This PR makes the following adjustments:
0. Relaxes some of the conditions on external input.
1. Updates NNModule inputs/outputs when pruning the input/output.
2. Avoids copying external input/output, as nomnigraph already takes care of it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10598

Reviewed By: bwasti

Differential Revision: D9371730

Pulled By: yinghai

fbshipit-source-id: 9273be5041dc4cc8585587f47cb6721e518a06a8
2018-08-17 08:25:49 -07:00
ff3a481aee fix python interpreter can not be found (#10543)
Summary:
Custom Python installations that have no aliases to `python` or `python3` can't be found by CMake's `FindPythonInterp` without an extra CMake argument.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10543

Differential Revision: D9378844

Pulled By: ezyang

fbshipit-source-id: 022e20aab7e27a5a56b8eb91b6026151116193c7
2018-08-17 08:25:48 -07:00
51222500e2 Add AT_CORE_EXPORT and AT_CORE_IMPORT. (#10602)
Summary:
Fix "error LNK2019: unresolved external symbol" from "CAFFE_KNOWN_TYPE" in tests where we should use dllexport instead of AT_CORE_API(=dllimport).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10602

Differential Revision: D9377394

Pulled By: Yangqing

fbshipit-source-id: 993062a461ffce393f2321c5391db5afb9b4e7ba
2018-08-17 02:09:38 -07:00
cc53807be5 group conv with NHWC layout (#10585)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10585

group conv with NHWC layout

Reviewed By: BIT-silence

Differential Revision: D7547497

fbshipit-source-id: da0ec5a4512c15a0a0d7b79e6ce00c1f8f77f661
2018-08-17 00:39:23 -07:00
0aefb9f26c Update onnx to onnx/onnx@7848f1e (#10613)
Summary:
https://github.com/onnx/onnx/commit/7848f1e
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10613

Reviewed By: houseroad

Differential Revision: D9376224

Pulled By: bddppq

fbshipit-source-id: ce8a53255ba24f0f8f989570e8b015837f8442fb
2018-08-16 23:39:37 -07:00
6667d55e73 Disallow input filler for GatherRangesOp (#10592)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10592

Filter out GatherRanges ops

Reviewed By: highker

Differential Revision: D9365220

fbshipit-source-id: e21ab00dc9e553c9aaf172e1241206e0c0a7a23d
2018-08-16 21:39:09 -07:00
3578909671 Remove unused code base for distributed training (#10282)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10282

This diff removes the unused/deprecated features from the code base.

Reviewed By: manojkris

Differential Revision: D9169859

fbshipit-source-id: d6447b7916a7c687b44b20da868112e6720ba245
2018-08-16 20:10:17 -07:00
f1d40ef280 build_pytorch_libs.sh: use MAX_JOBS rather than NUM_JOBS (#10600)
Summary:
MAX_JOBS is set by our jenkins setup
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10600

Differential Revision: D9375317

Pulled By: anderspapitto

fbshipit-source-id: 25416d5ee12372f7610baa78cb7b423806b26aa2
2018-08-16 20:10:15 -07:00
c101a57a74 Build mechanism for custom operators (#10226)
Summary:
This is the last step in the custom operator implementation: providing a way to build from C++ and Python. For this I:

1. Created a `FindTorch.cmake` taken largely from ebetica with a CMake function to easily create simple custom op libraries
2. Created a `torch/op.h` header for easy inclusion of necessary headers,
3. Created a test directory `pytorch/test/custom_operator` which includes the basic setup for a custom op.
    1. It defines an op in `op.{h,cpp}`
    2. Registers it with the JIT using `RegisterOperators`
    3. Builds it into a shared library via a `CMakeLists.txt`
    4. Binds it into Python using a `setup.py`. This step makes use of our C++ extension setup that we already have. No work, yey!

The pure C++ and the Python builds are separate and not coupled in any way.
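
For the Python side, a minimal hypothetical `setup.py` along these lines builds the op as a C++ extension (the package name and source file are illustrative, not taken from the PR):

```python
# Hypothetical setup.py sketch for the Python-side build of a custom op.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CppExtension

setup(
    name="custom_op",
    ext_modules=[CppExtension(name="custom_op", sources=["op.cpp"])],
    cmdclass={"build_ext": BuildExtension},
)
```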

zdevito soumith dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10226

Differential Revision: D9296839

Pulled By: goldsborough

fbshipit-source-id: 32f74cafb6e3d86cada8dfca8136d0dfb1f197a0
2018-08-16 18:56:17 -07:00
67c6d93634 Tune minimal work size (#10599)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10599

Not keeping threads spinning (spin-lock synchronization) is bad because they will switch to a `condvar` wait, which increases wake-up latency the next time they are needed.

Reviewed By: ajtulloch

Differential Revision: D9366664

fbshipit-source-id: 3b9e4a502aeefaf0ddc4795303a855d98980b02e
2018-08-16 17:39:57 -07:00
afd7477eaa Add `buffers(), named_buffers()` methods. (#10554)
Summary:
This commit adds the ``buffers()`` and ``named_buffers()`` methods as
analogues of ``parameters()`` and ``named_parameters()``.
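
A minimal usage sketch, using BatchNorm's running statistics as example buffers:

```python
import torch.nn as nn

# BatchNorm registers running_mean/running_var as buffers rather than parameters.
bn = nn.BatchNorm1d(4)
for name, buf in bn.named_buffers():
    print(name, tuple(buf.shape))
for buf in bn.buffers():
    assert not buf.requires_grad  # buffers carry state, not learnable weights
```
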
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10554

Reviewed By: SsnL

Differential Revision: D9367762

Pulled By: jma127

fbshipit-source-id: f2042e46a7e833dce40cb41681dbd80d7885c74e
2018-08-16 16:26:48 -07:00
342517e6e7 Back out "Add aten_op to caffe2 onnx (python) backend" (#10589)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10589

Original commit changeset: 2cc6fedbaf08

Reviewed By: houseroad

Differential Revision: D9365208

fbshipit-source-id: 3871d8e70f0d8e48c8af9593c78587d16c45afc2
2018-08-16 15:15:27 -07:00
488ea824ed Additional changes to make GPU builds work (#10507)
Summary:
A continuation of https://github.com/pytorch/pytorch/pull/10504 for GPU, torch, etc. builds.

I was testing with

```
FULL_CAFFE2=1 python setup.py build_deps | tee ~/log.txt
cat ~/log.txt | egrep 'undefined refer' | sort | less
```

I'll rebase on master when Yangqing's changes in 10504 land, but putting up for some testing.

cc mingzhe09088 anderspapitto ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10507

Reviewed By: Yangqing

Differential Revision: D9359606

Pulled By: orionr

fbshipit-source-id: c2a3683b3ea5839689f5d2661da0bc9055a54cd2
2018-08-16 13:25:27 -07:00
ef15bb8787 remove implicit conversion from gpu to cpu (#10553)
Summary:
Resubmit #10416 with fixed tests. This removes the implicit conversion from GPU to CPU when calling numpy, to keep the behavior consistent with other operations.

It requires users to move the tensor back with .cpu() before calling numpy functions on it.
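
A small sketch of the new expected usage:

```python
import torch

x = torch.randn(3)
if torch.cuda.is_available():
    x = x.cuda()
    # x.numpy() now raises instead of silently copying to CPU;
    # move the tensor back explicitly first.
    arr = x.cpu().numpy()
else:
    arr = x.numpy()
```
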
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10553

Differential Revision: D9350212

Pulled By: ailzhang

fbshipit-source-id: 9317d8fea925d4b20ae3150e2c1b39ba5c9c9d0a
2018-08-16 12:10:39 -07:00
d6f3c88418 Revert D9076734: Split storage from tensor
Differential Revision:
D9076734

Original commit changeset: ea9e1094ecf8

fbshipit-source-id: 3fa9b65b7265fce6207d9e1d9ef4707dbb29704b
2018-08-16 11:25:32 -07:00
40a070422d Adding new allreduce bcube routines to ops supported by gloo (#10494)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10494

Adding the AllreduceBcube routines as they are now available in gloo.

Reviewed By: wesolwsk

Differential Revision: D8269473

fbshipit-source-id: 6a3a32291bbf1fbb328b3ced0f2a753dc5caf4e5
2018-08-16 10:56:26 -07:00
4be4b4c8b5 Remove weight from input of onnxifi backend op (#10575)
Summary:
The ONNXIFI backend will absorb the constant weight in Conv, so we should not add it as an input. This is just a test artifact. Note that the Onnxifi transformer will do the right thing when cutting the graph to absorb the weights.

rdzhabarov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10575

Reviewed By: houseroad

Differential Revision: D9357339

Pulled By: yinghai

fbshipit-source-id: a613fa3acafa687295312f5211f8e9d7f77b39cd
2018-08-16 10:56:25 -07:00
319fefe9e6 Support benchmark on windows machines
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10564

Reviewed By: llyfacebook

Differential Revision: D9356389

Pulled By: sf-wind

fbshipit-source-id: f6c58e68d3eaf3a39c9f89b8f04e6039c75b4cd9
2018-08-16 10:56:23 -07:00
00f2731112 Merge THTensor into TensorImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10479

Differential Revision: D9315800

Pulled By: gchanan

fbshipit-source-id: b13ef0de3342600b02b54e0700eb02021a9d1a9e
2018-08-16 08:10:06 -07:00
130881f0e3 Delete build_caffe2.sh, replace with build_libtorch.py (#10508)
Summary:
Delete build_caffe2.sh and replace it with build_libtorch.py, as suggested by peter (and copy-pasted from his draft PR). This ensures that all consumers of the torch CMake file go through as unified a path as possible.

In order to change the surrounding infrastructure as little as possible, I made some tweaks to enable build_pytorch_libs.sh to generate the test binaries relative to the current directory, rather than hardcoding to pytorch/build.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10508

Differential Revision: D9354398

Pulled By: anderspapitto

fbshipit-source-id: 05b03df087935f88fca7ccefc676af477ad2d1e9
2018-08-16 08:10:04 -07:00
c6facc2aaa Add conversions between DataType and ScalarType.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10472

Reviewed By: gchanan

Differential Revision: D9298048

fbshipit-source-id: c58efa582eab64c58d0771d90d90862911c168d1
2018-08-16 07:55:31 -07:00
fdd2b9baee Add DataType alias
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10547

Reviewed By: soumith

Differential Revision: D9346040

fbshipit-source-id: 1069a44182ccff68b1694086c8b709ba2046b22b
2018-08-16 07:55:29 -07:00
8fdba4ec35 Move all operator<< overloads out of the global namespace. (#10546)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10546

Have you ever written an operator<< overload in the caffe2 namespace
in a core Caffe2 header, and then been stunned when some completely
unrelated code started breaking?  This diff fixes this problem!

The problem looks like this:
1. You're building against a really old version of glog (think 0.3.2,
   or something like that)
2. This version of glog defines operator<< overloads for std containers
   in the global namespace
3. You add a new overload in your current namespace (e.g., caffe2).
   Congratulations: this overload is *preferentially* chosen over
   the global namespace one for all calls to << in that namespace.
   And since it doesn't actually have std::vector overloads, unrelated
   Caffe2 code breaks.

Newer versions of glog have a fix for this: they have the line:

  namespace std { using ::operator<<; }

in their header.  So let's help old versions of glog out and do this ourselves.

In our new world order, operator<< overloads defined in the global namespace
won't work (unless they're for std containers, which work because of ADL).
So this diff also moves all those overloads to the correct namespace.

Reviewed By: dzhulgakov

Differential Revision: D9344540

fbshipit-source-id: 6246ed50b86312668ebbd7b039fcd1233a3609cf
2018-08-16 07:55:27 -07:00
238b4b9236 Resolve error C2370 "redefinition; different storage class" by adding dllimport. (#10571)
Summary:
For #10568
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10571

Differential Revision: D9357987

Pulled By: Yangqing

fbshipit-source-id: 6726f0a1d31a225375a0ddc0e05284f3eb89dda8
2018-08-16 00:39:33 -07:00
84427d26db Add aten_op to caffe2 onnx (python) backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10579

Reviewed By: houseroad

Differential Revision: D9357837

fbshipit-source-id: 2cc6fedbaf088df7e11b52a91dfe3b8f0d7fd599
2018-08-16 00:39:30 -07:00
76da0b34c2 Remove an unused variable found by linter
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10578

Differential Revision: D9357880

Pulled By: bddppq

fbshipit-source-id: 6b56c2dbd02258124b5a4656cdf44d14a59e1b71
2018-08-16 00:25:44 -07:00
7487ee55f1 Resolving error C2487 "member of dll interface class may not be declared with dll interface" by removing nested CAFFE2_API. (#10572)
Summary:
For #10570
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10572

Differential Revision: D9357984

Pulled By: Yangqing

fbshipit-source-id: a8f74e384eb3219fb6ac71ada4a45e6bce9199eb
2018-08-16 00:25:41 -07:00
abf85bf0ef Perform CSE across block boundaries. (#10105)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10105

Differential Revision: D9186678

Pulled By: resistor

fbshipit-source-id: 87b63d4fc0c7d394edb4777acdefa8f022a8bf8d
2018-08-16 00:25:36 -07:00
2e0dd86903 Make torch::Tensor -> at::Tensor (#10516)
Summary:
This PR removes the `using Tensor = autograd::Variable;` alias from `torch/tensor.h`, which means `torch::Tensor` is now `at::Tensor`. This PR fixes up some last uses of `.data()` and tidies up the resulting code. For example, I was able to remove `TensorListView` such that code like

```
auto loss = torch::stack(torch::TensorListView(policy_loss)).sum() +
    torch::stack(torch::TensorListView(value_loss)).sum();
```

is now

```
auto loss = torch::stack(policy_loss).sum() + torch::stack(value_loss).sum();
```

CC jgehring

ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10516

Differential Revision: D9324691

Pulled By: goldsborough

fbshipit-source-id: a7c1cb779c9c829f89cea55f07ac539b00c78449
2018-08-15 21:25:12 -07:00
8013dac43d Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756.
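
A quick sketch of the fixed behavior, assuming it mirrors numpy.bincount on empty input:

```python
import torch

empty = torch.tensor([], dtype=torch.long)
print(torch.bincount(empty))               # tensor([], dtype=torch.int64)
print(torch.bincount(empty, minlength=5))  # tensor([0, 0, 0, 0, 0])
```
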
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Reviewed By: Yangqing

Differential Revision: D9348485

Pulled By: soumith

fbshipit-source-id: e13afadf8dbea20ee6ee595383c522dcbaf8796a
2018-08-15 20:55:59 -07:00
05dcf00644 fixed c10d test (#10557)
Summary:
fixed NCCL test, which is not run in CI. We should enable it soon.
```
~/new_pytorch/pytorch/test$ python test_c10d.py
...............
----------------------------------------------------------------------
Ran 15 tests in 13.099s

OK
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10557

Reviewed By: ailzhang

Differential Revision: D9353286

Pulled By: teng-li

fbshipit-source-id: 5a722975beaa601203f51c723522cc881f2d2090
2018-08-15 17:22:38 -07:00
0a809fc8b1 build changes to make cpu unified build working. (#10504)
Summary:
Properly annotated all APIs for the CPU front end. Checked with cmake using

cmake -DUSE_ATEN=ON -DUSE_CUDA=OFF -DBUILD_ATEN=ON

and the resulting libcaffe2.so has about 11k symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10504

Reviewed By: ezyang

Differential Revision: D9316491

Pulled By: Yangqing

fbshipit-source-id: 215659abf350af7032e9a4b0f28a856babab2454
2018-08-15 17:22:36 -07:00
87cac4c2f1 Update Im2Col related to make preparation for group conv in NHWC order. (#10439)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10439

Update Im2Col-related code in preparation for group conv in NHWC order.

Reviewed By: houseroad

Differential Revision: D9285344

fbshipit-source-id: 1377b0243acb880d2ad9cf73084529a787dcb97d
2018-08-15 17:10:24 -07:00
579962f2a8 reroute tensor feature in core.Net and generate one net feature in model_helper (#10528)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10528

Adding 2 features to core and model_helper:

- reroute_tensor, which supports op insertion at the net level
- model_helper complete net and cut net, used for full-graph analysis

Differential Revision: D9330345

fbshipit-source-id: 56341d3f500e72069ee306e20266c8590ae7985a
2018-08-15 16:40:15 -07:00
523bdc8ec1 Split storage from tensor (#10053)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10053

Tensor in PyTorch 1.0 will have:
Tensor -> TensorImpl -> Storage -> StorageImpl
In this diff we split Storage from Tensor in order to align with this design.
We'll have Tensor -> Storage -> StorageImpl after this diff.

Reviewed By: dzhulgakov

Differential Revision: D9076734

fbshipit-source-id: ea9e1094ecf8c6eaeaa642413c56c6a95fb3d14e
2018-08-15 16:40:14 -07:00
03e9ea5ef0 Fix leaking of Storages (not StorageImpls) (#10552)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10552

Fix leaking of Storages (not StorageImpls)

Reviewed By: li-roy

Differential Revision: D9349824

fbshipit-source-id: 31f14951020a63189bebda25a3bf8bf195cd227f
2018-08-15 16:10:00 -07:00
4c49da34a9 Add new MKLDNN fallback operators (#10526)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10526

Resubmitting these changes. Previously they caused issues with multifeed, which I fixed with D9280622

Reviewed By: yinghai

Differential Revision: D9327323

fbshipit-source-id: ec69428039b45c6221a5403b8fe9a83637857f04
2018-08-15 15:55:22 -07:00
a129f9ad3b Revert D9332335: [pytorch][PR] Implements volumetric (5d) affine grid generation.
Differential Revision:
D9332335

Original commit changeset: 1b3a91d078ef

fbshipit-source-id: 3dcce680257a6da121f5d67918ed4236e0c5bfec
2018-08-15 15:25:11 -07:00
151e7de893 varargs for einsum (#10067)
Summary:
Implemented via a wrapper; thank you, Richard, for the suggestion!

Fixes: #9929
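
A minimal sketch of the two calling conventions:

```python
import torch

a, b = torch.randn(2, 3), torch.randn(3, 4)
c1 = torch.einsum('ij,jk->ik', a, b)    # operands as varargs (new)
c2 = torch.einsum('ij,jk->ik', [a, b])  # original list form still works
assert torch.allclose(c1, c2)
```
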
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10067

Differential Revision: D9083388

Pulled By: soumith

fbshipit-source-id: 9ab21cd35278b01962e11d3e70781829bf4a36da
2018-08-15 15:13:25 -07:00
fb45ec5ac3 Don't set DEBUG=1 in ASAN build (#9902)
Summary:
This should make ASAN tests run faster.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9902

Differential Revision: D9032986

Pulled By: yf225

fbshipit-source-id: 3d2edec2d7ce78bc995d25865aa82ba6d3f971d0
2018-08-15 14:39:57 -07:00
26c764a1db Update FP16 submodule. Close #10523 (#10548)
Summary:
Pull a fix in FP16 for a compilation bug when using the Intel Compiler
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10548

Differential Revision: D9349469

Pulled By: Maratyszcza

fbshipit-source-id: 43e6dc5c3c18319d31eca23426770c73795feec5
2018-08-15 14:26:56 -07:00
021b4888db Remove setup_requires and tests_require from setup.py for FULL_CAFFE2 (#10530)
Summary:
In my environment, it looks like setup.py hangs when running

```
FULL_CAFFE2=1 python setup.py build_deps
```

Removing this fixes things, but we might also want to look at `tests_require`, which came over from `setup_caffe2.py`.

cc pjh5
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10530

Differential Revision: D9349597

Pulled By: orionr

fbshipit-source-id: 589145eca507dfaf16386884ee2fbe60299660b4
2018-08-15 14:26:53 -07:00
c5b1aa93ee Export uint8 tensors as byte string in mobile_exporter and add GivenTensorByteStringToUInt8FillOp (#10385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10385

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10354

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10316

Because Protobuf encodes uint8_t tensors using a less space-efficient varint uint32_t encoding, we are adding a new operator that reads a byte string back into a uint8_t tensor.

Reviewed By: harouwu

Differential Revision: D9004839

fbshipit-source-id: dfd27085c813fdeff13fee15eef4a2e7fef72845
2018-08-15 14:26:50 -07:00
6f14202acd Revert D9276252: [pytorch][PR] remove implicit conversion to cpu
Differential Revision:
D9276252

Original commit changeset: ea7d9d4f9390

fbshipit-source-id: 5977bf90d4c84b47e15bc8266cc3ce5602c4e05f
2018-08-15 13:55:18 -07:00
5adcac3dce Cuda half macros cleanup (#10147)
Summary:
This PR removes a couple of macros throughout TH* as part of the refactoring effort for ATen. Removing these macros should avoid confusion among developers who are trying to move things from TH* to ATen. This PR is part of the THCNumerics deprecation that I have been working on, following up on mruberry's https://github.com/pytorch/pytorch/pull/9318. I am separating these two commits to see if removal of these macros doesn't upset the pytorch public CI, as well as internal builds.

- Commit 1248de7baf removes the code paths guarded by `CUDA_HALF_INSTRUCTIONS` macro. Since the macro was removed in commit 2f186df52d, `ifdef CUDA_HALF_INSTRUCTIONS` would return false and hence the code path that is kept after this change is for the false case of `ifdef CUDA_HALF_INSTRUCTIONS`

- Commit 520c99b057 removes the code paths guarded by `CUDA_HALF_TENSOR` macro. Since Pytorch now provides support for only CUDA 8.0 and above, `CUDA_HALF_TENSOR` is always true since CUDA 8.0 satisfies `CUDA_HAS_FP16` and hence, the code path that is kept after this change is for the true case of `ifdef CUDA_HALF_TENSOR`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10147

Differential Revision: D9345940

Pulled By: soumith

fbshipit-source-id: c9392261dd432d304f1cdaf961760cbd164a59d0
2018-08-15 13:25:42 -07:00
86363e1d8e Move RNN implementations to C++ (#10481)
Summary:
This is the first of two changes that are supposed to improve how we handle RNNs in the JIT. They still get traced as `PythonOp`s, but now it will be much easier to actually expose them to the JIT as e.g. `aten::lstm`, and ignore the Python interpreter entirely. This needs some symbolic adjustments that will be part of a second PR.

Even when we fix symbolics, there will still be a bit of a problem with statefulness of the cuDNN API (we need a mutable cache for the dropout state, but our IR has no way of representing that).

zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10481

Reviewed By: ezyang

Differential Revision: D9341113

Pulled By: apaszke

fbshipit-source-id: 0ae30ead72a1b12044b7c12369d11e5ca8ec30b5
2018-08-15 13:25:41 -07:00
484395edfb Fix corner case with torch.multinomial (#9960)
Summary:
In the shortcut for n_sample=1, when category 0 has 0 weight,
we should not map the (uniform) sample 0 to category 0.
The conversion uniform->multinomial was apparently written to work on
a (0,1] range (like curand uses), but PyTorch uses a [0,1) range.

Fixes: #4858. Thank you, Roy Fejgin for reporting.
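
A small sanity-check sketch of the fixed behavior:

```python
import torch

# Category 0 has zero weight; after the fix, the n_sample=1 shortcut can
# never map a uniform draw of 0.0 back to category 0.
weights = torch.tensor([0.0, 1.0])
for _ in range(1000):
    assert torch.multinomial(weights, 1).item() == 1
```
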
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9960

Reviewed By: soumith

Differential Revision: D9341793

Pulled By: ailzhang

fbshipit-source-id: 6b1a96419a7bc58cc594f761f34c6408ff6354cf
2018-08-15 13:25:39 -07:00
fb09292020 Increase tolerance in ConvBN test
Summary: reduce flakiness of test

Reviewed By: Maratyszcza

Differential Revision: D9344877

fbshipit-source-id: 24d5e1b873f94d816c980f3b7db93248cf10aca5
2018-08-15 13:14:35 -07:00
254dedf604 Propagate NaN through threshold (#10277)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/10238
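
A short sketch of the fixed behavior (relu is implemented via threshold, so it benefits too):

```python
import torch
import torch.nn.functional as F

x = torch.tensor([float('nan'), -1.0, 2.0])
# NaN is propagated instead of being replaced by the fill value.
print(F.threshold(x, 0.0, 0.0))  # tensor([nan, 0., 2.])
print(F.relu(x))                 # tensor([nan, 0., 2.])
```
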
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10277

Reviewed By: SsnL

Differential Revision: D9199825

Pulled By: soumith

fbshipit-source-id: 8ee7f9a72d9546d429f311c3f6028461d3c93fe2
2018-08-15 12:59:31 -07:00
0bbcc7b534 Don't assume curl version in Windows build script (#10476)
Summary:
Since we can't specify a version number to `choco install curl`, we should not assume that `7.57.0` is the curl version in the Windows AMI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10476

Differential Revision: D9303129

Pulled By: yf225

fbshipit-source-id: 198544be68330860fbcf93c99bc995f4e280bda7
2018-08-15 12:59:23 -07:00
85408e744f Move filler interface to operator schema (#10522)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10522

Move filler interface to operator schema to avoid extra code for
caffe2 mobile.

Reviewed By: dzhulgakov

Differential Revision: D9312940

fbshipit-source-id: 77fb2406f0c6b171a1912a207e05e36da50c6966
2018-08-15 12:40:18 -07:00
9646d68962 support broadcasting in _kl_categorical_categorical (#10533)
Summary:
Support broadcasting in _kl_categorical_categorical

This makes it possible to do:
```
import torch.distributions as dist
import torch
p_dist = dist.Categorical(torch.ones(1,10))
q_dist = dist.Categorical(torch.ones(100,10))
dist.kl_divergence(p_dist, q_dist)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10533

Differential Revision: D9341252

Pulled By: soumith

fbshipit-source-id: 34575b30160b43b6c9e4c3070dd7ef07c00ff5d7
2018-08-15 12:40:17 -07:00
05a260da43 Bump gloo to latest master (#10545)
Summary:
Needed by the Gloo development team. Verifying nothing breaks in CI.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10545

Reviewed By: Maratyszcza

Differential Revision: D9344413

Pulled By: orionr

fbshipit-source-id: 207edb71170870bacec47a635a12d7f55b6c1275
2018-08-15 12:25:44 -07:00
5d27d68779 remove implicit conversion to cpu (#10416)
Summary:
Fixes #9934
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10416

Differential Revision: D9276252

Pulled By: ailzhang

fbshipit-source-id: ea7d9d4f9390edefcd0865a98498f6c4307c291d
2018-08-15 12:25:42 -07:00
9cffe783f1 relax tolerance for two torch.half (float16) tests (#10519)
Summary:
Two tests in the 'nn' test bucket may fail when the torch.half
(float16) data type is used. The assertions used in the tests
are intended to allow slight floating-point imprecision in the results,
but the tolerances used for the comparisons are too strict for
the half type.

Relax the tolerances so that slight float16 imprecision won't
cause test failures.

The affected tests are:

- test_variable_sequence_cuda
- test_Conv2d_groups_nobias

For more information, see issue:

https://github.com/pytorch/pytorch/issues/7420
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10519

Differential Revision: D9343751

Pulled By: soumith

fbshipit-source-id: 90aedf48f6e22dd4fed9c7bde7cd7c7b6885845a
2018-08-15 12:11:20 -07:00
d93e8ab343 Nomnigraph - Refactor SubtreeMatchCriteria to become a Graph of MatchNode (#10512)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10512

SubtreeMatchCriteria now becomes a graph of MatchNode

MatchNode consists of NodeMatchCriteria, nonTerminal and count. This is a cleaner internal representation of the data structure and will bring us much closer to DAG matching.

Note that I still keep the debugString method because convertToDotGraph doesn't currently work with Subgraph.

Reviewed By: bwasti

Differential Revision: D9321695

fbshipit-source-id: 58a76f007a9a95d18cf807d419c2b595e9bc847f
2018-08-15 12:11:18 -07:00
f59bcea2c3 parallel max and min for ATen on CPU (#10343)
Summary:
Optimize max and min reduction for the ATen CPU path; the current code path from the TH module runs sequentially on CPU.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10343

Differential Revision: D9330799

Pulled By: ezyang

fbshipit-source-id: 5b8271e0ca3e3e73f88a9075aa541c8756001b7c
2018-08-15 11:41:01 -07:00
44b029f5b8 move matrix formation for dot products to precompute/request-only (#10531)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10531

fixed a naming issue in pairwise_similarity

Reviewed By: huayuli00

Differential Revision: D9331716

fbshipit-source-id: d7de36f20504c08b1c7871ccdffa343221a3da0c
2018-08-15 11:02:10 -07:00
f5a4dd89b5 Implements volumetric (5d) affine grid generation. (#8322)
Summary:
I've implemented affine grid generation for volumetric (5d) inputs. The implementation is based off of the spatial implementation, extended by one dimension. I have a few questions about my implementation vs. the existing one that I will add inline.

I have some extensive test cases for the forward pass here: https://gist.github.com/elistevens/6e3bfb20d8d0652b83bd16b3e911285b However, they use `pytest.fixture` extensively, so I'm not sure the best way to incorporate them into the pytorch test suite. Suggestions? I have not tested backwards at all.

Diff probably best viewed with whitespace changes ignored.

Thanks for considering!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8322

Differential Revision: D9332335

Pulled By: SsnL

fbshipit-source-id: 1b3a91d078ef41a6d0a800514e49298fd817e4df
2018-08-15 11:02:08 -07:00
d8ff7ad6f8 generalize order switch ops for 1-3d (#10395)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10395

Order switch ops (NCHW2NHWC and NHWC2NCHW) only supported 2D images.
This diff generalizes them to 1D and 3D, and also adds a unit test we didn't have.
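
A minimal numpy sketch of what the generalized order switch computes, independent of the number of spatial dimensions:

```python
import numpy as np

# For an (N, C, D1, ..., Dk) blob, NCHW2NHWC moves channels last,
# whatever the number of spatial dims k (1, 2 or 3).
def nchw2nhwc(x):
    k = x.ndim - 2
    return np.transpose(x, (0,) + tuple(range(2, 2 + k)) + (1,))

x = np.random.randn(8, 3, 5, 7, 9)  # a 3-D "image" in NCHW order
assert nchw2nhwc(x).shape == (8, 5, 7, 9, 3)
```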

Reviewed By: protonu

Differential Revision: D9261177

fbshipit-source-id: 56e7ec54c9a8fb71781ac1336f3f28cf024b4bda
2018-08-15 10:09:31 -07:00
0f05f5fb07 ATen layer norm symbolic (#10513)
Summary:
We can't rely on the ATen fallback pathway here because we need to parse out the constant attributes explicitly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10513

Reviewed By: dzhulgakov

Differential Revision: D9322133

Pulled By: jamesr66a

fbshipit-source-id: 52af947e6c44532ef220cb4b94838ca838b5df06
2018-08-15 08:28:52 -07:00
ce8e8feceb Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified. (#10390)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10390

Fixed a bug in box_with_nms_limit where it may produce more bounding boxes than specified.
* The original code first finds the score threshold at the 'detections_per_im' position, and filters out boxes scoring lower than that threshold.
* In cases where multiple boxes have exactly the threshold score, the op will return more boxes than 'detections_per_im' (see the sketch below).
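
A tiny numpy sketch of the tie-breaking idea behind the fix (illustrative only, not the operator's actual code):

```python
import numpy as np

# Keep exactly k boxes by sorted order instead of thresholding on the k-th
# score, which would also keep every box tied at that score.
def keep_top_k(scores, k):
    order = np.argsort(-scores, kind="mergesort")  # stable sort
    return order[:k]

scores = np.array([0.9, 0.5, 0.5, 0.5])
assert len(keep_top_k(scores, 2)) == 2  # thresholding at 0.5 would keep all 4
```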

Reviewed By: wat3rBro

Differential Revision: D9252726

fbshipit-source-id: 63f40829bcd275cb181692bc7547c384cee01499
2018-08-14 23:54:23 -07:00
e41528a5cc Also set stdin to subprocess pipe in FindCUDA windows popen call (#10379)
Summary:
Background: we run pytorch in embedded C++ pipelines, running in C++ GUIs in https://github.com/Kitware/VIAME and without this addition, the call was failing with the below error, but only on certain windows platforms/configurations:

OSError: [WinError6] The handle is invalid
At:
C:\Program Files\VIAME\Python36\site-packages\torch\cuda\__init__.py(162): _lazy_init
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): <lambda>
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(182): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(176): _apply
C:\Program Files\VIAME\Python36\site-packages\torch\nn\modules\module.py(249): cuda
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\arrows\pytorch\pytorch_resnet_f_extractor.py(74): __init__
C:\Program Files\VIAME\lib\python3.6None\site-packages\kwiver\processes\resnet_descriptors.py(132): _configure
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10379

Differential Revision: D9330772

Pulled By: ezyang

fbshipit-source-id: 657ae7590879004558158d3c4abef2ec11d9ed57
2018-08-14 23:10:20 -07:00
f1631c3106 Modify build.sh and test.sh scripts for ppc64le jenkins build and test (#10257)
Summary:
Initial jenkins builds / test scripts for ppc64le.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10257

Differential Revision: D9331278

Pulled By: ezyang

fbshipit-source-id: 6d9a4f300a0233faf3051f8151beb31786dcd838
2018-08-14 21:54:44 -07:00
19ad55cc02 set coalesced=false at sparse transpose() and removed transpose invariants (#10496)
Summary:
- fixes https://github.com/pytorch/pytorch/issues/6219
- removed invariants at https://github.com/pytorch/pytorch/pull/4707
- assume a sparse tensor is coalesced=true only when:
1. its elements are unique, and
2. the indices are in sorted order (see the sketch below)
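
A quick sketch of the new invariant (the constructor names follow the current Python API; illustrative only):

```python
import torch

# Indices are in sorted (row-major) order and entries are unique -> coalesced.
i = torch.tensor([[0, 1], [2, 0]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (2, 3)).coalesce()

# Transposing swaps the index rows, so row-major ordering is no longer
# guaranteed; the result now reports is_coalesced() == False.
t = s.t()
print(s.is_coalesced(), t.is_coalesced())  # True False
```
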
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10496

Differential Revision: D9311214

Pulled By: weiyangfb

fbshipit-source-id: 167fa5a8e9e5f9c800db02f728a1194029f7e4f3
2018-08-14 21:25:37 -07:00
964e30de1d Workaround for Cuda9.2 and GCC7 compilation errors (#10510)
Summary:
Breaking out of #8338

This PR is a workaround for a bug with CUDA9.2 + GCC7.

Here is the error this PR fixed:
.../pytorch/caffe2/operators/elementwise_ops.h: In constructor ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>::BinaryElementwiseWithArgsOp(const caffe2::OperatorDef&, caffe2::Workspace*)’:
.../pytorch/caffe2/operators/elementwise_ops.h:106:189: error: ‘GetSingleArgument<bool>’ is not a member of ‘caffe2::BinaryElementwiseWithArgsOp<InputTypes, Context, Functor, OutputTypeMap>’
   BinaryElementwiseWithArgsOp(const OperatorDef& operator_def, Workspace* ws)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10510

Reviewed By: orionr

Differential Revision: D9319742

Pulled By: mingzhe09088

fbshipit-source-id: ce59e3db14539f071f3c20301e77ca36a6fc3f81
2018-08-14 20:54:52 -07:00
b6cc65afea Send, Recv, RecvAnysource, Barrier Op for MPI PG and Python Bindings (#10227)
Summary:
Based on: https://github.com/pytorch/pytorch/pull/10199
Added:
(1) send, recv, recvanysource, and barrier for MPI process group.
(2) python binding
(3) testing

Please review: 2e64f5d675
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10227

Reviewed By: ailzhang

Differential Revision: D9327138

Pulled By: teng-li

fbshipit-source-id: 80496714550a3ca498eb474465ddbd1b8d657d49
2018-08-14 20:10:11 -07:00
26e40fa665 Tensor.accessor now fails on rvalue reference (#10518)
Summary:
Previously, it was easy to do `x[0].accessor<float, 2>()`. However, x[0] is a temporary, so the accessor would point to invalid strides/sizes and probably segfault. With this change, such unsafe code is a compile error.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10518

Reviewed By: goldsborough

Differential Revision: D9329288

Pulled By: ebetica

fbshipit-source-id: d08763bee9a19a898b9d1ea5ba648f27baa1992f
2018-08-14 19:41:31 -07:00
17ecc06b65 static casting TIndex (#10514)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10514

fix the bug which breaks the Windows build in fused_rowwise_random_quantization_ops.h

Reviewed By: ezyang, jspark1105

Differential Revision: D9322291

fbshipit-source-id: a6a27e87423b6caa973414ffd7ccb12076f2e1e4
2018-08-14 18:42:44 -07:00
60aa416a6d Re-purpose setup_caffe2.py for faster caffe2 build iterations (#10520)
Summary:
setup.py is the official install script; setup_caffe2.py is not used any more
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10520

Reviewed By: yinghai

Differential Revision: D9325548

Pulled By: bddppq

fbshipit-source-id: 3dda87f3dff061b574fd1d5c91859044f065ee33
2018-08-14 18:13:19 -07:00
32bb4040dd Unified type annotation parsing for script frontends (#10279)
Summary:
After this, all combinations of {String frontend, Python AST Frontend}{Python 3-style type annotations, MyPy-style type comments}{Script method, Script function} should properly accept type annotations.

Possible TODOs:
- Clean up the functions marked HACK
- Clean up the Subscript tree-view to better match the Python AST versions
- Can we use this for Python functions? That's the only place annotations.get_signature() is still needed
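
As an illustration of the two annotation styles mentioned above (a sketch, not taken from the test suite):

```python
import torch

# Python 3-style annotations:
@torch.jit.script
def scaled(x: torch.Tensor, n: int) -> torch.Tensor:
    return x * n

# MyPy-style type comment:
@torch.jit.script
def scaled_comment(x, n):
    # type: (Tensor, int) -> Tensor
    return x * n
```
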
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10279

Differential Revision: D9319726

Pulled By: jamesr66a

fbshipit-source-id: b13f7d4f066b0283d4fc1421a1abb9305c3b28fa
2018-08-14 18:13:15 -07:00
b69b1c477b Adding python binding for MPI process group (#10199)
Summary:
Based on https://github.com/pytorch/pytorch/pull/10159

Please review ProcessGroupMPI.cpp/hpp and init.cpp
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10199

Reviewed By: yf225

Differential Revision: D9324027

Pulled By: teng-li

fbshipit-source-id: 2dd524bee0c7ca8f9594ec3b4f3ebbbb608df337
2018-08-14 15:56:33 -07:00
39bfc2d0d4 Nomnigraph - add diagnostic ability for Subgraph matching API (#10267)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10267

isSubtreeMatch now returns a SubtreeMatchResult which contains a match flag and a debugMessage string that contains the reason why a subtree is not matched (if requested).

Reviewed By: bwasti

Differential Revision: D9182429

fbshipit-source-id: 530591fad592d02fb4c31fc398960a14ec90c86a
2018-08-14 15:56:31 -07:00
3c39e857ca Python binding for reduce,allgather,scatter,gather ops and python tests (#10159)
Summary:
Provided python binding for these four ops. Also provided nccl binding test.

Based on https://github.com/pytorch/pytorch/pull/10058

Please only review init.cpp, and test file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10159

Reviewed By: yf225

Differential Revision: D9323192

Pulled By: teng-li

fbshipit-source-id: b03822009d3a785ec36fecce2fc3071d23f9994e
2018-08-14 14:24:57 -07:00
16ecd6f99c Fix Debug Build On Windows (#10359)
Summary:
compile files in torch/csrc with /MDd runtime library option for debug build on Windows
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10359

Differential Revision: D9316946

Pulled By: SsnL

fbshipit-source-id: c84bfad81d61cd49f39b7bce7177edd2b1e8bd69
2018-08-14 13:24:14 -07:00
3f3a30f79c Added Reduce,AllGather,Gather,Scatter Ops for NCCL and MPI process groups (#10058)
Summary:
Added
- Reduce (both NCCL and MPI)
- AllGather (both NCCL and MPI)
- Gather (MPI)
- Scatter (MPI)

for c10d process groups. This basically finalizes all supported ops for C10d to match THD.

All ops are tested as well.

```
mpirun -np 8 ./ProcessGroupMPITest
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
Test successful
```

```
./ProcessGroupNCCLTest
Allreduce test successful
Broadcast test successful
Reduce test successful
Allgather test successful
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10058

Reviewed By: yf225

Differential Revision: D9316312

Pulled By: teng-li

fbshipit-source-id: 6a6253268d34332327406b1f87335d1402f7133f
2018-08-14 13:10:21 -07:00
13814d6744 Remove use of data() in optimizers (#10490)
Summary:
After talking to users of the C++ API we found that having the tensor type be `autograd::Variable` causes more complications than having it be `at::Tensor`. It used to be a problem because `at::Tensor` didn't have the "autograd API" of variable (e.g. `detach()` or `grad()` methods), but those methods are now on `at::Tensor`. As such, we want to make a last big breaking change to have the tensor type be `at::Tensor`, while factory methods like `torch::ones` will return `Variable`s disguised as `at::Tensor`. This will make many things easier, like calling functions in ATen that take vectors of tensors.

This PR makes a small step in this direction by updating the optimizer classes to not use `.data()` on `Variable` to access the underlying `at::Tensor`. Using `.data()` is effectively a hack to work around our modification rules for tensors that require grad. The proper way of doing things is to use `with torch.no_grad` or equivalently `NoGradGuard` in C++ to guard in-place operations.

The next step can then simply redefine `torch::Tensor` to be `at::Tensor`. This transition should be smooth, since all methods available on `Variable` are at this point available on `at::Tensor`.

For this PR I:

1. Modified the implementations of optimizers to not use `.data()`. This means the implementations are now different from PyTorch, which still uses the legacy method of using `.data`.
2. To properly verify (1), I added more fine-grained test cases to our optimizer tests, e.g. `SGD` with and without `weight_decay`, then with `nesterov` etc. Generally more tests = more happy!
3. Minor cleanup of the optimizer codebase

ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10490

Differential Revision: D9318229

Pulled By: goldsborough

fbshipit-source-id: fb386700f37840542bc5d323f308ea88fe5ea5c5
2018-08-14 13:10:19 -07:00
bdb11e716a Split the dependence of ONNX from test_operators.py (#10151)
Summary:
Now, when running `python test/onnx/test_operators.py --no-onnx`, we won't introduce any ONNX Python dependency. (No onnx/protobuf Python packages need to be installed.)

The major changes:
- output pbtxt from the C++ exporter directly, so the floating-point format may be slightly different. (This should be fine, since it's just to guard ONNX exporting.)
- ONNX python packages are only imported if we run the ONNX related checks. Those checks are disabled when using `--no-onnx` flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10151

Reviewed By: jamesr66a

Differential Revision: D9130706

Pulled By: houseroad

fbshipit-source-id: ea28cf5db8399929179698ee535137f209e9ce6f
2018-08-14 12:54:44 -07:00
eea8ab1861 Move common code to RNNCellBase. (#10399)
Summary:
There are three classes, `RNNCell`, `LSTMCell`, and `GRUCell`, inheriting from `RNNCellBase`, all defining the identical initialization function `reset_parameters`. Let's move it to the common base (see the sketch below).
Another option is to have different initialization for RNN, LSTM, and GRU. Maybe those weights whose output is processed with a sigmoid (i.e. gain=1) should be initialized differently from those going to a tanh (gain=5/3)?
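
A rough sketch of the refactor (simplified; `hidden_size` is set by the subclasses):

```python
import math
import torch.nn as nn

# The shared initializer moves into the base class; RNNCell, LSTMCell and
# GRUCell then inherit it instead of each defining their own copy.
class RNNCellBase(nn.Module):
    def reset_parameters(self):
        stdv = 1.0 / math.sqrt(self.hidden_size)
        for weight in self.parameters():
            weight.data.uniform_(-stdv, stdv)
```
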
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10399

Differential Revision: D9316978

Pulled By: SsnL

fbshipit-source-id: a2d9408f0b5c971a3e6c3d42e4673725cf03ecc1
2018-08-14 12:39:59 -07:00
bd497809e2 CAFFE_ENFORCE -> CAFFE_ENFORCE_EQ for error with more information (#10244)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10244

Use CAFFE_ENFORCE_EQ(x, y) instead of CAFFE_ENFORCE(x == y) in conv_op_impl.h for error messages with more information.

Reviewed By: viswanathgs

Differential Revision: D9177091

fbshipit-source-id: cf8d10afec1ce6793d3ae0b62f05648722a4130b
2018-08-14 12:24:44 -07:00
2400512a08 Remove unnecessary include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10486

Reviewed By: ml7

Differential Revision: D9305283

fbshipit-source-id: 0d1316f9a72670ddbe8d95ead93603d00ad0f63b
2018-08-14 12:10:04 -07:00
d1442b36f3 add a rebuild_libtorch command for speedier iteration. (#10036)
Summary:
It just calls into `ninja install`. For iterative work on
libtorch.so/_C.so,
`python setup.py rebuild_libtorch develop` should provide quick iteration.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10036

Differential Revision: D9317869

Pulled By: anderspapitto

fbshipit-source-id: 45ea45a1b445821add2fb9d823a724fc319ebdd2
2018-08-14 12:10:02 -07:00
520f4f6cb9 Added some unit test for box_with_nms_limit_op. (#10389)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10389

Added some unit test for box_with_nms_limit_op.

Reviewed By: wat3rBro

Differential Revision: D9237860

fbshipit-source-id: 2d65744bd387314071b68d2a0c934289fc64a731
2018-08-14 11:55:03 -07:00
d043f83019 Add tests for Tensor.* nn.* F.* docs (#10311)
Summary:
Test only for existence for now. I had to skip a lot of them, so there's a FIXME in the test.

Also, I'm not testing torch.* because of a namespace issue.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10311

Differential Revision: D9196341

Pulled By: SsnL

fbshipit-source-id: 9c2ca1ffe660bc1cc664474993f8a21198525ccc
2018-08-14 11:39:46 -07:00
b4462511fd Add LSTMCell backward pass expect tests (#10506)
Summary:
- Exposed get_debug_graph for ScriptModule (gets the debug graph for its
  forward Method)
- Added forward/backward expect tests for lstm and milstm cells. These
  are intended to prevent regressions

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10506

Differential Revision: D9316590

Pulled By: zou3519

fbshipit-source-id: 3c2510d8363e9733ccbc5c7cc015cd1d028efecf
2018-08-14 11:39:44 -07:00
e5811becdd Add tags for onnx tensor descriptors (#10502)
Summary:
We missed adding tags in 2 places where we create tensor descriptors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10502

Reviewed By: Maratyszcza

Differential Revision: D9312075

Pulled By: yinghai

fbshipit-source-id: 329e83ec5470b0a778d2eda525dd6f2143facbdf
2018-08-14 11:25:52 -07:00
9497383706 Fix some warnings (#10297)
Summary:
Fixing some compiler warnings while looking at symbol visibility.

cc smessmer ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10297

Reviewed By: soumith

Differential Revision: D9195336

Pulled By: orionr

fbshipit-source-id: 04cbfd3549984caec7bdd1a5b39a6d25e80348e9
2018-08-14 10:40:08 -07:00
61bedc96f0 Schema-based creation of graph nodes (#10198)
Summary:
This commit adds the ability to insert a node with inputs, using the schema to check the inputs are valid types, fill in any default values, and perform standard implicit conversions. Since it is schema based, it will discover and use the right overload.
Constructors to `NamedValue` enable it to be constructed using `IValue` constants so it is possible to use constant values in the input list as well:

```
g.insert(aten::add, {v, 3});
```

Keyword arguments are also supported:

```
g.insert(aten::add, {v}, {{"other", t}, {"scalar", 1}});
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10198

Differential Revision: D9307252

Pulled By: zdevito

fbshipit-source-id: 644620aa85047d1eae1288383a619d50fec44d9b
2018-08-14 10:25:38 -07:00
3a40baa15c fix a grammatical error: accelerate compute (#10204)
Summary:
"accelerate compute"
a verb shouldn't go with another verb.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10204

Differential Revision: D9316699

Pulled By: fmassa

fbshipit-source-id: f1126c594905c3236ffd6b7e57a92552d3d4c1f1
2018-08-14 10:11:15 -07:00
ef44faece2 check attribute existence in torch.legay.nn.SpatialFullConvolution in method type (#8740)
Summary:
This is related to #5255
When adding CUDA support for the model, this error occurs:
```
AttributeError: 'SpatialFullConvolution' object has no attribute 'finput'
```
Here is my short test code:
https://gist.github.com/kaleaht/26518c3deea5d1d3dda722fbf1f3ecdc

I converted torch7's model also from here.
https://github.com/art-programmer/FloorplanTransformation
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8740

Differential Revision: D8872735

Pulled By: SsnL

fbshipit-source-id: 8d97f8b59cdf4049e87be14b78c4608fd973d149
2018-08-14 10:11:13 -07:00
329d901a91 Fold AffineChannel to Conv, the same way as BN (for Detectron models) (#10293)
Summary:
AffineChannel is being used by public Detectron models, e.g. Mask-RCNN and Faster-RCNN. This PR folds this op into convolution the same way as BN to speed up inference.
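
A small numpy sketch of the folding arithmetic, assuming AffineChannel computes y = scale * x + bias per channel, just like a frozen BatchNorm:

```python
import numpy as np

# Folding: scale * (conv(x, W) + b) + bias == conv(x, scale * W) + (scale * b + bias)
def fold_affine_into_conv(W, b, scale, bias):
    # W: (C_out, C_in, kH, kW) conv weight, b: (C_out,) conv bias
    W_folded = W * scale.reshape(-1, 1, 1, 1)
    b_folded = b * scale + bias
    return W_folded, b_folded

W, b = np.random.randn(8, 3, 3, 3), np.zeros(8)
scale, bias = np.random.randn(8), np.random.randn(8)
W_folded, b_folded = fold_affine_into_conv(W, b, scale, bias)
```
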
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10293

Differential Revision: D9276789

Pulled By: yinghai

fbshipit-source-id: fbf6dd2c1be05f5713f760752e7245b1320a122b
2018-08-13 22:43:37 -07:00
c618df154e Add intrinsic support for external_input/output to nomnigraph (#10100)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10100

nomnigraph has until this point tried to ignore external input and output, as they aren't very well defined (does order matter?). But for DCE and some of Keren's work they are becoming necessary, so I went ahead and added this to the core nomnigraph converter.

Reviewed By: yinghai

Differential Revision: D9105487

fbshipit-source-id: a2e10e3cc84515611d6ab7d4bc54cf99b77729c0
2018-08-13 21:39:17 -07:00
7d16e87f14 Fix byte ordering issue in from_numpy (#9508)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/3671 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9508

Differential Revision: D9307186

Pulled By: soumith

fbshipit-source-id: 39dcaa6fd2d330d7085802acd6f63c19270164fa
2018-08-13 21:39:16 -07:00
facb293aad Fix FindMKL.cmake for Windows (#10453)
Summary:
Targets the issue discussed at https://github.com/pytorch/pytorch/pull/7399#issuecomment-400788971.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10453

Differential Revision: D9311591

Pulled By: soumith

fbshipit-source-id: ac0712e10bdac4ea3f76d6fbad2178ec958b3a31
2018-08-13 21:09:27 -07:00
fed05cf4cf Fix prim::FusedConcat bug (#10466)
Summary:
Fixes #10456

The graph fuser was fusing groups containing prim::FusedConcat (the producer) with other ops (the consumer) if the consumer was fusible. For example,

```
import torch
torch.jit.script
def fn(x, y, z):
    x1 = x + y
    y1 = x - y
    w = torch.cat([x1, y1])
    return w + z

x = torch.randn(2, 2, dtype=torch.float, device='cpu')
y = torch.randn(2, 2, dtype=torch.float, device='cpu')
z = torch.randn(4, 2, dtype=torch.float, device='cpu')
fn(x, y, z)
fn.graph_for(x, y, z)
```
produced the following graph:
```
graph(%x : Float(2, 2)
      %y : Float(2, 2)
      %z : Float(4, 2)) {
  %3 : int = prim::Constant[value=1]()
  %y1 : Float(2, 2) = aten::sub(%x, %y, %3)
  %8 : int = prim::Constant[value=0]()
  %14 : Float(4, 2) = prim::FusionGroup_0[device=-1](%z, %y1, %x, %y)
  return (%14);
}
with prim::FusionGroup_0 = graph(%1 : Float(4, 2)
      %5 : Float(2, 2)
      %7 : Float(2, 2)
      %8 : Float(2, 2)) {
  %11 : int = prim::Constant[value=1]()
  %9 : int = prim::Constant[value=1]()
  %x1 : Float(2, 2) = aten::add(%7, %8, %9)
  %w : Float(4, 2) = prim::FusedConcat[dim=0](%x1, %5)
  %2 : int = prim::Constant[value=1]()
  %3 : Float(4, 2) = aten::add(%w, %1, %2)
  return (%3);
}
```

this is a problem because it violates two invariants:
1) all inputs to the FusionGroup must have the same size
2) prim::FusedConcat's output must not be used inside the FusionGroup

This PR fixes this problem by checking if the output to a FusionGroup came from a prim::FusedConcat node when deciding whether to fuse the consumer and producer.
If the producer is a value that came from a prim::FusedConcat node in a FusionGroup, then consumer & producer do not get fused.

cc apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10466

Differential Revision: D9296686

Pulled By: zou3519

fbshipit-source-id: ed826fa9c436b42c04ca7d4d790cece804c162bd
2018-08-13 21:09:25 -07:00
099a545376 Hipify Caffe2 binaries (#10468)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10468

Reviewed By: yinghai

Differential Revision: D9301178

Pulled By: bddppq

fbshipit-source-id: 5da88aa4d79a5142f8e744cdcd8ae85951bc387c
2018-08-13 20:56:28 -07:00
9a9224e5c1 Remove "locally" from CONTRIBUTING.md (#10495)
Summary:
A bootcamper was confused by the word "locally" and thought it meant on his MacBook as opposed to his FB dev machine. Besides the confusion in the FB context, the word "locally" isn't really necessary at all.

soumith ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10495

Reviewed By: soumith

Differential Revision: D9311480

Pulled By: goldsborough

fbshipit-source-id: 2779c7c60f903a1822a50d140ed32a346feec39e
2018-08-13 20:56:26 -07:00
f6eb966fd2 Fix TanhGradientOperator linker errors (#10426)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10426

We were seeing linker errors for TanhGradientOperator in multifeed. Since we only use the float specialization, we might as well define it that way.

Reviewed By: yinghai

Differential Revision: D9280622

fbshipit-source-id: d2ffb698c73a84bb062de5e1f3bda741330e4228
2018-08-13 17:57:10 -07:00
ffb59e5f20 adding stochastic quantization caffe2 operators (encoder and decoder in CPU are implemented. GPU mode is pending)
Summary:
This operator implements b-bit (b = 1/2/4/8) stochastic quantization of a floating-point
matrix in a row-wise fashion. 8/b quantized values are packed into each byte
and returned in a uint8 tensor. PR: https://github.com/pytorch/pytorch/pull/8629
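
An illustrative numpy sketch of the 1-bit case (b = 1), not the operator's actual implementation:

```python
import numpy as np

# Each row is normalized to [0, 1]; a value becomes 1 with probability equal
# to its normalized magnitude (unbiased rounding), then 8 bits pack per byte.
def stochastic_quantize_row_1bit(row):
    lo, hi = row.min(), row.max()
    normalized = (row - lo) / (hi - lo + 1e-8)
    bits = (np.random.random_sample(row.shape) < normalized).astype(np.uint8)
    return np.packbits(bits), lo, hi  # packed bytes plus the per-row range

packed, lo, hi = stochastic_quantize_row_1bit(np.random.randn(16).astype(np.float32))
```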

Reviewed By: harouwu

Differential Revision: D8493264

fbshipit-source-id: 01f64066568a1e5a2b87c6d2134bd31cdf119c02
2018-08-13 16:39:23 -07:00
c6fc3ab557 fixes printing non-contiguous tensors
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10405

Differential Revision: D9302794

Pulled By: soumith

fbshipit-source-id: e4a7db8d33400a5a050d05fd1679de8bc3cbcf30
2018-08-13 16:26:20 -07:00
216961b7bf Remove is_zero_dim_ bool in THTensor.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10415

Reviewed By: ezyang

Differential Revision: D9274954

Pulled By: gchanan

fbshipit-source-id: 353a52d91556d5b81c3510eb2bf399d102c9a0a4
2018-08-13 12:39:06 -07:00
f59cce95b4 Some symbol annotation fixes for Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10369

Differential Revision: D9300187

Pulled By: ezyang

fbshipit-source-id: bf29966ad6aa221332b7232a965fb85e652f866d
2018-08-13 12:26:00 -07:00
382ff03222 Add missing #pragma once
Reviewed By: ml7

Differential Revision: D9299779

fbshipit-source-id: b5b5a1b9ead1b275d3ae54ecfad99617d2869094
2018-08-13 11:39:45 -07:00
75651d5b58 improve use of ROCm libraries, enable more tests, small fixes (#10406)
Summary:
* some small leftovers from the last PR review
* enable more unit test sets for CI
* replace use of hcRNG w/ rocRAND (docker image was already updated w/ newer rocRAND)
* use rocBLAS instead of hipBLAS to allow convergence w/ Caffe2
* use strided_batched gemm interface also from the batched internal interface
* re-enable Dropout.cu as we now have philox w/ rocRAND
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10406

Reviewed By: Jorghi12

Differential Revision: D9277093

Pulled By: ezyang

fbshipit-source-id: 7ef2f6fe4ead77e501ed7aea5c3743afe2466ca2
2018-08-13 11:39:43 -07:00
cd81217f8e A single print statement in setup.py
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10473

Reviewed By: ml7

Differential Revision: D9299196

Pulled By: pjh5

fbshipit-source-id: f9aa84c2859df12f9da9ac5205e1918c253e19fb
2018-08-13 11:39:42 -07:00
0b63d12db6 Don't call into Python during Storage destruction. (#10407)
Summary:
```
This removes PyObjectFinalizer. We were seeing SIGSEGV at exit in some
programs that use multiprocessing. The backtrace pointed to
StorageRef.__del__ being called from subtype_dealloc. My guess is that
the Python interpreter was shutdown before all C++ Storage objects were
deallocated. Deallocating the C++ Storage called the finalizer which
called back into Python after it was no longer safe to do so.

This avoids a callback from C++ into Python during Storage finalization.
Instead, dead Storage objects (expired weak references) are collected
periodically when shared_cache exceeds a limit. The limit is scaled with
2x the number of live references, which places an upper bound on the
amount of extra memory held by dead Storage objects. In practice, this
should be very small.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10407

Differential Revision: D9272400

Pulled By: colesbury

fbshipit-source-id: ecb14d9c6d54ffc91e134c34a4e770a4d09048a2
2018-08-13 11:20:07 -07:00
64235d5c01 Rewrite TensorImpl to use TensorTypeId. (#10278)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10278

Translation to Backend happens immediately before we go into the
Type universe; otherwise we use TensorTypeId.

I allocated TensorTypeId corresponding exactly to existing ATen
Backend.  Only CPUTensorId and CUDATensorId are relevant in the
Caffe2 universe.

Reviewed By: gchanan

Differential Revision: D9184060

fbshipit-source-id: 9d3989c26f70b90f1bbf98b2a96c57e2b0a46597
2018-08-13 11:20:04 -07:00
145eb330ad Back out "Back out "Move typeid.h to move to ATen/core"" (#10465)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10465

Original commit changeset: 7050fe845e65

Reviewed By: jerryzh168

Differential Revision: D9296375

fbshipit-source-id: cb8161440ba809dcec5027858a29cd026d537fc3
2018-08-13 11:20:01 -07:00
b8530dc1f0 A few additions (#9837)
Summary:
This PR provides 4 fixes / features:

1. torch::nn::Cloneable inherits virtually from torch::nn::Module. We want to pass around a module with new functions, and the best way to do this is to do a diamond inheritance pattern, i.e.

```c++
struct MySuperModuleImpl : virtual public torch::nn::Module {
  virtual void myFunction() = 0;
};

template <typename Derived>
struct MySuperModule : public torch::nn::Cloneable<Derived>, public MySuperModuleImpl {};

struct MyModule : public MySuperModule<MyModule> {
  void myFunction() override;
};
```

This way, we can simply pass MySuperModuleImpl around instead of torch::nn::Module.

2. Optimizer options are public now, since there's no way to decay the LR or modify it during training otherwise
3. Serialization functions created autograd history and called copy_! Bad!
4. Optimizers did not create buffers after add_parameters was called.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9837

Reviewed By: goldsborough

Differential Revision: D9199746

Pulled By: ebetica

fbshipit-source-id: 76d6b22e589a42637b7cc0b5bcd3c6b6662fb299
2018-08-13 10:24:58 -07:00
0a39a9cfbc Add db directory for hipifying (#10428)
Summary:
bddppq petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10428

Differential Revision: D9297115

Pulled By: bddppq

fbshipit-source-id: d7134ff24102f03f762e6a7b4340055546c9ecfd
2018-08-13 10:24:56 -07:00
56267cc97b gflags improvement to allow CAFFE2_EXPORTS (#10444)
Summary:
Explanation copied from code:

// Motivation about the gflags wrapper:
// (1) We would need to make sure that the gflags version and the non-gflags
// version of Caffe2 are going to expose the same flags abstraction. One should
// explicitly use caffe2::FLAGS_flag_name to access the flags.
// (2) For flag names, it is recommended to start with caffe2_ to distinguish it
// from regular gflags flags. For example, do
//    CAFFE2_DEFINE_BOOL(caffe2_my_flag, true, "An example");
// to allow one to use caffe2::FLAGS_caffe2_my_flag.
// (3) Gflags has a design issue that does not properly expose the global flags,
// if one builds the library with -fvisibility=hidden. The current gflags (as of
// Aug 2018) only deals with the Windows case using dllexport, and not the Linux
// counterparts. As a result, we will explicitly use CAFFE2_EXPORT to export the
// flags defined in Caffe2. This is done via a global reference, so the flag
// itself is not duplicated - under the hood it is the same global gflags flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10444

Differential Revision: D9296726

Pulled By: Yangqing

fbshipit-source-id: a867d67260255cc46bf0a928122ff71a575d3966
2018-08-13 09:54:48 -07:00
64a6f17177 Fix ATen/core header installation. (#10463)
Summary:
Fixes #10353 and fixes #10397.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10463

Differential Revision: D9296491

Pulled By: ezyang

fbshipit-source-id: f825c2a21a113e44a6f5c1c5ec17814d9deac366
2018-08-13 09:25:49 -07:00
fa5d95a00c Bump onnx to onnx/onnx@0d250de (#10452)
Summary:
0d250dea76
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10452

Reviewed By: houseroad

Differential Revision: D9288037

Pulled By: bddppq

fbshipit-source-id: 206be3ee2b8ebca26f3d8af0597078363ed6d168
2018-08-13 00:09:15 -07:00
3cbe8f0c3e Detect system RocksDB installation with CMake config files. (#7315)
Summary:
On Windows, the FindRocksDB script doesn't detect a RocksDB installation built by CMake.
It also doesn't include/link the RocksDB dependencies, like:
  * `Snappy`
  * `Shlwapi.lib`
  * `Rpcrt4.lib`

This PR tries to detect RocksDB in config mode first before falling back to the private find module.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/7315

Differential Revision: D9287587

Pulled By: Yangqing

fbshipit-source-id: 314a36a14bfe04aa45013349c5537163fb4c5c00
2018-08-12 18:24:10 -07:00
82d11b847e Use CUDA_LINK_LIBRARIES_KEYWORD instead of hacking. (#10437)
Summary:
There's no need to hack.
Using `CUDA_LINK_LIBRARIES_KEYWORD` is the normal way.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10437

Differential Revision: D9287579

Pulled By: Yangqing

fbshipit-source-id: d3d575ea8c3235576ba971e4b7493ddb435f92f3
2018-08-12 18:09:20 -07:00
508de8109f Added missing "AT_" prefix to macro. (#10436)
Summary:
For issue #10435
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10436

Differential Revision: D9287578

Pulled By: Yangqing

fbshipit-source-id: b07de3a2d7fa6f980a189b5e8f7ce05dfa1bef50
2018-08-12 18:09:19 -07:00
1756daaa75 Use FULL_CAFFE2 to build caffe2 and python in one shot (#10427)
Summary:
Building caffe2 and pytorch separately ends up with duplicated symbols, as they now share some basic libs. This is especially bad for the registries. This PR fixes our CI and builds them in one shot with shared symbols.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10427

Reviewed By: bddppq

Differential Revision: D9282372

Pulled By: yinghai

fbshipit-source-id: 0514931ea88277029a68fa5368ff4336472f132e
2018-08-12 15:39:12 -07:00
51f154e072 Fix Python lint errors. (#10441)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10441

Reviewed By: Yangqing

Differential Revision: D9285502

Pulled By: ezyang

fbshipit-source-id: 12c94b28bee9cade930c8f260577e81ea1915269
2018-08-11 21:08:50 -07:00
cd53b78bd0 Remove caffe namespace GetEmptyStringAlreadyInited (#10438)
Summary:
A followup cleanup of #10380 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10438

Differential Revision: D9285692

Pulled By: Yangqing

fbshipit-source-id: c73defbef00d3b563240d0b69d85bd0a6e3eb504
2018-08-11 17:39:58 -07:00
ab6afc2b23 Optimize max_pooling for inference for MKL-DNN/IDEEP device (#10156)
Summary:
Optimize the max_pooling operation for the inference path by passing the "inference" flag to the underlying MKL-DNN, saving the computation and storage of max indices, which are only needed for training. To keep the API compatible, training mode is still the default, and inference mode is set in the optimizeForIdeep path.
Test shows the speed-up of a single max_pooling operation is up to 7X on BDW.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10156

Differential Revision: D9276755

Pulled By: yinghai

fbshipit-source-id: ad533d53aabb8ccb3b592da984d6269d9b794a8a
2018-08-10 23:14:05 -07:00
d3ccc836de Fix warning in Nomnigraph (#10425)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10425

`const size_t` as return value doesn't make sense.

Reviewed By: duc0

Differential Revision: D9281442

fbshipit-source-id: c3d9c94f5dbe516476f0c74f63c35e60893c8140
2018-08-10 22:40:26 -07:00
1dbdc5a93d Back out "Move typeid.h to move to ATen/core"
Summary: Original commit changeset: 21f2c89e58ca

Reviewed By: yinghai

Differential Revision: D9282171

fbshipit-source-id: 7050fe845e6524b965bdd45794a6fa1665b83e34
2018-08-10 21:39:25 -07:00
31646edfff Increase GLOO rendezvous timeout
Summary: Increase GLOO rendezvous timeout

Reviewed By: teng-li

Differential Revision: D9273544

fbshipit-source-id: 5c22c1d18df3032f019ff12e2a720aea7c390f15
2018-08-10 18:40:18 -07:00
767687835e Replace sudo with --user in CI caffe2 install
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10328

Reviewed By: pjh5

Differential Revision: D9275809

Pulled By: ezyang

fbshipit-source-id: c22cb1570c67199b74b2188ad83b1e4828e11911
2018-08-10 15:11:43 -07:00
adbcb3c1dc Move dropout and alpha dropout to ATen (#10384)
Summary:
zdevito ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10384

Reviewed By: ezyang

Differential Revision: D9272583

Pulled By: apaszke

fbshipit-source-id: ed5d37b28ce9ff25800bbaa0daf066cfbf1f9921
2018-08-10 14:55:28 -07:00
5b0be9de59 Remove TH compatibility calls for strides. (#10414)
Summary:
This should just work now that sizes/strides are unified between TH and ATen.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10414

Differential Revision: D9274681

Pulled By: gchanan

fbshipit-source-id: 69eb766f4e3a5b6c57b15837cffdef513b6d7817
2018-08-10 13:54:58 -07:00
674f7a9778 Correctly share CUDA Parameters. (#10220)
Summary:
```
    Correctly share CUDA Parameters, requires_grad and hooks.

    Previously, the following was true:

    - If you put a Parameter for a CUDA tensor
      in multiprocessing queue (or otherwise tried to transfer it),
      this failed, saying that we cannot pickle CUDA storage.
      This is issue #9996.

    - If you put a leaf Tensor that requires_grad=True through the
      multiprocessing queue, it would come out the other end as
      requires_grad=False (It should have come out the other end
      as requires_grad=True).  Similarly, backwards hooks were
      lost.

    - If you put a non-leaf Tensor that requires_grad=True through
      the multiprocessing queue, it would come out the other end
      as requires_grad=False.

    The root cause for the first issue was that implementation of
    reductions for Parameter used the superclass implementation
    (tensor) in __reduce_ex__, but this always picks up the
    non-ForkingPickler reduction, which doesn't work with CUDA tensors.
    So, we registered a new ForkingPickler specifically for Parameter,
    and adjusted the code to correctly rewrap a Tensor in a Parameter
    if it was originally a parameter.

    While working on this, we realized that requires_grad and backwards
    hooks would not be preserved in the ForkingPickler reduction
    implementation.  We fixed the reducer to save these parameters.
    However, Adam Paszke pointed out that we shouldn't allow sending
    requires_grad=True, non-leaf Tensors over a multiprocessing
    queue, since we don't actually support autograd over process
    boundaries.  We now throw an error in this case; this may cause
    previously working code to fail, but this is easy enough to fix;
    just detach() the tensor before sending it.  The error message says
    so.

    Fixes #9996.
```
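
A minimal sketch of the resulting contract from the Python side; `detach()` is the fix the error message suggests:

```
import torch
import torch.multiprocessing as mp

queue = mp.Queue()
w = torch.nn.Parameter(torch.randn(3))  # works for CUDA Parameters too, post-fix
y = w * 2                               # non-leaf, requires_grad=True
queue.put(w)                            # arrives as a Parameter, requires_grad=True
queue.put(y.detach())                   # sending y un-detached now raises an error
```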
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10220

Differential Revision: D9160746

Pulled By: ezyang

fbshipit-source-id: a39c0dbc012ba5afc7a9e646da5c7f325b3cf05c
2018-08-10 13:54:56 -07:00
0b8a0125ab Fixes torch.log after torch.expand giving incorrect results (#10269)
Summary:
fixes #10241
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10269

Differential Revision: D9272472

Pulled By: cpuhrsch

fbshipit-source-id: cd1afbb4386a0d0956ee21b24f0d529755b986ca
2018-08-10 13:39:38 -07:00
6a55238a3f Grid sampler: nearest interpolation & reflection padding (#10051)
Summary:
closes #9702 .

cc jph00

Commit structure:

1. Change the index calculation logic. I will explain using 1-D for simplicity.

	Previously we have (in pseudo code):

	```
	// 1. get the float locations from grid
	scalar_t x = from_grid()

	// 2. find the integral surrounding indices
	int x_left = floor(x)
	int x_right = x_left + 1

	// 3. calculate the linear interpolate weights
	scalar_t w_left = x_right - x
	scalar_t w_right = x - x_left

	// 4. manipulate the integral surrounding indices if needed
	// (e.g., clip for border padding_mode)
	x_left = manipulate(x_left, padding_mode)
	x_right = manipulate(x_right, padding_mode)

	// 5. interpolate
	output_val = interpolate(w_left, w_right, x_left, x_right)
	```

	This is actually incorrect (and also unintuitive) because it calculates the
	weights before manipulating out-of-boundary indices. Fortunately, this
	isn't manifested in both of the current supported modes, `'zeros'` and
	`'border'` padding:

	+ `'zeros'`: doesn't clip
	+ `'border'`: clips, but for out-of-bound `x` both `x_left` and `x_right` are
	  clipped to the same value, so weights don't matter

	But this is a problem with reflection padding, since after each time we reflect,
	the values of `w_left` and `w_right` should be swapped.

	So in this commit I change the algorithm to (numbers corresponding to the
        ordering in the above pseudo-code)

	```
	1. get float location
	4. clip the float location
	2. find the integral surrounding indices
	3. calculate the linear interpolate weights
	```

	In the backward, because of this change, I need to add new variables to track
	`d manipulate_output / d manipulate_input`, which is basically a multiplier
	on the gradient calculated for `grid`. From benchmarking this addition doesn't
	cause obvious slow downs.

2. Implement reflection padding. The indices will keep being reflected until
	they fall within the boundary.

	Added variant of `clip_coordinates` and `reflect_coordinates` to be used in
	backward. E.g.,
	```cpp
	// clip_coordinates_set_grad works similarly to clip_coordinates except that
	// it also returns the `d output / d input` via pointer argument `grad_in`.
	// This is useful in the backward pass of grid_sampler.
	scalar_t clip_coordinates_set_grad(scalar_t in, int64_t clip_limit, scalar_t *grad_in)
	```
	For example, if `in` is clipped in `'border'` mode, `grad_in` is set to `0`.
	If `in` is reflected **odd** times in `'reflection'` mode, `grad_in`
	is set to `-1`.

3. Implement nearest interpolation.

4. Add test cases

5. Add better input checking
  Discussed with goldsborough for moving `operator<<` of `at::Device`,
  `at::DeviceType` and `at::Layout` into `at` namespace. (Otherwise
  `AT_CHECK` can't find them.)

6. Support empty tensors. cc gchanan

    + Make empty tensors not acceptable by cudnn.
    + Add `AT_ASSERT(kernel block size > 0)` if using `GET_BLOCKS`
    + Cache `numel` in `TensorGeometry`
      I was going to use `numel` to test whether the cudnn descriptor should accept a
      tensor, but it ended up unused. I can revert this if needed.

7. Add more test cases, including on input checking and empty tensors

8. Remove an obsolete comment

9. Update docs. Manually tested by generating docs.
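
A minimal usage sketch of the two new options from the Python side:

```
import torch
import torch.nn.functional as F

inp = torch.randn(1, 1, 4, 4, requires_grad=True)
grid = torch.rand(1, 3, 3, 2) * 4 - 2   # locations outside [-1, 1] get reflected
out = F.grid_sample(inp, grid, mode='nearest', padding_mode='reflection')
out.sum().backward()                    # gradients flow through the reflections
```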
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10051

Differential Revision: D9123950

Pulled By: SsnL

fbshipit-source-id: ac3b4a0a36b39b5d02e83666cc6730111ce216f6
2018-08-10 12:43:27 -07:00
def3715e82 Minor changes for nicer pip packages (#9544)
Summary:
I am using this to test a CI job to upload pip packages, and so am using the Caffe2 namespace to avoid affecting the existing pytorch packages.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9544

Reviewed By: orionr

Differential Revision: D9267111

Pulled By: pjh5

fbshipit-source-id: a68162ed29d2eb9ce353d8435ccb5f16c3b0b894
2018-08-10 12:09:46 -07:00
40109b16d0 Remove caffe1 specific proto (#10380)
Summary:
This was used as a convenient way for us to convert c1 models. Now that conversion is more or less done, we should probably require any users who need to convert c1 models to explicitly install c1. This PR removes the explicit c1 proto (which was copied from c1) in favor of explicit installation.

Note that caffe_translator would still work properly, only difference is that now users need to install c1 separately.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10380

Differential Revision: D9267981

Pulled By: Yangqing

fbshipit-source-id: a6ce5d9463e6567976da83f2d08b2c3d94d14390
2018-08-10 11:10:26 -07:00
018790cd4b thread BUILD_SHARED_LIBS through build_pytorch_libs.sh
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10272

Differential Revision: D9239337

Pulled By: anderspapitto

fbshipit-source-id: 187b3acb7e85635d9b45a3dd82c98d86a2b51e70
2018-08-10 10:39:31 -07:00
9b8a036873 Fix basic.cpp, which compared equality between a size [1] tensor with… (#10404)
Summary:
… a size [] tensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10404

Differential Revision: D9268467

Pulled By: gchanan

fbshipit-source-id: 92bb387358f4030519c6883c12ea69312185446e
2018-08-10 10:39:29 -07:00
e524a8994b Make lengths_host_.CopyFrom synced in LengthsCosineCoherenceOp and LengthsTileOp (#10360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10360

It seems `lengths_host_.CopyFrom(lengthsInput, &context_);` is asynchronous w.r.t. the host while `lengths_host_.CopyFrom(lengthsInput);` is synchronous.

However, according to jerryzh168,  `lengths_host_.CopyFrom(lengths, &context_); context_.FinishDeviceComputation();` is the safest way to guarantee synchronization.

Reviewed By: jerryzh168

Differential Revision: D9197923

fbshipit-source-id: 827eb63d9d15c1274851e8301a793aed39d4fa6b
2018-08-10 10:39:28 -07:00
be5fb8f6fd Move fused RNN kernels into ATen (#10305)
Summary:
As in the title. I also did a small refactor that let us lose almost 400 LOC. This is a first step in moving the RNN code to C++.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10305

Reviewed By: ezyang

Differential Revision: D9196227

Pulled By: apaszke

fbshipit-source-id: 54da905519aade29baa63ab1774a3ee1db5663ba
2018-08-10 09:12:05 -07:00
e221791afc Fix typo.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10387

Differential Revision: D9255840

Pulled By: gchanan

fbshipit-source-id: 97b52d4e349c1e2d1970abde7dc6b25e7cf668a0
2018-08-10 08:55:30 -07:00
1e3e26e3e8 Use nDimensionLegacyNoScalars in THTensorDimApply. (#10388)
Summary:
This issue was exposed in https://github.com/pytorch/pytorch/pull/10383.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10388

Differential Revision: D9255836

Pulled By: gchanan

fbshipit-source-id: 88c5a6415c27d56ff54d00a8957fdc1617cfbde7
2018-08-10 08:55:28 -07:00
3667d029b4 Move typeid.h to move to ATen/core (#10163)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10163

- Remove dependency on caffe2/core/common.h for ATen/core/typeid.h
  Unfortunately, Windows seems to rely on typeid.h including this
  header, so it is still included from the forwarding header
  caffe2/core/typeid.h
- Deduplicate Demangle/DemangleType with their ATen equivalents

Reviewed By: smessmer

Differential Revision: D9132432

fbshipit-source-id: 21f2c89e58ca1e795f1b2caa316361b729a5231b
2018-08-10 08:45:44 -07:00
e9ad74357e Use serialization container in ir import export (#10394)
Summary:
Copy of #10191 because these changes didn't land with the diff.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10394

Differential Revision: D9260816

Pulled By: li-roy

fbshipit-source-id: 7dc16919cfab6221fda1d44e98c5b900cfb40558
2018-08-10 00:09:30 -07:00
0950d7a98d support list slicing (#10318)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10318

Differential Revision: D9254351

Pulled By: michaelsuo

fbshipit-source-id: be891a584dc295b5e353f7f5257d64a356fb9586
2018-08-09 17:25:13 -07:00
b1e3239ec8 Fix some backwards definitions wrt keepdim. (#10382)
Summary:
Before we had 0-dim tensors in TH, we were flexible in what we accepted w.r.t. the difference between size [] and size [1] tensors in backwards functions, because they were identical in TH.  So, we had backwards definitions that were technically incorrect but happened to work.  This often masks shape issues and adds greatly to code complexity, and thus IMO isn't worth keeping.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10382

Differential Revision: D9244618

Pulled By: gchanan

fbshipit-source-id: 2c29c53a8ffe8710843451202cad6b4323af10e8
2018-08-09 15:11:55 -07:00
209af45614 Back out "[pytorch][PR] Fix bincount for empty input"
Summary: Original commit changeset: 6c4c66c23679

Reviewed By: SsnL

Differential Revision: D9253403

fbshipit-source-id: bf5ee669ed095c06ff58a2871f7350e879261076
2018-08-09 14:25:33 -07:00
18d2fcde7a Fix performance of DistributedSampler per #8958
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10361

Differential Revision: D9240798

Pulled By: ezyang

fbshipit-source-id: dc4cfe79612f711bbcff34a147877df6a5f7b89f
2018-08-09 12:54:37 -07:00
64a60030a6 Don't copy on clamp, clamp_out (#10352)
Summary:
This makes clamp and relu faster (fixes #10276).

The extra copying was introduced when clamp moved to ATen and
the _th_clamp_ wrapper was used to forward to TH/THC;
we remove that and add _th_clamp(_out) instead.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10352

Reviewed By: ezyang

Differential Revision: D9233590

Pulled By: SsnL

fbshipit-source-id: 4f86a045498e5e577fb22656c71f171add7ed0ac
2018-08-09 12:40:47 -07:00
b43beec070 Fix bincount for empty input (#9757)
Summary:
Added tests too. Fixes #9756 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9757

Differential Revision: D8966879

Pulled By: soumith

fbshipit-source-id: 9f08a9d5d5d037db16319141d7a227a5efa23869
2018-08-09 12:40:45 -07:00
cc5b47ff47 Fix the logic for PATH guess on Windows
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10372

Differential Revision: D9240207

Pulled By: soumith

fbshipit-source-id: 0933f6fde19536c7da7d45044efbdcfe8ea40e1f
2018-08-09 12:40:44 -07:00
3fa1c1022a Avoid std::thread ctor "cannot resolve" error (#10381)
Summary:
If an `at::test` function is added, gcc can't figure out the `std::thread(test, -1)` resolution.

It is not a problem for the current code; I bumped into this when playing with native functions. But I think it is good to just prevent it from happening in the future by removing `using namespace at;`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10381

Differential Revision: D9241614

Pulled By: SsnL

fbshipit-source-id: 972ac3cecff3a50602b3fba463ae1ebd3f53d036
2018-08-09 11:55:40 -07:00
99b10adc01 Fix compile flags for MSVC
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10368

Differential Revision: D9240791

Pulled By: ezyang

fbshipit-source-id: 536b093b5c800cc1cf02cbbde9ae341e25d083d1
2018-08-09 09:39:58 -07:00
7d53c876dc Move maybeZeroDim to TH, change condition so it doesn't turn off scal… (#10333)
Summary:
…ars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10333

Differential Revision: D9206091

Pulled By: gchanan

fbshipit-source-id: 492c50189edc2056aa2acce98d49234d2a54ce39
2018-08-09 09:28:57 -07:00
e967fa9757 Fix THTensor_nElement for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10332

Differential Revision: D9206039

Pulled By: gchanan

fbshipit-source-id: 0bc7c15050a6a602f621d3e9ecc3a6ea35481a6a
2018-08-09 09:28:55 -07:00
52d85bedb7 Deal with undefined tensors in unbind backward (#9995)
Summary:
When only part of the outputs of unbind are used in a backward,
the gradients for the others are undefined. This sets those
to zero in to_tensor_list.

Fixes: #9977
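
A minimal repro of the case being fixed:

```
import torch

x = torch.randn(2, 3, requires_grad=True)
a, b = torch.unbind(x)   # b is never used, so its grad would be undefined
a.sum().backward()       # the unused slice's gradient is now filled with zeros
print(x.grad)            # row 0 is ones, row 1 is zeros
```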
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9995

Differential Revision: D9239610

Pulled By: soumith

fbshipit-source-id: eb8d1b3f2b4e615449f9d856e10b946910df9147
2018-08-09 08:54:28 -07:00
b70b7066f7 Keep kEps in one place to make sure they are consistent (#10334)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10334

Keep kEps in one place to make sure they are consistent

Reviewed By: xianjiec

Differential Revision: D9202280

fbshipit-source-id: 35d173ce1d1a361b5b8cdbf1eac423e906e7c801
2018-08-09 08:27:42 -07:00
04f381650e Resubmit: Fix dataloader hang when it is not completely iterated (#10366)
Summary:
https://github.com/pytorch/pytorch/pull/9655
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10366

Differential Revision: D9237393

Pulled By: SsnL

fbshipit-source-id: fabfad7f371ba33300098f6b885c0e3f26c3e14a
2018-08-09 00:10:24 -07:00
037d8d1bab Order Loss functions alphabetically in nn.rst
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10365

Differential Revision: D9237287

Pulled By: SsnL

fbshipit-source-id: 28e9de76b9cfd8f63c8df561ff1531ea8d0803ea
2018-08-08 22:39:55 -07:00
9dfc4edc68 Update NNPACK and cpuinfo submodules (#8564)
Summary:
Bring in extra optimizations in Winograd-based convolution on NEON
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8564

Reviewed By: hlu1

Differential Revision: D9088140

Pulled By: Maratyszcza

fbshipit-source-id: 2089191416db98bdad8f0e4848b1435fcf74a88b
2018-08-08 22:39:52 -07:00
6e49f933ad Check that result is on CPU for CPU unary ops kernels (#10358)
Summary:
Fixes: #10270
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10358

Differential Revision: D9233066

Pulled By: soumith

fbshipit-source-id: 39b7524fe55ddb899fb27e2c0ef504ce54dbad35
2018-08-08 21:11:53 -07:00
783f2c60b2 nomnigraph - Enhancements to subgraph matching APIs (#10218)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10218

SubtreeMatchCriteria now supports:
- nonTerminal flag: if this is set, we only match the root of the subtree and do not care about the children. Example use case: matching an "input" node without caring how the input is produced.
Additional tests for this new logic are added to subgraph_matcher_test.cc.

Subgraph matching APIs for NNGraph are also added.

(Further enhancements to make the SubgraphMatching API construct a Subgraph object and return more diagnostic information will come later.)

Reviewed By: bwasti

Differential Revision: D9156092

fbshipit-source-id: 3f28ac15d9edd474b3e0cd51fd7e6f973299d061
2018-08-08 14:56:23 -07:00
69760e2840 update torch.eig() doc (#10315)
Summary:
This fixes #9383

Update torch.eig() doc, the complex part is written based on https://scc.ustc.edu.cn/zlsc/sugon/intel/mkl/mkl_manual/GUID-16EB5901-5644-4DA6-A332-A052309010C4.htm
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10315

Reviewed By: yf225

Differential Revision: D9200723

Pulled By: ailzhang

fbshipit-source-id: d2e186fd24defbc4fdea6c2cf3dc4f7e05e1d170
2018-08-08 06:43:41 -07:00
0d03219a42 Remove hack as integrated builds use FULL_CAFFE2 now (#10320)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10320

Reviewed By: jerryzh168

Differential Revision: D9198902

Pulled By: ezyang

fbshipit-source-id: 8af28d607735e5f4450c40127c1f8c262ea602ce
2018-08-07 21:40:07 -07:00
7d6d7bef6a Enable docker image build for PyTorch using specific python version (#10317)
Summary:
The current Dockerfile builds pytorch using the default python within miniconda, which happens to be Python 3.6.

This patch allows users to specify which python should be installed in the default miniconda environment used by the pytorch dockerfile. I have tested the build for python 2.7, 3.5, 3.6 and 3.7. Python 2.7 required typing and cython.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10317

Differential Revision: D9204401

Pulled By: ezyang

fbshipit-source-id: 11355cab3bf448bbe8369a2ed1de0d409c9a2d6e
2018-08-07 16:13:33 -07:00
66b3bae47c Add sizesLegacyNoScalars/stridesLegacyNoScalars analog of sizeLegacyN… (#10323)
Summary:
…oScalars,strideLegacyNoScalars.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10323

Differential Revision: D9200567

Pulled By: gchanan

fbshipit-source-id: 5580d6f92eef0acb04132f1978436cc31cdf563a
2018-08-07 15:41:28 -07:00
b7bc327180 Remove new_Tensor and generated components
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10194

Differential Revision: D9160559

Pulled By: cpuhrsch

fbshipit-source-id: 133185b3d4258c154dc43f7572dbef6bfa6786f3
2018-08-07 15:09:38 -07:00
5390476297 Add tracing to custom op and simplify tracer overall (#10212)
Summary:
This PR adds tracing infrastructure for custom operators. It also simplifies the tracer overall, and changes the codegen to do more metaprogramming there instead of via C++ (which was necessary for the custom op tracing).

To give an example of the tracer/metaprogramming change, what used to look like this in `VariableType.cpp`:

```
jit::tracer::PreTraceInfo trace_info;
if (jit::tracer::isTracing()) {
  trace_info = jit::tracer::preRecordTrace(
      jit::aten::index_select, "self", self, "dim", dim, "index", index);
}
```

is now simply the inlined version of `preRecordTrace`, minus C++ metaprogramming:

```
torch::jit::Node* node = nullptr;
if (jit::tracer::isTracing()) {
  auto& graph = jit::tracer::getTracingState()->graph;
  node = graph->create(jit::aten::index_select_out, /*outputs=*/0);
  jit::tracer::recordSourceLocation(node);
  jit::tracer::addInputs(node, "result", result);
  jit::tracer::addInputs(node, "self", self);
  jit::tracer::addInputs(node, "dim", dim);
  jit::tracer::addInputs(node, "index", index);
  graph->appendNode(node);
}
```

zdevito apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10212

Differential Revision: D9199615

Pulled By: goldsborough

fbshipit-source-id: cd4b603c1dc01340ead407228e109c99bdba2cfc
2018-08-07 13:54:15 -07:00
5bb21493fd add fused dropout kernels (#9666)
Summary:
While waiting for dropout to be fully ported to ATen, here's a performance fix for the most common dropout case. Dropout is still a Python function; I just added an efficient path to it. I could not make in-place work, because the generator always generates `return self` for in-place functions, and I need to return both the original tensor and the mask, so in-place stays on the existing path. Even with the non-in-place version, since the mask is now a ByteTensor, the memory used is only a little larger than for in-place dropout, thanks to the savings on the mask.
Once dropout is moved to ATen, these kernels can still be used for an efficient implementation.
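
For reference, a minimal sketch of the train-time contract these kernels implement (produce a byte mask and rescale); this is the math, not the fused CUDA kernel itself:

```
import torch

def dropout_with_mask(x, p):
    mask = (torch.rand_like(x) > p).to(torch.uint8)  # ByteTensor: 1 byte/element
    return x * mask.type_as(x) / (1 - p), mask       # rescale kept activations

y, mask = dropout_with_mask(torch.randn(4, 4), p=0.5)
```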
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9666

Reviewed By: SsnL

Differential Revision: D8948077

Pulled By: ezyang

fbshipit-source-id: 52990ef769471d957e464af635e5f9b4e519567a
2018-08-07 13:34:53 -07:00
74979495f0 Optional input lengths in CTC op (#10228)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10228

Sometimes, for all items in the minibatch in test mode, input length will be
equal to max time steps. This avoids having to pass in an external tensor.

Differential Revision: D9174378

fbshipit-source-id: 22f7d5c311c855d9c3ac59f2a5e773279bd69974
2018-08-07 13:34:51 -07:00
9b1a65bec3 Extends type and shape tracing with device (#9796)
Summary:
This PR extends the existing type and shape metadata tracing and verification done in autograd with device information. This expansion of tracing is required for #8354, is likely useful in other scenarios, and is a healthy sanity check, just like type and shape tracing.

The precise changes are:

- TypeAndShape -> InputMetadata, now includes device()
- Creating InputMetadata is simplified to just require a tensor, and callers were updated to use this simpler invocation wherever possible
- The gradient accumulator of a variable is now reset when set_data() is called if either the type or device changes, and this reset now locks to avoid contention with acquiring the gradient accumulator
- Mismatched devices during backward() will throw a runtime error, just like mismatched type and shape
- (Bonus!) Two uninitialized pointers in THCReduce are now initialized (to nullptr) to prevent build warnings

fyi colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9796

Reviewed By: goldsborough

Differential Revision: D9119325

Pulled By: ezyang

fbshipit-source-id: 76d1861b8d4f74db0575ff1f3bd965e18f9463de
2018-08-07 12:25:17 -07:00
2993c42ee4 Squash some 'invalid escape sequence' warnings. (#10310)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10310

Differential Revision: D9196254

Pulled By: ezyang

fbshipit-source-id: 63bb8e52ac6970fe8e11a2d3c491ab58250dc467
2018-08-07 12:25:15 -07:00
db7a2b1f0d fix doc for as_tensor (#10309)
Summary:
- fixes #9914
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10309

Differential Revision: D9196427

Pulled By: weiyangfb

fbshipit-source-id: c9a01e42c2e9dbfe2bd94ad14651d9f578751de2
2018-08-07 11:24:45 -07:00
dcaafdd04b fix doc of sparse_coo_tensor (#10308)
Summary:
- fixes #9998
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10308

Differential Revision: D9196423

Pulled By: weiyangfb

fbshipit-source-id: 23b4ed96e354ac9aa7c268aad105818a2c6d3bd8
2018-08-07 11:24:44 -07:00
20a549b101 Start using a newer version of rocRand that's PyTorch compatible.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10280

Differential Revision: D9196349

Pulled By: Jorghi12

fbshipit-source-id: 4147f2e6e3fdd641b026f3761d684437591405be
2018-08-07 11:09:59 -07:00
fe68879832 Fix dir(torch) for python 3.7 (#10271)
Summary:
fixes #10160.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10271

Differential Revision: D9188031

Pulled By: li-roy

fbshipit-source-id: a3620553a8ba2b7391acdf78dbe58afcdb6c5f7f
2018-08-07 09:57:51 -07:00
ad76fc8807 s/DISABLE_COPY_AND_ASSIGN/AT_DISABLE_COPY_AND_ASSIGN/ (#10275)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10275

Remove forwarding declaration in caffe2/core/common.h

```
codemod -d caffe2 --extensions cc,cpp,cu,cuh,h \\bDISABLE_COPY_AND_ASSIGN AT_DISABLE_COPY_AND_ASSIGN
```

Reviewed By: mingzhe09088

Differential Revision: D9184809

fbshipit-source-id: 958cf5162b0d92b83ea9c2597abb77320ca57ce8
2018-08-07 08:54:26 -07:00
66f7b8abbe Better macro name hygiene prefixing. (#10274)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10274

Good C++ libraries don't take up un-namespaced identifiers
like DISABLE_COPY_AND_ASSIGN.  Re-prefix this.

Follow up fix: codemod Caffe2 to use the new macro, delete
the forwarding definition

Reviewed By: mingzhe09088

Differential Revision: D9181939

fbshipit-source-id: 857d099de1c2c0c4d0c1768c1ab772d59e28977c
2018-08-07 08:54:24 -07:00
18e298305e Increase TCP listen queue size from 64 to 1024 (#10268)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10268

Running torch.distributed.init_process_group fails with more than ~64 processes, with various errors like connection refused or connection reset by peer. After some digging, it looks like the root cause is that all workers have to connect to master via TCP (both in Zeus init and in DataChannelTCP - look for `connect()`), and the listening socket only has a backlog of 64.

I increased the backlog to 1024, that seems like enough for reasonable purposes (the hard limit is 65535 in /proc/sys/net/core/somaxconn). There's probably a more correct way to do this that involves retries when connection is refused.
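
For reference, the backlog is the standard listen(2) argument; a minimal Python illustration (the port number is arbitrary):

```
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.bind(("0.0.0.0", 29500))
s.listen(1024)  # backlog: connections the kernel queues before accept();
                # with 64, a burst of 65+ connecting workers gets refused
```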

Reviewed By: soumith

Differential Revision: D9182216

fbshipit-source-id: 2f71c4995841db26c670cec344f1e3c7a80a7936
2018-08-07 08:26:06 -07:00
1a797ec810 Revert "clean up the build a bit. We no longer need the separate buil… (#10285)
Summary:
…d_libtorch entrypoint (#9836)"

This reverts commit 62e23a1ee47eb66056e6695cefef4e42599f8bd0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10285

Differential Revision: D9193107

Pulled By: ezyang

fbshipit-source-id: de96dce12fdf74410413ae18feee5caf0bed0025
2018-08-07 07:40:20 -07:00
b6402648f4 fix off-by-one bug in open-ended slicing (#10286)
Summary:
Previously, `tensor[i:]` was transformed to `tensor[i:-1]`. This incorrectly leaves off the last element. Noticed this when implementing slicing for list types.
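
A minimal illustration of the off-by-one in eager terms:

```
import torch

t = torch.arange(5)
t[2:]    # tensor([2, 3, 4])  -- what the script compiler should produce
t[2:-1]  # tensor([2, 3])     -- what it produced before, dropping the last element
```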
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10286

Differential Revision: D9193292

Pulled By: michaelsuo

fbshipit-source-id: df372b815f9a3b8029830dd9e8769f9985a890e7
2018-08-07 00:39:42 -07:00
5a7c710548 Support some basic list operations (#10225)
Summary:
Support a few basic operators:
- eq
- add
- len
- select (indexing)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10225

Differential Revision: D9172338

Pulled By: michaelsuo

fbshipit-source-id: 6e75ec1453b9589b0fb4698598ecdba5a5fccff9
2018-08-07 00:39:40 -07:00
1bae6e24c9 Change empty list literal compiler error to match actual builtin name (#10265)
Summary:
I changed the name of this builtin to match Python's native style, but forgot to change the compiler error to match.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10265

Differential Revision: D9192963

Pulled By: michaelsuo

fbshipit-source-id: 225ca4cd50fbbe3b31c369deeb3123a84342aab1
2018-08-07 00:39:39 -07:00
fa9ea5bde9 Move CoreAPI.h to Macros.h, to give it a more accurate name. (#10264)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10264

Since we now have DISABLE_COPY_AND_ASSIGN macro in the file,
CoreAPI is no longer an accurate name.

Reviewed By: dzhulgakov

Differential Revision: D9181687

fbshipit-source-id: a9cc5556be9c43e6aaa22671f755010707caef67
2018-08-06 22:27:44 -07:00
da44cf6101 Move TensorTypeId, TensorTypeIdRegistration and flat_hash_map to ATen/core (#10263)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10263

Auxiliary changes that were needed:
- Add DISABLE_COPY_AND_ASSIGN to CoreAPI.h (maybe we should rename this file
  now)

Reviewed By: dzhulgakov

Differential Revision: D9181321

fbshipit-source-id: 975687068285b5a94a57934817c960aeea2bbafa
2018-08-06 22:27:40 -07:00
f1cf3105de Revert D9169049: [pytorch][PR] Add new mkldnn fallback operators
Differential Revision:
D9169049

Original commit changeset: 3bc30250d734

fbshipit-source-id: 65a91594bda699ff9535b27dccd0d1e5d1a8036a
2018-08-06 20:39:30 -07:00
f47bec821e Add new mkldnn fallback operators (#10162)
Summary:
Add new ideep fallback operators.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10162

Reviewed By: yinghai

Differential Revision: D9169049

Pulled By: wesolwsk

fbshipit-source-id: 3bc30250d7340fea2c442f36d16b85241ceee6e7
2018-08-06 16:56:00 -07:00
25b2e88750 Stop propagating std flags to downstream gcc/nvcc (#10098)
Summary:
When we directly use -std=c++11, it propagates to the downstream applications.

Problems:
1. Gcc flags propagating to nvcc.
2. nvcc flags propagating to nvcc. (Which throws an error like redeclaration of std flag)

This PR will fix these propagation issues!

Similar problem:
https://github.com/FloopCZ/tensorflow_cc/pull/92
https://github.com/CGAL/cgal/issues/2775

Requires: Cmake 3.12
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10098

Differential Revision: D9187110

Pulled By: ezyang

fbshipit-source-id: 0e00e6aa3119c77a5b3ea56992ef3bbfecd71d80
2018-08-06 15:30:27 -07:00
8b08eca203 Move ScalarType to ATen/core, splitting out Backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10262

Reviewed By: dzhulgakov

Differential Revision: D9157408

fbshipit-source-id: 11631a35dfc6cb1f73f61ea08d3115f8ef4cb034
2018-08-06 15:30:25 -07:00
a38b572de3 enable unit tests and other changes (#10266)
Summary:
This PR for the ROCm target does the following:
* enable some unit tests on ROCm
* fix a missing static_cast that breaks BatchNorm call on ROCm
* fix BatchNorm to work on ROCm w/ ROCm warp sizes etc
* improve the pyhipify script by introducing kernel scope to some transpilations and other improvements
* fix a linking issue on ROCm
* for more unit test sets: mark currently broken tests as broken (to be fixed)
* enable THINLTO (phase one) to parallelize linking
* address the first failure of the elementwise kernel by removing a non-working ROCm specialization
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10266

Differential Revision: D9184178

Pulled By: ezyang

fbshipit-source-id: 03bcd1fe4ca4dd3241f09634dbd42b6a4c350297
2018-08-06 14:54:01 -07:00
e0d43572c1 Cleaner semantics for Reserve (#10261)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10261

1. Reserve
Currently, Reserve allocates new memory while also preserving the old data in the tensor,
and Resize relies on this behavior at some call sites, e.g. https://github.com/pytorch/pytorch/blob/master/caffe2/operators/reservoir_sampling.cc#L103, where we should be using Extend instead.
We want to bring the semantics of Reserve more in line with std::vector, i.e. we want it to be
an optimization about memory allocation and remove the semantics of preserving the data. We'll remove the guarantee that data is preserved after Reserve, and Extend will be the only API that preserves old data when we do in-place extension of memory. This also helps with the later refactoring to split Storage from Tensor.
Also, we'll only pass the outer dimension to Reserve, which means the later dimensions should be set before we call Reserve.
2. Extend/Shrink
Previously, Extend actually meant ExtendBy and Shrink meant ShrinkTo; I would like to add an ExtendTo for convenience and change Shrink to ShrinkTo.
The old functions calling Extend are still there; although it actually means ExtendBy, I think it still makes sense to keep it.
3. Usage Patterns

The expected usage patterns right now is:
```
t->Resize({0, 32, 32, 32});
t->template mutable_data<T>(); // set meta_
t->Reserve(100);
auto* t_data = t->template mutable_data<T>();
// feed data to tensor using t_data
for (int i = 0; i < 100; ++i) {
  t->Extend(1, 50, &context_);
  // you can continue to use t_data if you have reserved enough space
  // otherwise, you should call t->template mutable_data<T> again to
  // get the new data pointer since Extend will allocate new memory even
  // though the original data is preserved.
}
```

Reviewed By: ezyang

Differential Revision: D9128147

fbshipit-source-id: e765f6566d73deafe2abeef0b2cc0ebcbfebd096
2018-08-06 14:40:16 -07:00
a13a53c151 Optimize group_norm on cpu (#10246)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10246

Optimize group_norm on cpu

Reviewed By: houseroad

Differential Revision: D9177878

fbshipit-source-id: 41f7aadc6336317c338c75daccef6cb98e9de9de
2018-08-06 14:26:09 -07:00
0c848f4179 Python integration for custom operators (#10149)
Summary:
Adds the Python path to custom operators, including dynamically loading operations into Python.
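
A minimal sketch of the new Python path; the library path and the op name `myops::warp` are hypothetical:

```
import torch

# load a shared library that registered custom ops on the C++ side
torch.ops.load_library("build/libwarp.so")     # hypothetical path
out = torch.ops.myops.warp(torch.randn(3, 3))  # dispatches into the C++ op
```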

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10149

Reviewed By: ezyang

Differential Revision: D9158380

Pulled By: goldsborough

fbshipit-source-id: 3edffa639e8d2959e9e80d1bd4f20ab4a1b3ca02
2018-08-06 13:54:48 -07:00
62e23a1ee4 clean up the build a bit. We no longer need the separate build_libtorch entrypoint (#9836)
Summary:
the new entrypoint is `./tools/build_pytorch_libs.sh caffe2`

this will also speed up CI builds a bit, since we will no longer be compiling all of libtorch twice
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9836

Differential Revision: D9182634

Pulled By: anderspapitto

fbshipit-source-id: 0b9a20ab04f5df2d5c4e7777e4dc468ab25b9ce2
2018-08-06 13:41:51 -07:00
d1a0c2eaf8 Add back THTensor_nDimension. (#10259)
Summary:
Turns out some people are using this via the C-API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10259

Differential Revision: D9180135

Pulled By: gchanan

fbshipit-source-id: 68f59beabf7f8093e67581d7e7ebfe8dff9e6b69
2018-08-06 11:09:41 -07:00
6ac35b35d1 Stop using THLongStorage for sizes/strides, remove THLongStorageView.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10219

Reviewed By: cpuhrsch

Differential Revision: D9159550

Pulled By: gchanan

fbshipit-source-id: 745a6d335613688ed41b32369ee4938907ce8cbb
2018-08-06 09:25:32 -07:00
835a5d4f49 Add cost inference of fwd sparse operators and sparse adagrad (#9314)
Summary:
We should also add cost inference for sparse operators in backward pass later.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9314

Reviewed By: orionr

Differential Revision: D8789240

Pulled By: jspark1105

fbshipit-source-id: 68c2170f294fe13bcc409276f599b5fa8a98bcd3
2018-08-06 08:39:16 -07:00
506142ac8a Add warning for building PyTorch using Python 2.7 on Windows (#10247)
Summary:
Fixes #9232.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10247

Differential Revision: D9178257

Pulled By: SsnL

fbshipit-source-id: cc553335a5a918b6d77fe1064460cb66114859ca
2018-08-05 21:24:02 -07:00
267c397c5b Add the ocr_det model for benchmarking (#10245)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10245

as title

Reviewed By: sf-wind

Differential Revision: D9176654

fbshipit-source-id: 3339d2aa6a0ceb0e751745c06dcfd025ccbf5449
2018-08-05 16:45:35 -07:00
7f2e43a084 Add the ocr_rec model json (#10240)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10240

as title

Reviewed By: sf-wind

Differential Revision: D9176522

fbshipit-source-id: 5b92c0b4ed24f96fe7b1321a3ab5ad26dcd3318d
2018-08-05 16:45:23 -07:00
df23bdc82d add BEGIN NOT-CLEAN-FILES marker to .gitignore. (#10233)
Summary:
Visual Studio Code and Visual Studio store their configurations in `FOLDER/.vscode` and `FOLDER/.vs`.
But "setup.py clean" deletes these folders because they are listed in the `.gitignore` file.

To prevent this, add a "BEGIN NOT-CLEAN-FILES" marker to the `.gitignore` file and have "setup.py clean" ignore lines after this marker.

Discussed in #10206
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10233

Differential Revision: D9175515

Pulled By: ezyang

fbshipit-source-id: 24074a7e6e505a3d51382dc5ade5c65c97deda37
2018-08-05 15:55:44 -07:00
f57e4ce1d5 Update broadcast with alpha to reduce num of launching kernels. (#10235)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10235

Update broadcast with alpha to reduce num of launching kernels.

Reviewed By: houseroad

Differential Revision: D9175824

fbshipit-source-id: 7a463833350a2c84dcfb82f73cf40da403dd59a0
2018-08-04 19:54:20 -07:00
ab293924bb support generic feature in DPER2 (#10197)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10197

Support generic feature in DPER2

For now, since we only have one generic type (1), we directly add the parsed feature record to the embedding features.

New feature types with specific structure will require corresponding code changes.

Reviewed By: itomatik

Differential Revision: D8788177

fbshipit-source-id: 9aaa6f35ece382acb4072ec5e57061bb0727f184
2018-08-04 15:25:13 -07:00
57d2d4bcff Optimize reduce ops for 2d and 3d (#9992)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9992

Optimize reduce ops for 2d and 3d

Reviewed By: houseroad

Differential Revision: D9042505

fbshipit-source-id: 62af2125aa6439106293e59bdf6a2b920792fd2d
2018-08-04 13:53:58 -07:00
29406a2c4c Fix shared_ptr refcycle in graph executor (#10222)
Summary:
Fixes #10032

When capturing an output, GraphExecutorAutogradFunction creates
SavedVariable with is_output=False and owns it:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/jit/graph_executor.cpp#L87

Constructing SavedVariable with is_output=False makes it own a copy of
the shared_ptr<GraphExecutorAutogradFunction>, which causes a reference
cycle:
6456b944fd/torch/csrc/autograd/saved_variable.cpp (L27)

The solution in this PR is to construct the SavedVariable with
is_output=True if the captured value is an output.

Test Plan

Turn on cuda memory checking for JitTestCase. If the test's name
includes "cuda" or "gpu" in it, the cuda memory checking test happens.

cc zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10222

Reviewed By: ezyang

Differential Revision: D9162995

Pulled By: zou3519

fbshipit-source-id: aeace85a09160c7a7e79cf35f6ac61eac87cbf66
2018-08-04 11:39:10 -07:00
2141cb7d53 Update OnnxifiOp to reflect onnx/onnx#1256
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10230

Reviewed By: yinghai

Differential Revision: D9174527

Pulled By: Maratyszcza

fbshipit-source-id: 753493e67446b528d65b146e89ea9f874b469ead
2018-08-04 08:09:19 -07:00
5df8547ff9 Fix ONNX LogSoftmax export. (#9576)
Summary:
This fixes an issue with incorrect `axis=-1` in the exported ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9576

Reviewed By: yinghai

Differential Revision: D9125463

Pulled By: houseroad

fbshipit-source-id: 6f4cb1067d1aa6bb0a9f56690fc21816c98eebfa
2018-08-03 22:09:42 -07:00
36939417b2 Introduce at::DeviceType, which subsumes at::Device::Type and (partially) caffe2::DeviceType (#10175)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10175

Previously, we had at::Device::Type and caffe2::DeviceType (from protobuf),
intended to help us distinguish between CPU, CUDA, etc. devices.

This replaces at::Device::Type entirely with at::DeviceType, which in turn
is a direct, 'enum class' version of the protobuf generated caffe2::DeviceType
'enum'.  We can't eliminate the 'enum' because this would a pretty drastic
API change (enum is interconvertible with integers, enum class is not) but
we can make the two line up exactly and share code for, e.g., printing.

Reviewed By: Yangqing

Differential Revision: D9137156

fbshipit-source-id: 566385cd6efb1ed722b25e6f7849a910b50342ab
2018-08-03 19:25:06 -07:00
98d60ad43d Replace caffe2::EnforceNotMet with at::Error
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10184

Reviewed By: dzhulgakov

Differential Revision: D9140095

fbshipit-source-id: 3beead825609cec5054347e59903b0b78ef150f8
2018-08-03 19:25:05 -07:00
e2976ea519 Make at::Error look more like caffe2::EnforceNotMet (#10183)
Summary:
- New concept of a message stack; you can add messages
  using AppendMessage
- New concept of a caller; it's just a way to pass along
  some arbitrary extra information in the exception

Coming soon is changing Caffe2 to use at::Error instead of
EnforceNotMet
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10183

Differential Revision: D9139996

Pulled By: ezyang

fbshipit-source-id: 6979c289ec59bc3566a23d6619bafba2c1920de9
2018-08-03 19:25:03 -07:00
c7c6e93312 Use target_compile_definitions for AT_CORE_STATIC_WINDOWS (#10213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10213

nvcc only respects definitions, not options.

Reviewed By: dzhulgakov

Differential Revision: D9154388

fbshipit-source-id: 04c4809154df1c61108b65f1115fccdeb336952e
2018-08-03 19:25:02 -07:00
02a64b183c Move ATenGeneral back out of core. (#10224)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10224

It doesn't work with Caffe2; use AT_CORE_API from ATen/core/CoreAPI.h
instead.

Reviewed By: smessmer

Differential Revision: D9162467

fbshipit-source-id: 3c7d83c1ccb722ebac469296bdd7c3982ff461e5
2018-08-03 19:25:01 -07:00
41dce17e22 Delete TensorImpl::type_, replace with backend_/scalar_type_/is_variable_ (#10210)
Summary:
The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything on this codepath to inline.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I fixed this by
  reordering the static initializers. Fixes #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10210

Differential Revision: D9150697

Pulled By: ezyang

fbshipit-source-id: 89e2006c88688bcfab0dcee82dc369127c198c35
2018-08-03 18:25:19 -07:00
149d4f776b use logsigmoid at multilabel_soft_margin_loss, and change output from shape=(N, C)to (N,) (#9965)
Summary:
- fixes #9141, #9301
- use logsigmoid at multilabel_soft_margin_loss to make it more stable (NOT fixing legacy MultiLabelSoftMarginCriterion)
- return (N) instead of (N, C) to match the same behavior as MultiMarginLoss
- Note that with this PR, the following behavior is expected:
```
import torch
import torch.nn.functional as F

outputs = torch.randn(4, 3)            # (N, C) logits
labels = torch.empty(4, 3).random_(2)  # multi-hot targets

loss = F.multilabel_soft_margin_loss(outputs, labels, reduction='none')
loss_mean = F.multilabel_soft_margin_loss(outputs, labels, reduction='elementwise_mean')
loss_sum = F.multilabel_soft_margin_loss(outputs, labels, reduction='sum')

loss.sum() == loss_sum  # True
loss.mean() == loss_mean  # True
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9965

Differential Revision: D9038402

Pulled By: weiyangfb

fbshipit-source-id: 0fa94c7b3cd370ea62bd6333f1a0e9bd0b8ccbb9
2018-08-03 17:54:19 -07:00
7bc87172ea Kill Tensor::shares_data (#10217)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10217

It's only used in debug printing and is not that reliable anyway. If we want to implement it later, we should do it with proper accounting for shared storages.

Reviewed By: jerryzh168

Differential Revision: D9155685

fbshipit-source-id: 48320d41a0c4155645f3ba622ef88730a4567895
2018-08-03 17:40:39 -07:00
3b3aff2ed6 IsType<TensorCPU> -> IsType<Tensor>(CPU) (#10135)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10135

att

Reviewed By: yinghai

Differential Revision: D9121892

fbshipit-source-id: 4a4a3bfc450896b619bf92c92ef218aaaefc3081
2018-08-03 17:24:59 -07:00
4aa7469d1f Implement c10 ops needed for benchmark (#9360)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9360

This implements a first set of c10 operators, namely the ones needed for the multithread predictor benchmark.

All implementations are CPU-only and experimental. They're not meant to be used in production.

They can be used, however, to test calling simple c10 MLPs from Caffe2 or PyTorch when working on these integration paths.

Reviewed By: dzhulgakov

Differential Revision: D8811698

fbshipit-source-id: 826789c38b2bfdb125a5c0d03c5aebf627785482
2018-08-03 16:09:27 -07:00
08e7af20d3 Implement calling of c10 ops from c2 (#9369)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9369

This adds the capability for caffe2 to call c10 operators and adds a dummy c10 sigmoid op as a proof of concept.

I used this test script to make sure it works:

    from caffe2.python import workspace, model_helper
    import numpy as np

    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    m = model_helper.ModelHelper(name="my net")
    sigmoid1 = m.net.C10Sigmoid_DontUseThisOpYet("data1", "sigmoid1")
    sigmoid2 = m.net.Sigmoid("data1", "sigmoid2")

    workspace.RunNetOnce(m.param_init_net)
    workspace.CreateNet(m.net)
    data1 = np.random.rand(16, 100).astype(np.float32)
    workspace.FeedBlob("data1", data1)
    workspace.RunNet(m.name, 1)

    print(workspace.FetchBlob("data1"))
    print(workspace.FetchBlob("sigmoid1"))
    print(workspace.FetchBlob("sigmoid2"))

(and check that both sigmoid outputs are the same)

Reviewed By: ezyang

Differential Revision: D8814669

fbshipit-source-id: eeb0e7a854727f1617a3c592a662a7e5ae226f40
2018-08-03 16:09:23 -07:00
c5abe8844a Add IDEEP fallbacks for Resnet50 training ops (#8541)
Summary:
1. Add fallback gradient ops
2. In fallback ops, set the output Tensor as a CPUTensor instead of an IDEEPTensor if ndim == 0, because IDEEPTensor doesn't support 0-dim tensors.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8541

Reviewed By: yinghai

Differential Revision: D9115233

Pulled By: wesolwsk

fbshipit-source-id: 163e6a76f02bd781c95d1060ccbacf2cab90055e
2018-08-03 15:54:17 -07:00
4680ab4d44 Generalize intrusive_ptr comment (#10216)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10216

-

Reviewed By: ezyang

Differential Revision: D9155601

fbshipit-source-id: 154de2e6ad747134413a3ab3ae0b7507b8284d49
2018-08-03 14:25:28 -07:00
97cbcb7d67 Allow releasing/retaining weak_intrusive_ptr (#10214)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10214

Seems we're passing weak pointers over C API boundaries. Need this API there too.

Reviewed By: ezyang

Differential Revision: D9154505

fbshipit-source-id: c9889689b87dad5d918f93ba231e01704b8d2479
2018-08-03 14:25:24 -07:00
6456b944fd ctc_loss odds and ends (#10112)
Summary:
- Add convenience wrapper to pass tensors as input_lengths, target_lengths
- Fix documentation example
- Check BLANK >= 0

Thank you, Simon and Soumith for the suggestions!
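
A minimal sketch of the tensor-lengths convenience (assuming the public wrapper is `torch.nn.functional.ctc_loss`; shapes and values are illustrative):

```
import torch
import torch.nn.functional as F

T, N, C, S = 50, 16, 20, 30            # time steps, batch, classes, target length
log_probs = torch.randn(T, N, C).log_softmax(2)
targets = torch.randint(1, C, (N, S), dtype=torch.long)

# lengths can now be passed as tensors rather than Python ints/lists
input_lengths = torch.full((N,), T, dtype=torch.long)
target_lengths = torch.full((N,), S, dtype=torch.long)

loss = F.ctc_loss(log_probs, targets, input_lengths, target_lengths, blank=0)
```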
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10112

Differential Revision: D9130737

Pulled By: SsnL

fbshipit-source-id: f9a0022a969788bda3db9f360e2564b519ebf2e6
2018-08-03 13:25:18 -07:00
65d32b1705 Remove unused substitutions (#10187)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10187

These substitutions don't actually occur in the target file. Remove them.

Reviewed By: ezyang

Differential Revision: D9141567

fbshipit-source-id: fcfddee0b4d31e21763b39d852577d2dbb9ce843
2018-08-03 12:25:59 -07:00
f51f15bb27 Update include paths for ATen/core (#10130)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10130

Update some include paths to make them internally consistent

Reviewed By: ezyang

Differential Revision: D9119906

fbshipit-source-id: b44e5cab8e8e795ee18afe9ffc6caf1f2b413467
2018-08-03 11:57:02 -07:00
f77b62c3e1 Add documentation for margin arg in Caffe2 MarginRankingCriterionOp (#10186)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10186

The MarginRankingCriterionOp margin argument was undocumented.

Reviewed By: jerryzh168

Differential Revision: D9141228

fbshipit-source-id: 724d45dc8e555fbe9d3e8afc7b6bf8ed17bbbdb1
2018-08-03 11:45:51 -07:00
cb0e72e00d Add registerOperator overloads that infer the schema (#10048)
Summary:
This PR adds a way to infer the JIT/script schema of a function from its signature, and then create an operator from the schema and implementation. The implementation function is wrapped into another function, which pops values from the stack into an argument tuple, then invokes the function and pushes the return value back onto the stack, sometimes unpacking the return value if it is a tuple.

Currently the method is called `createOperator`. We may want to think of a nicer way of registering ops in tandem with `RegisterOperators`. It might be very cumbersome to add a template constructor to `Operator`, so maybe we can come up with a chaining method on `RegisterOperators` like `RegisterOperators(schema, func).op(schema, func).op(schema, func)` -- it has to work at startup time (for a static variable) though. We can solve this in another PR.

zdevito apaszke smessmer dzhulgakov
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10048

Differential Revision: D9125975

Pulled By: goldsborough

fbshipit-source-id: de9e59888757573284a43787ae5d94384bfe8f9a
2018-08-03 11:45:49 -07:00
7a377b9a53 Add torch.argsort mirroring similar functionality in numpy. (#9600)
Summary:
Per issue #9542
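
For reference, a minimal sketch of the intended behavior (values are illustrative):

```
import torch

x = torch.tensor([3.0, 1.0, 2.0])
idx = torch.argsort(x)                   # indices that would sort x ascending
print(idx)                               # tensor([1, 2, 0])
print(torch.equal(x[idx], x.sort()[0]))  # True: gathering with idx sorts x
```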
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9600

Differential Revision: D8952338

Pulled By: resistor

fbshipit-source-id: c3f69d62858ad9458ec5ae563e3ff24b1c9283a7
2018-08-03 11:45:47 -07:00
c91af1202a Make release_resources non-const (#10192)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10192

- release_resources() method must be non-const because it modifies the object
- for intrusive_ptr<const MyClass>, this needs to be const_cast :(

Reviewed By: ezyang

Differential Revision: D9143808

fbshipit-source-id: 9203ff7a7ff3bec165931279371c6e75d4f0ca8c
2018-08-03 11:24:45 -07:00
39476d79a2 Allow releasing/reclaiming intrusive_ptr (#10133)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10133

This is useful for C APIs where we want to give owning pointers to/from other languages.

Reviewed By: ezyang

Differential Revision: D9121493

fbshipit-source-id: f903f5830f587b2ba69c0636ddcf1a066bbac2e0
2018-08-03 11:24:43 -07:00
5753746d29 Enable static initializer order ASAN. (#10211)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10211

Differential Revision: D9150687

Pulled By: ezyang

fbshipit-source-id: 4cd458d19a34788c8897905a87d1b52229f67f90
2018-08-03 11:24:42 -07:00
4a6fbf03c6 Make StorageImpl member variables largely private and use getters and setters
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10074

Differential Revision: D9086887

Pulled By: cpuhrsch

fbshipit-source-id: d2dd0d6a1b71d0f864aefb64cd1daefd11dcfb91
2018-08-03 11:10:02 -07:00
50cf326158 Allow type cast between int and float in Script (#10168)
Summary:
The PR allows int→float and float→int casts. Currently we only allow `tensor→int` and `tensor→float` casts.
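
A minimal sketch of the kind of script this enables (assuming the casts are spelled with the `int()`/`float()` builtins):

```
import torch

@torch.jit.script
def mix(x):
    n = int(2.7)        # float -> int cast, now allowed in script
    f = float(n + 1)    # int -> float cast
    return x * f

print(mix(torch.ones(2)))  # tensor([3., 3.])
```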
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10168

Differential Revision: D9141163

Pulled By: wanchaol

fbshipit-source-id: 5e5591a98b4985a675641dfc9a385b2a0bf8e208
2018-08-03 10:56:05 -07:00
5d3782b655 Fix IDEEP Copys (#10104)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10104

.

Reviewed By: yinghai

Differential Revision: D9109638

fbshipit-source-id: 319cc5711132314dfba0f09ac403522f21ad532b
2018-08-03 10:31:32 -07:00
656bb320b7 EnforceFinite test (#10143)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10143

att

Reviewed By: xianjiec

Differential Revision: D9122444

fbshipit-source-id: 010abcc1eb64f084c00890e8de5f5d422b4b8d02
2018-08-03 10:31:29 -07:00
13de6e8dfa Make list literals construct ListType (#10193)
Summary:
Previously, `foo = [bar, baz]` would construct a TupleType of fixed arity. This would cause code like:
```
foo = [2]
if True:
    foo = [2, 2]
```
to fail to compile, since `(int)` is not the same as `(int, int)`.

This PR changes things so that list literals construct ListTypes, which can be resized.

Potentially breaking changes introduced:
- Empty list literals are now disallowed; `_constructEmptyFooList()` builtins are required to replace them.
- Iterable variable unpacking where the rhs is a list is now disallowed. (Tuples still work)
- Lists must have a single type.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10193

Differential Revision: D9147166

Pulled By: michaelsuo

fbshipit-source-id: bbd1b97b0b6b7cb0e6f9d6aefa1ee9c731e63039
2018-08-03 00:55:23 -07:00
ab0ac6391b fix padding doc not rendered correctly (#10196)
Summary:
somehow sphinx doesn't like the previous wording
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10196

Differential Revision: D9146817

Pulled By: SsnL

fbshipit-source-id: 2140859bc363af556a021658def946d7afbdb245
2018-08-02 23:26:45 -07:00
4778afb8bb In Expand support using -1 to indicate preserving original size (#10174)
Summary:
zrphercule

https://pytorch.org/docs/stable/tensors.html#torch.Tensor.expand
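
For context, a minimal sketch of the `torch.Tensor.expand` semantics being mirrored here (PyTorch shown for illustration):

```
import torch

x = torch.randn(2, 1, 4)
y = x.expand(-1, 3, -1)   # -1 means "keep the original size in this dimension"
print(y.shape)            # torch.Size([2, 3, 4])
```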
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10174

Differential Revision: D9136467

Pulled By: bddppq

fbshipit-source-id: 825c489899097acda8d43706964d78a104cdf583
2018-08-02 22:09:47 -07:00
dd527db711 Skip TestConvolution.test_convolution_sync on ROCM which caused random segfaults (#10179)
Summary:
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/4701/console

petrex ashishfarmer rohithkrn
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10179

Differential Revision: D9139657

Pulled By: bddppq

fbshipit-source-id: 9b1bb2ad185ed16fff696ce026a5ee5fcf9cbaee
2018-08-02 21:09:27 -07:00
1f78e06f63 Add g.insertConstant and clean up dead attributes code (#10177)
Summary:
* Changes `insertConstant(g, val)` to `g.insertConstant(val)`.
* Moves SourceRange to its own file to enable it.
* Cleans up dead attribute code in schema matching and graph.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10177

Differential Revision: D9137789

Pulled By: zdevito

fbshipit-source-id: 8a73cfb01a576f02e7e4dce019be9c0a0002989d
2018-08-02 20:45:31 -07:00
798b530361 weak_intrusive_ptr (#10038)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10038

Add weak_ptr ability to intrusive_ptr.

Reviewed By: ezyang

Differential Revision: D9039980

fbshipit-source-id: dd504d6e0d7acf5914cd45845355e28f9df201fb
2018-08-02 17:25:14 -07:00
2bd709a7c8 intrusive_ptr (#9897)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9897

Add an IntrusivePtr class to do intrusive refcounting with a shared_ptr-like interface.

Reviewed By: ezyang

Differential Revision: D9018619

fbshipit-source-id: 5de8706aab8eea2e30bead0f59bd6a7ca4d20011
2018-08-02 17:25:12 -07:00
0e9c6898cb Export modules in ir with google protobuf
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9746

Differential Revision: D9110006

Pulled By: li-roy

fbshipit-source-id: 8b9744c042f822fdfe959a7a7fef3d0baff4f639
2018-08-02 15:54:51 -07:00
e2ecf3914a Change default CUDA block size from 512 to 128 (#10090)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10090

Decreasing the block size improves GPU utilization for use cases with small input sizes (e.g. 10000)

Reviewed By: pjh5

Differential Revision: D9093573

fbshipit-source-id: c8f995b773a00b1bea3a3809c0f6557133efd9dd
2018-08-02 15:40:13 -07:00
7dc870bd7b Delete invalid 'template' keyword (#10173)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10173

With D9024330, the `Extend` function is no longer a template, which makes
the `template` keyword here invalid. For some reason the current version of LLVM
doesn't catch this, but the latest one does.

Reviewed By: jerryzh168

Differential Revision: D9133462

fbshipit-source-id: 54ac9aad01f81b9b4e7b6e2864b8961478d2d860
2018-08-02 14:50:11 -07:00
dad6e8bb6c Remove capture specifiers in register_aten_ops when they're not needed. (#9669)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9669

Differential Revision: D8952335

Pulled By: resistor

fbshipit-source-id: 8fbbec7a7f55fbeeda3509cb3d339e1db90a53e6
2018-08-02 13:40:31 -07:00
94c67f1454 Replace storageimpl type with scalar_type and backend
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10097

Differential Revision: D9124287

Pulled By: cpuhrsch

fbshipit-source-id: c976abeeaaa085b972812c1a3270eb6aef0c0dca
2018-08-02 13:31:30 -07:00
538b15d13c Use PYTORCH_PYTHON to call generate_code.py (#10171)
Summary:
Probably fixes https://github.com/pytorch/pytorch/issues/8373#issuecomment-409994847
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10171

Differential Revision: D9135607

Pulled By: SsnL

fbshipit-source-id: 72f535875658c857621e41fd25c2174052714557
2018-08-02 12:54:14 -07:00
9e85a7a9de Back out "[pytorch][PR] [TENSOR MERGE] Delete type_ field from TensorImpl, replaced with backend_/scalar_typ…" (#10169)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10169

Original commit changeset: 2b4d867abfdc

Reviewed By: pjh5, SsnL

Differential Revision: D9135216

fbshipit-source-id: d5c9f12c3a0f75df224c781e1cd1e323cdfbb0d5
2018-08-02 12:39:01 -07:00
7be071a829 Update onnx to onnx/onnx@2a3a226 (#10167)
Summary:
2a3a226a96
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10167

Reviewed By: houseroad

Differential Revision: D9134738

Pulled By: bddppq

fbshipit-source-id: 9d3fd3c04a584d5626146f174ac78cabfa0e5934
2018-08-02 12:25:19 -07:00
6e85112f12 Adding katex rendering of equations, and required edits to equations. (#8848)
Summary:
This fixes issue #8529.

- Adds Katex extension to conf.py and requirements.txt
- Fixes syntax differences in docs
- Should allow documentation pages to render faster
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8848

Reviewed By: soumith

Differential Revision: D8677702

Pulled By: goodlux

fbshipit-source-id: c4a832c5879e0eebcb14763b35a41663331ba23f
2018-08-02 12:25:17 -07:00
ee98533746 Fix compiler warnings on ignored const qualifiers
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10142

Reviewed By: yinghai

Differential Revision: D9125502

Pulled By: bddppq

fbshipit-source-id: 8043b2a05507a4707220fa820ab6cc486760a93e
2018-08-02 12:10:37 -07:00
5765549155 codemod -d caffe2 --extensions cc,h CaffeTypeId TypeIdentifier (#10166)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10166

TypeIdentifier is still easy to codemod away from

Reviewed By: smessmer

Differential Revision: D9132840

fbshipit-source-id: bc83a8b17b2e7c19c9d2c9cfe5c7ce6ec1d8cec5
2018-08-02 11:54:30 -07:00
4a2f3cc45f Improve lars operator by applying clipping (#9905)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9905

This diff improves lars operator in Caffe2 by applying clipping to the computed learning rate

Reviewed By: pjh5

Differential Revision: D9020606

fbshipit-source-id: b579f1d628113c09366feac9406002f1ef4bd54f
2018-08-02 11:54:28 -07:00
a243e517fa Guard sizes/strides in TH/THC for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10145

Differential Revision: D9125791

Pulled By: gchanan

fbshipit-source-id: d0b8c88c49d7af85971a4531a63fd85a97bfbec7
2018-08-02 11:24:36 -07:00
170d29769b Strings lexing, parsing, implementation in print (#9324)
Summary:
This PR adds strings to the AST and implements them for print statements. Strings are lifted as attributes onto the print node. They must be arguments to print itself, not arguments to an object that is passed to print; if they are encountered elsewhere, an NYI exception is thrown.
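
A minimal sketch of what is now accepted (assuming the `torch.jit.script` decorator; the example is illustrative):

```
import torch

@torch.jit.script
def f(x):
    print("running f")   # OK: string literal passed directly to print
    return x + 1

# Passing the string anywhere else, e.g. as an argument to another op,
# raises a not-yet-implemented error at compile time.
```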
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9324

Reviewed By: jramseyer

Differential Revision: D8807128

Pulled By: eellison

fbshipit-source-id: 984401ff458ed18d473c6d1bd86750e56c77d078
2018-08-02 11:09:03 -07:00
230ca98d4b Remove THTensor_isSize. (#10146)
Summary:
This is part of the process of removing THLongStorage to represent sizes/strides.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10146

Differential Revision: D9126611

Pulled By: gchanan

fbshipit-source-id: b0d995a4c51dfd54bf76dcfee9a69f37f9d01652
2018-08-02 10:39:43 -07:00
9c818bfbc7 Refactor PythonValue types + use tryMatchSchema for PythonOp
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10132

Differential Revision: D9121327

Pulled By: jamesr66a

fbshipit-source-id: 6d8bcf6b0dca54106cf9ed740bcff857062a03da
2018-08-02 10:26:58 -07:00
cfa05706ef ROCm contributions week 29 (#9653)
Summary:
In this changeset:
* improvements to `hipify-python.py`
* marking unit tests broken for ROCm
* reducing the number of jobs for the build to avoid out-of-memory issues
* switch to Thrust/cub-hip master for the CI
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9653

Differential Revision: D9117791

Pulled By: ezyang

fbshipit-source-id: a6c3c7b81f2bda9825974bf9bf89a97767244352
2018-08-02 09:09:00 -07:00
70d47f92db Add support for rand_like op in fusion compiler (#9795)
Summary:
Enabled support for generating random numbers in the fusion compiler. Currently a Philox RNG implemented by TensorFlow is used, as NVRTC couldn't resolve the curand.h header correctly. The two implementations should have exactly the same behavior according to our tests.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9795

Differential Revision: D8999029

Pulled By: SsnL

fbshipit-source-id: f0d2616a699a942e2f370bdb02ac77b9c463d7b8
2018-08-02 08:55:25 -07:00
4a5cd4f6ab nomnigraph - new utility for graph transformation (#10081)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10081

Add a new utility that makes it easier to write graph transformations. Callers now only need to take care of the actual transformation logic. The subgraph matching is simplified because callers only need to specify a simple construct for the subtree matching criteria.

The utility is SubgraphMatcher::replaceSubtree.

Some notes:
- replaceSubtree takes a subtree matching criterion and a lambda that takes a subtree root. It doesn't handle any transformations itself. Callers should be responsible for the transformation part, including deleting all nodes in the matched subtree(s). We could enhance this to also handle the deletion part if it turns out to be useful.
- Only subtree matching is supported for now, but we can add general DAG subgraph support later if needed.

Reviewed By: bwasti

Differential Revision: D9073297

fbshipit-source-id: 465a0ad11caafde01196fbb2eda2d4d8e550c3b6
2018-08-01 23:09:41 -07:00
acbc2744d8 fix bug in 3d group convolution (#9860)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9860

For 3D group convolution, in the case of CUDNN 7 and NCHWD order, the filter dim is (M, C/group_, k_h, k_w, k_d).

According to CUDA doc (https://docs.nvidia.com/deeplearning/sdk/cudnn-developer-guide/index.html#grouped-convolutions), the existing implementation is incorrect, and will crash the 3d video model training with group convolution.

In the implementation, `filter.dims(1)` is already `C/group_`, so we don't need to divide it by `group_` again.
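
The shape convention can be illustrated with the analogous PyTorch layer (illustrative only; the fix itself is in the Caffe2 CUDNN path):

```
import torch

# groups=2 splits the 6 input channels into 2 groups of 3
conv = torch.nn.Conv3d(in_channels=6, out_channels=4, kernel_size=3, groups=2)
print(conv.weight.shape)  # torch.Size([4, 3, 3, 3, 3]) == (M, C/groups, k, k, k)
```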

Reviewed By: BIT-silence

Differential Revision: D9008807

fbshipit-source-id: 2f0d6eb47f4e16d7417a7e3baeba709e3254154f
2018-08-01 22:55:38 -07:00
57061d600a Auto-batching IR transformation for control flow (#9392)
Summary:
Implement IR transformation for control flow

- `prim::Constant`: clone to new graph directly
- `prim::NumToTensor`: create a `BatchTensor` from output tensor with `batch_size = 1`
- `prim::TensorToNum`: clone to new graph
- `prim::ListConstruct`: clone to new graph
- `prim::If`: execute both `if_block` and `else_block` and combine results from them using `cond`
- `prim::Loop`:
  - for loop
  - while loop: change while `cond` to `cond_any`, use `cond` to update outputs

test case: hand-written LSTM, greedy search, beam search
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9392

Differential Revision: D8822369

Pulled By: ChunliF

fbshipit-source-id: 8f03c95757d32e8c4580eeab3974fd1bc429a1e5
2018-08-01 22:24:35 -07:00
8a25acbba5 Use angle brackets instead of quotes for includes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10153

Reviewed By: smessmer

Differential Revision: D9123768

fbshipit-source-id: 0970552ba4d5772fb3cef2db3af3181d98f85140
2018-08-01 22:02:51 -07:00
5699250acc Move IdWrapper to ATen/core (#10152)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10152

- Moved from namespace c10::guts to at
- I fixed the use sites, since there were only three of them
- Macro renamed from C10_ to AT_

Reviewed By: smessmer

Differential Revision: D9123652

fbshipit-source-id: bef3c0ace046ebadb82ad00ab73371f026749085
2018-08-01 22:02:50 -07:00
8cc7d33656 Renumber typeid.h so that the number lines up with ScalarType (#10139)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10139

We want CaffeTypeId to be interconvertible with at::ScalarType, and
this means we should have the numbers line up exactly.  Fortunately
this is not too hard to do.

Reviewed By: smessmer

Differential Revision: D9123058

fbshipit-source-id: 7e9bd59ca25a552afe9d2d0a16cedc4f6311f911
2018-08-01 22:02:46 -07:00
6b338c8026 Implement torch.broadcast_tensors (#10075)
Summary:
This exposes expand_outplace to python. Fixes #8076. Fixes #10041.

I didn't name it torch.broadcast because numpy.broadcast does something
slightly different (it returns an object with the correct shape
information).
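
A minimal usage sketch:

```
import torch

a = torch.randn(3, 1)
b = torch.randn(1, 4)
x, y = torch.broadcast_tensors(a, b)
print(x.shape, y.shape)   # torch.Size([3, 4]) torch.Size([3, 4])
```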
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10075

Differential Revision: D9125816

Pulled By: zou3519

fbshipit-source-id: ebe17c8bb54a73ec84b8f76ce14aff3e9c56f4d1
2018-08-01 19:18:34 -07:00
191482fa39 Distinguish TupleLiteral from ListLiteral (#10128)
Summary:
Previously, the parser was emitting list literals for tuples, but the IR was representing list literals internally with TupleTypes.

For implementing most list operations, I think it will be helpful to distinguish between lists (dynamic size, homogeneous types) and tuples (fixed arity, heterogeneous types)

This diff modifies the parser logic to emit tuple literals. This frees us to represent lists as ListType in the IR, while still properly mapping tuple literals to TupleTypes.

A following diff will actually switch over list literals to emit ListTypes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10128

Differential Revision: D9121305

Pulled By: michaelsuo

fbshipit-source-id: e0cad07ae8bac680f7f8113d10e5129d5a1a511d
2018-08-01 19:18:31 -07:00
a44d9d6eb4 Fix tensor check logic in logging (#10138)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10138

Note that `TensorCPU` and `TensorGPU` are both refined to be `Tensor` now; basically they are the same thing. So a check like `blob.IsType<TensorCPU>()` is no longer safe, as a `TensorGPU` can pass the check too.

We need to systematically weed out such usage in our codebase... jerryzh

Reviewed By: houseroad

Differential Revision: D9115273

fbshipit-source-id: 13b293c73691002eac34e095cdcd96c27183e875
2018-08-01 18:09:19 -07:00
24bb8cecbe Move ATen/Half to ATen/core, and apply lint (#10137)
Summary:
This rewrites checked_convert to use stringstreams, eliminating the use of to_string which is not available on Android stdc++.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/10137

Reviewed By: smessmer

Differential Revision: D9122340

fbshipit-source-id: b7c1bff70e36217305f2b3333c51543ef8ff3d9c
2018-08-01 17:54:58 -07:00
806854a3c5 Pin AMD gpu id in Caffe2 CI (#10144)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10144

Differential Revision: D9125707

Pulled By: bddppq

fbshipit-source-id: 8ef8f3da6ceb1855f28fc24be621b9b4854ff7f9
2018-08-01 17:39:21 -07:00
59c355c870 Move halfbits2float and float2halfbits conversions to ATen. (#10134)
Summary:
This will be needed soon because I want to move Half.h into
ATen/core, and then I cannot have a TH dependency.

I also took the liberty of making the code more strict-aliasing
safe (this is not actually useful, since we will never build Torch
with strict aliasing) by replacing pointer casts between
float and unsigned with a memcpy instead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10134

Differential Revision: D9121920

Pulled By: ezyang

fbshipit-source-id: 3b1f86a7c5880e8ac1a589a51f0635bb72e1fd40
2018-08-01 17:09:12 -07:00
4ed5b9267c #8518 Support for empty tuples (#10027)
Summary:
Fixing #8518

Sorry for the pile of commits; I forgot to rebase.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10027

Reviewed By: ezyang

Differential Revision: D9070028

Pulled By: jramseyer

fbshipit-source-id: 49729c9755ab8a586711e9f6d6a574f3035a7e75
2018-08-01 16:10:00 -07:00
1f6888b70a Allow mobile exporter to export string arrays (#10017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10017

Allow mobile exporter to export string arrays

Reviewed By: pjh5

Differential Revision: D9061213

fbshipit-source-id: b6c5257eb2f0f964dba255b97dc5d32af8ce15a7
2018-08-01 16:09:58 -07:00
1d427fd6f6 Delete type_ field from TensorImpl, replaced with backend_/scalar_typ… (#9787)
Summary:
…e_/is_variable_

The basic game plan is to stop accessing the type_ field directly,
and instead using the stored backend_, scalar_type_ and
is_variable_ to look up the appropriate Type from Context.
Storage of backend_ and scalar_type_ are new.

At some future point in time, I'd like to look at this code
carefully to see if I can get everything in this codepath inlining.
I didn't do it in this patch because there are circular include
problems making things difficult.

Some other details:

- Added Device::backend() which does what it says on the tin

- SparseTensorImpl is temporarily hard-coded to root in at::Context
  for the appropriate context.  If/when we put this in shared code,
  we'll have to break this dep too, but for now it should be OK.

- There's a stupid problem with globalContext() deadlocking if
  you didn't actually initialize it before loading libtorch.so
  (which is bringing along the variable hooks).  I didn't fix
  it in this PR; it's tracked in #9784

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9787

Reviewed By: cpuhrsch

Differential Revision: D8980971

Pulled By: ezyang

fbshipit-source-id: 2b4d867abfdc3999a836a220c638c109053145a8
2018-08-01 15:34:56 -07:00
edb90387b2 Lint ArrayRef.h (#10129)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10129

-

Reviewed By: ezyang

Differential Revision: D9119933

fbshipit-source-id: dd13c6d2a0ab72d943acff5cb02b3278ca8c7ba6
2018-08-01 15:34:54 -07:00
080ae5ea1f Remove implicit ArrayRef -> vector conversion (#9740)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9740

- Remove implicit ArrayRef -> vector conversion
- Fix 4 call sites that accidentally did an implicit expensive vector conversion but wouldn't have needed to
- Remove explicit vector conversion from 4 call sites that also didn't need to do that

Reviewed By: ezyang

Differential Revision: D8961693

fbshipit-source-id: 980da9f988083c0072497f9dbcbbf6f516fa311c
2018-08-01 15:34:52 -07:00
e2846c365a Improve ArrayRef (#9610)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9610

Mostly making some stuff in ArrayRef constexpr to give it better perf.

Reviewed By: ezyang

Differential Revision: D8926785

fbshipit-source-id: af6d4b05fbc69d20855a80f3edc2b501577a742b
2018-08-01 15:34:50 -07:00
ad6d62250a Add torch.compiled_with_cxx11_abi(). (#10071)
Summary:
It returns whether PyTorch was built with _GLIBCXX_USE_CXX11_ABI=1.

Fixes #8385
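
Usage is a single call:

```
import torch

# True iff the binary was built with -D_GLIBCXX_USE_CXX11_ABI=1
print(torch.compiled_with_cxx11_abi())
```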
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10071

Differential Revision: D9088946

Pulled By: zou3519

fbshipit-source-id: b00fd92ee340ef34f60bdd6027ceaf46dd7442c0
2018-08-01 15:34:48 -07:00
1b1c47dfe5 Update onnx to onnx/onnx@32ac71b (#10126)
Summary:
32ac71b1b9
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10126

Reviewed By: houseroad

Differential Revision: D9120544

Pulled By: bddppq

fbshipit-source-id: 4fbe1f16e3b712c092f2f188324173ba1ecc1062
2018-08-01 14:28:54 -07:00
fb24c52dc3 Prepare TH for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10123

Differential Revision: D9121068

Pulled By: gchanan

fbshipit-source-id: 1cdc6e4b327cf158729cbb4026315be63b159f9d
2018-08-01 14:28:53 -07:00
2d56b5cf8b Prepare THC for first class scalars (0-dimensional tensors).
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10072

Differential Revision: D9082421

Pulled By: gchanan

fbshipit-source-id: d4327b07aaef85cc2521393008154ebceae8cbfd
2018-08-01 14:28:51 -07:00
59af5b928a Move UniqueVoidPtr to ATen/core and apply lint
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10131

Reviewed By: smessmer

Differential Revision: D9121096

fbshipit-source-id: a6861429f06302e3e279ff669961bba34a9fb7a1
2018-08-01 13:25:23 -07:00
2d6738e89e Fix lint in ATen/core (but not ArrayRef)
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10124

Reviewed By: smessmer

Differential Revision: D9119768

fbshipit-source-id: c0a56d27401b730956945146d4f48d4d5a9b77a6
2018-08-01 13:25:19 -07:00
f908b2b919 Use google protobuf in pytorch onnx import/export
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8469

Reviewed By: houseroad

Differential Revision: D9102041

Pulled By: li-roy

fbshipit-source-id: 805c473745d181b71c7deebf0b9afd0f0849ba4f
2018-08-01 12:54:41 -07:00
5a44be50ab Minor nit in comment in CMakeLists.txt
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10125

Reviewed By: smessmer

Differential Revision: D9119766

fbshipit-source-id: 290b804bc552b1c3f68e5129ff60ef7f34307714
2018-08-01 12:39:38 -07:00
e8f27311aa fix a couple problems with libtorch cmake file (#10091)
Summary:
in particular, make not building tests actually work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10091

Differential Revision: D9121366

Pulled By: anderspapitto

fbshipit-source-id: d7d38cf759aa46bff90d3b4f695c20f29039ae75
2018-08-01 11:39:33 -07:00
f126687fbc Add a dump() method to IR Node's. (#10106)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10106

Differential Revision: D9119891

Pulled By: resistor

fbshipit-source-id: 5f41d8890007c639f8f0cdc92d11b128433ad6b8
2018-08-01 11:09:53 -07:00
4070005081 Move C++17.h to ATen/core (#10107)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10107

This header is needed for ATen/core stuff

This diff also fixes an issue in C++17.h when run in C++17 enabled compilers.

Reviewed By: ezyang

Differential Revision: D9095209

fbshipit-source-id: d45947956019a7095875f48746b88c414e8865bc
2018-08-01 09:54:59 -07:00
87d57dc5f5 Simplified Operator (#10080)
Summary:
zdevito explained that the attributed versions of `Operator`s are no longer necessary. This PR does two things:

1. Removes all code associated with attributed operators,
2. Adds a second kind of state to `Operator` where it is constructed with an `Operation` directly instead of an `OperationCreator`. This will be useful to test custom operators which don't require a node (you can just retrieve it directly).

Now rebased on top of https://github.com/pytorch/pytorch/pull/9801

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10080

Differential Revision: D9113668

Pulled By: goldsborough

fbshipit-source-id: 1276a191c7cf89da1c38488769f2105ce2664750
2018-08-01 09:41:08 -07:00
f1964c43fd Update eigen submodule to fix BUILD_ATEN issue (#10095)
Summary:
Extracted from https://github.com/pytorch/pytorch/pull/8338

Updating Eigen submodule to fix an issue we saw with BUILD_ATEN and BUILD_CAFFE2 removal.

cc mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10095

Reviewed By: mingzhe09088

Differential Revision: D9109877

Pulled By: orionr

fbshipit-source-id: 90e36c298d8a22398558d70dc5f68a95a7687d6b
2018-08-01 09:41:06 -07:00
a2a7b0c01a Initial documentation for building libtorch (#10087)
Summary:
It's not a particularly pretty process right now, but it may as well
be documented.  I'm not aware of an ideal location for this, so I'm
just dropping it in the docs/ folder for now as recommended by
soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10087

Differential Revision: D9119681

Pulled By: anderspapitto

fbshipit-source-id: cd4afb642f3778c888d66a501bc697d0b0c88388
2018-08-01 09:41:02 -07:00
ee964c51f4 NegativeBinomial distribution (#9345)
Summary:
- [x] implement distribution
- [x] add tests
- [x] docs

cc ingmarschuster
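
A minimal usage sketch (parameter names follow the other distributions; values are illustrative):

```
import torch
from torch.distributions import NegativeBinomial

d = NegativeBinomial(total_count=10.0, probs=torch.tensor(0.3))
samples = d.sample((5,))        # draw 5 samples
logp = d.log_prob(samples)      # evaluate log-density at the samples
print(d.mean, logp.shape)
```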
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9345

Differential Revision: D8807023

Pulled By: ezyang

fbshipit-source-id: 7bf7f352dd455e0909c58dd94e1bdebba0e8b5c8
2018-08-01 08:39:25 -07:00
2f848ec8ec Use new PyTorch API to make code simpler
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9968

Differential Revision: D9088316

Pulled By: li-roy

fbshipit-source-id: 2658fe0c1734d8b064cbad24d8f0d6c341400b4e
2018-08-01 08:39:23 -07:00
fa6b28bf40 Move ArrayRef, Backtrace, Error, SmallVector, optional to ATen/core; add CoreAPI (#10092)
Summary:
This also makes Backtrace more portable, by disabling its functionality for
mobile builds as well.

It also handles Caffe2 static Windows builds by introducing a new variable,
AT_CORE_STATIC_WINDOWS, which must be set if you're building
ATen on Windows as part of a static library.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10092

Reviewed By: gchanan, smessmer

Differential Revision: D9094393

Pulled By: ezyang

fbshipit-source-id: 93281f9302bd378605a26589ae308faf1dac7df4
2018-08-01 08:39:22 -07:00
b503109f20 Guard sizes/strides in THCUNN for scalars.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10083

Differential Revision: D9093572

Pulled By: gchanan

fbshipit-source-id: a5c27571ec06f8ed30e6b3b492c743444b58d9fe
2018-08-01 08:10:33 -07:00
43b151224e Move grid sampler to ATen (#9961)
Summary:
Spatial version benchmark

|                           | CPUFloat THNN | CPUFloat ATen | CPUDouble THNN | CPUDouble ATen | CUDAHalf THNN | CUDAHalf ATen | CUDAFloat THNN | CUDAFloat ATen | CUDADouble THNN | CUDADouble ATen |
|---------------------------|---------------|---------------|----------------|----------------|---------------|---------------|----------------|----------------|-----------------|-----------------|
| [1024x1x28x28] zero pad   | 2.19281888s   | 0.21280479s   | 2.52922535s    | 0.23944831s    | 0.17494774s   | 0.06242800s   | 0.31270599s    | 0.03706479s    | 0.40542483s     | 0.07391024s     |
| [1024x1x28x28] border pad | 3.04329610s   | 0.24705672s   | 2.29205394s    | 0.22336411s    | 0.17980361s   | 0.06212497s   | 0.31415701s    | 0.03847790s    | 0.43020391s     | 0.07540464s     |
| [32x3x244x244] zero pad   | 18.29301333s  | 2.18566656s   | 19.01662397s   | 3.51552224s    | 1.72487235s   | 0.28933954s   | 2.02466702s    | 0.18178749s    | 2.63671613s     | 0.41391206s     |
| [32x3x244x244] border pad | 18.72205329s  | 2.02600884s   | 20.13017297s   | 3.25979590s    | 1.96455693s   | 0.33070564s   | 2.18666625s    | 0.19546938s    | 2.91268897s     | 0.38465047s     |

For #9702

basics:
+ grid tensors have dimensions `[N, H, W, 2]` (or `[N, D, H, W, 3]` for 3d).
+ input/output tensors have dimensions `[N, C, H, W]` (or `[N, C, D, H ,W]` for 3d)
+ grid sampler maps `input([N, C, inp_H, inp_W]), grid([N, H, W, 2])` to `output([N, C, H, W])` (3d case is similar).

variable naming:
+ `tensor_sH` means the stride of `tensor` at the dimension of `H`.
+ `tensor_ptr_NCH` is a data pointer that always points to the beginning of the `tensor[n][c][h]` slice in the loop.
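
A minimal sketch exercising these conventions, using an identity affine grid (my own example, not taken from the benchmark above): sampling the input at its own pixel centers should reproduce it.

```
import torch
import torch.nn.functional as F

N, C, H, W = 1, 3, 8, 8
inp = torch.randn(N, C, H, W)

theta = torch.tensor([[[1.0, 0.0, 0.0],
                       [0.0, 1.0, 0.0]]])      # identity affine transform
grid = F.affine_grid(theta, (N, C, H, W))      # grid has shape [N, H, W, 2]
out = F.grid_sample(inp, grid)                 # output has shape [N, C, H, W]
print(torch.allclose(out, inp, atol=1e-6))     # True
```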
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9961

Differential Revision: D9057175

Pulled By: SsnL

fbshipit-source-id: 9ed8f1dc376ed10229f047fdcf3c90dbd250bee6
2018-08-01 07:54:46 -07:00
6fc75eadf0 Add CELU activation to pytorch (#8551)
Summary:
Also fuse input scale multiplication into ELU

Paper:
https://arxiv.org/pdf/1704.07483.pdf
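
A minimal sketch of the added activation checked against its definition (assuming it lands as `torch.nn.functional.celu`):

```
import torch
import torch.nn.functional as F

x = torch.randn(5)
alpha = 1.5
y = F.celu(x, alpha=alpha)

# CELU(x) = max(0, x) + min(0, alpha * (exp(x / alpha) - 1))
manual = x.clamp(min=0) + (alpha * ((x / alpha).exp() - 1)).clamp(max=0)
print(torch.allclose(y, manual))  # True
```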
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8551

Differential Revision: D9088477

Pulled By: SsnL

fbshipit-source-id: 877771bee251b27154058f2b67d747c9812c696b
2018-08-01 07:54:44 -07:00
6f6a1f2d63 fix test_load_error_msg failure (Network is unreachable) (#10021)
Summary:
- fixes [some failure]
- removed use of urlopen in test_load_error_msg

cc soumith
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10021

Differential Revision: D9068108

Pulled By: weiyangfb

fbshipit-source-id: a9484d4a913508d54731b6a1eef3cddff66604f2
2018-08-01 00:24:01 -07:00
5bd43a7af8 Refactor Seq2SeqModelCaffe2EnsembleDecoder (#10035)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10035

This is an initial diff which refactors some of the components in the Seq2SeqModelCaffe2EnsembleDecoder class.

Reviewed By: jmp84

Differential Revision: D9026372

fbshipit-source-id: 449635208f24494209ae2fb78a19fca872970ea8
2018-07-31 23:09:09 -07:00
3d247041e4 Force sync device when ops are sampled for observation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10054

Reviewed By: xw285cornell

Differential Revision: D9071097

fbshipit-source-id: 44357cdf79148e81db86c5350122a1a320a923fb
2018-07-31 21:09:00 -07:00
ec807f2a91 Bail out if netdef has disable_nomnigraph argument
Summary: allow models to override nomnigraph opts

Reviewed By: ajtulloch

Differential Revision: D9035729

fbshipit-source-id: 2b30208263c14ce7039f27c618a3b232bf11ee33
2018-07-31 20:54:46 -07:00
fcd567ed15 Enable Optimization on mobile by default
Summary: Re-enable opt by default

Reviewed By: Maratyszcza

Differential Revision: D8525434

fbshipit-source-id: a61253907251a44cfc59e0b50fb1906c5eb20558
2018-07-31 20:54:44 -07:00
7d2bda7588 Move DDP broadcast coalesced to C++ (#9729)
Summary:
This PR depends on the tests added in #9670. It moves the first, tiny function from the c10d DDP to C++: `dist_broadcast_coalesced`. Let me know if ` torch/csrc/distributed/c10d/ddp.h` will be a good place to put these rewritten functions.

pietern The controller you requested could not be found. apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9729

Differential Revision: D8985308

Pulled By: goldsborough

fbshipit-source-id: dc459fe9040273714044152063585e746974752f
2018-07-31 19:54:21 -07:00
294c065384 Changed serialization mechanism of LambdaLR scheduler (#9927)
Summary:
I opened an issue explaining some of my frustrations with the current state of schedulers.
While most points that I raised in [that issue](https://github.com/pytorch/pytorch/issues/8741#issuecomment-404449697) need to be discussed more thoroughly before being implemented, there are some that are not so difficult to fix.

This PR changes the way the LambdaLR scheduler gets serialized:
> The lr_lambda functions are only saved if they are callable objects (which can be stateful).
> There is no point in saving functions/lambdas as you need their definition before unpickling and they are stateless.

This has the big advantage that the scheduler is serializable, even if you use lambda functions or locally defined functions (aka a function in a function).

Does this functionality need any unit tests?
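
A minimal sketch of the distinction (a stateful callable object round-trips through `state_dict`; a bare lambda is skipped):

```
import torch
from torch.optim.lr_scheduler import LambdaLR

class Warmup:
    """Stateful callable: its __dict__ can be saved in the scheduler's state_dict."""
    def __init__(self, warmup_epochs):
        self.warmup_epochs = warmup_epochs
    def __call__(self, epoch):
        return min(1.0, float(epoch + 1) / self.warmup_epochs)

opt = torch.optim.SGD([torch.zeros(1, requires_grad=True)], lr=0.1)
sched = LambdaLR(opt, lr_lambda=Warmup(5))
state = sched.state_dict()   # serializable even though lr_lambda is "code"
# A plain lambda (e.g. lr_lambda=lambda e: 0.95 ** e) would be skipped instead:
# it is stateless, and pickling it would fail anyway.
```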
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9927

Differential Revision: D9055505

Pulled By: soumith

fbshipit-source-id: 6c1cec588beedd098ec7d2bce6a9add27f29e48f
2018-07-31 19:39:06 -07:00
aae37324cc fixed a newly introduced regression in softmax (#10066)
Summary:
There is a regression in softmin in 0.4.1 that was not present in 0.4.0. The behavior of softmin(x) should match softmax(-x); however, it is instead implemented (in v0.4.1) as -softmax(x). These are not the same. The fix is trivial because the bug is due to operator precedence.

This is a major regression that broke my training.  I'm not sure how a unit test did not catch this.

```
x = torch.tensor([1, 2, 3.5, 4])
print(F.softmin(x, dim=0)) # this has the wrong output in 0.4.1 but correct in 0.4.0
print(F.softmax(-x, dim=0)) # this is what softmax should be
print(F.softmax(x, dim=0))
print(-F.softmax(x, dim=0)) # this is how softmax is implemented incorrectly
```
In 0.4.1 this produces
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
tensor([0.6668, 0.2453, 0.0547, 0.0332])
tensor([0.0278, 0.0755, 0.3385, 0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])

In 0.4.0 this produces the correct values
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.6668,  0.2453,  0.0547,  0.0332])
tensor([ 0.0278,  0.0755,  0.3385,  0.5581])
tensor([-0.0278, -0.0755, -0.3385, -0.5581])
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10066

Differential Revision: D9106995

Pulled By: soumith

fbshipit-source-id: 7332503c6077e8461ad6cd72422c749cf6ca595b
2018-07-31 19:28:30 -07:00
f2412fbafc Allow multiple ops.def and clean up code gen in general
Summary:
This is a cleanup and refactoring.
In its original form (changeset 6fdf915c057a) this diff caused a 5% regression
on ads CPU.  The root cause was an omission of link_whole = True, causing
symbols to be stripped in mode/opt and forcing the converter to fall back,
which caused patterns to go unmatched in the graph transform logic. This version of
the diff tests for link_whole by including a C++ test of the transform.

Reviewed By: yinghai

Differential Revision: D9040511

fbshipit-source-id: 3e19b89989aa68b021762d12af2d0b4111280b22
2018-07-31 19:28:28 -07:00
799c947cf3 add .gitattributes for EOL conversion. (#9813)
Summary:
`.bat` files' EOL is LF, so the build fails on some Windows machines.
To fix this, add `.gitattributes` and set batch files' EOL to CRLF.

Discussion is in #9677.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9813

Differential Revision: D9026486

Pulled By: soumith

fbshipit-source-id: 341eaa677c35f8476a7eda1bac9827385072eb29
2018-07-31 18:38:43 -07:00
9c0f65fc87 Remove While op stuff (#10102)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10102

these codepaths are unused, deleting them

Reviewed By: yinghai

Differential Revision: D9109764

fbshipit-source-id: 8ace42a399806632bfbcada96b383268f0a8ae89
2018-07-31 17:56:25 -07:00
c54d71ba60 Upgrade old transform passes to newer APIs (#10046)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10046

stampable

Reviewed By: duc0

Differential Revision: D9075830

fbshipit-source-id: dc65be1d39625ef24ad319b5ce0263ecfe7a10c9
2018-07-31 17:39:35 -07:00
ceb0f14176 Fix SpatialBN Fusion (#10044)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10044

The test was subtly broken! This transform wasn't writing to the correct blob and the test did not catch that because it was looking at the old version.

thanks kerenzhou for catching this

Reviewed By: Jokeren

Differential Revision: D9075520

fbshipit-source-id: c31ff0afcd78dd2dc7ffc240e2e89eeda87f1fb4
2018-07-31 17:39:34 -07:00
bf744bea94 Parse and register schema declarations lazily (#9801)
Summary:
This should prevent slow startup times, and will not report as many
errors during static initialization time which are hard to debug

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9801

Reviewed By: goldsborough

Differential Revision: D8986603

Pulled By: zdevito

fbshipit-source-id: 440d43ab5e8cffe0b15118cb5fda36391ed06dbc
2018-07-31 17:24:24 -07:00
34c7c56c73 Re-enable empty n-dimensional empty tensor and fix parallel CPU on empty tensors (#10077)
Summary:
This is a combination of https://github.com/pytorch/pytorch/pull/9947 (this was reverted) and https://github.com/pytorch/pytorch/pull/10076.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10077

Differential Revision: D9087491

Pulled By: gchanan

fbshipit-source-id: 9fe9905628000f2ff3e47df32533cd7d1f25a354
2018-07-31 16:43:45 -07:00
ba5d33bede Re-Enable ATen in C2 in integration builds to test ONNX ATen conversions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10060

Differential Revision: D9081387

Pulled By: bddppq

fbshipit-source-id: 13cbff63df5241e013d4ebacfcd6da082e7196f6
2018-07-31 15:27:05 -07:00
e04f8bbfa6 Add virtual dtor for ideep context (#10059)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10059

Without a virtual dtor, it could induce incorrectly sized deallocation, corrupting memory. Unfortunately, sized-deallocation bugs cannot be detected by ASAN yet.

Reviewed By: jerryzh168

Differential Revision: D9080526

fbshipit-source-id: c136cf653134e75b074326be2bc03627da42446f
2018-07-31 15:27:02 -07:00
d2178562a4 Remove some unnecessary includes. (#10085)
Summary:
The affected files are all files that are planned to be moved
to ATen/core; the includes are for headers which are NOT slated
for movement.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10085

Differential Revision: D9093746

Pulled By: ezyang

fbshipit-source-id: 2beeffdae26d03d631d2d51b40bf6303759a2f50
2018-07-31 15:13:37 -07:00
1f13453b4d Slightly relax the constraints on argument and return types to script functions (#9969)
Summary:
This lays out initial support for taking and returning a richer set
of types than only tensors. Floats and ints are already valid, lists are
straightforward to add, tuples need some discussion.

Based on top of #9948. Review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9969

Reviewed By: zdevito

Differential Revision: D9076973

Pulled By: apaszke

fbshipit-source-id: 5a1fe912ea6b79ab2bfd0dcce265eb05855b5ff0
2018-07-31 14:25:29 -07:00
58fd6e1dd6 Also add ATen/core tests to oss CI (#10029)
Summary:
-
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10029

Reviewed By: ezyang

Differential Revision: D9070030

Pulled By: smessmer

fbshipit-source-id: b5ae79a383dc14e7d79e6a82c5d70e951c9f5168
2018-07-31 13:54:39 -07:00
ee17ed672b Add missing dependencies (#10086)
Summary:
Fix the master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10086

Differential Revision: D9093741

Pulled By: houseroad

fbshipit-source-id: 65e42994ae7d8e0b449d10a8116a7609434aad04
2018-07-31 13:54:38 -07:00
2422801625 fix _pointwise_loss for target gradients (#10018)
Summary:
_pointwise_loss has some Python special casing; we converted reduction to ATen enums too early.

fixes #10009
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10018

Differential Revision: D9075489

Pulled By: li-roy

fbshipit-source-id: 4bf2f5e2911e757602c699ee1ec58223c61d0162
2018-07-31 13:39:58 -07:00
56d1a82b31 Add shape inference when converting from onnx to caffe2 (#10037)
Summary:
Otherwise, some RNN case conversion may fail.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10037

Reviewed By: orionr

Differential Revision: D9072298

Pulled By: houseroad

fbshipit-source-id: 080f589eba8618719453feb15a7a494fe5380dd0
2018-07-31 12:42:02 -07:00
371a786b18 Errors out when Openmpi < 2.x.x with distributed. (#10015)
Summary:
This PR fixes #9418 .
OpenMPI 1.10 segfaults in MPI_Bcast with a CUDA buffer, and it's a retired OpenMPI version.
I've tested on 2.1.1 and 3.0.0 and they work well.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10015

Reviewed By: soumith

Differential Revision: D9088103

Pulled By: ailzhang

fbshipit-source-id: fc0a45e5cd016093ef0dbb9f371cbf67170d7045
2018-07-31 12:24:40 -07:00
1ae520c704 Add AT_CHECK for null storage. (#9823)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9823

Differential Revision: D9029433

Pulled By: ezyang

fbshipit-source-id: 6101556305593c66f618b20d8c2a084ae2558ea8
2018-07-31 12:09:25 -07:00
685224aa14 Add CTC loss (#9628)
Summary:
The CPU and CUDA variants are a direct transposition of Graves et al.'s description of the algorithm,
with the modification that it is in log space.
There is also a binding for the (much faster) CuDNN implementation.

This could eventually fix #3420

I still need to add tests (TestNN seems much more elaborate than the other testing) and fix the bugs that invariably turn up during testing. Also, I want to add some more code comments.

I could use feedback on all sorts of things, including:
- Type handling (cuda vs. cpu for the int tensors, dtype for the int tensors)
- Input convention. I use log probs because that is what the gradients are for.
- Launch parameters for the kernels
- Errors and omissions and anything else I'm not even aware of.

Thank you for looking!

In terms of performance, it looks superficially comparable to WarpCTC (but I have not systematically investigated this).
I have read that CuDNN is much faster than other implementations because it does *not* use log space, but also because its gathering step is much, much faster (I avoided trying tricky things there, as they seem to contribute to WarpCTC's fragility). I might think some more about which existing torch function (scatter or index..) I could learn from for that step.
Average timings for the kernels from nvprof for some size:

```
CuDNN:
60.464us compute_alphas_and_betas
16.755us compute_grads_deterministic
Cuda:
121.06us ctc_loss_backward_collect_gpu_kernel (= grads)
109.88us ctc_loss_gpu_kernel (= alphas)
98.517us ctc_loss_backward_betas_gpu_kernel (= betas)
WarpCTC:
299.74us compute_betas_and_grad_kernel
66.977us compute_alpha_kernel
```

Of course, I still have the (silly) outer blocks loop rather than computing consecutive `s` in each thread, which I might change, and there are a few other things where one could look for better implementations.

Finally, it might not be unreasonable to start with these implementations, as the performance of the loss has to be seen in the context of the entire training computation, so this would likely dilute the relative speedup considerably.

My performance measuring testing script:
```
import timeit
import sys
import torch
num_labels = 10
target_length  = 30
input_length = 50
eps = 1e-5
BLANK = 0#num_labels
batch_size = 16

torch.manual_seed(5)
activations = torch.randn(input_length, batch_size, num_labels + 1)
log_probs = torch.log_softmax(activations, 2)
probs = torch.exp(log_probs)
targets = torch.randint(1, num_labels+1, (batch_size * target_length,), dtype=torch.long)
targets_2d = targets.view(batch_size, target_length)
target_lengths = torch.tensor(batch_size*[target_length])
input_lengths = torch.tensor(batch_size*[input_length])
activations = log_probs.detach()

def time_cuda_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, culog_alpha = torch._ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_cudnn_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo, cugra= torch._cudnn_ctc_loss(*args)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

def time_warp_ctc_loss(grout, *args):
    torch.cuda.synchronize()
    culo = warpctc.ctc_loss(*args, blank_label=BLANK, size_average=False, length_average=False, reduce=False)
    g, = torch.autograd.grad(culo, args[0], grout)
    torch.cuda.synchronize()

if sys.argv[1] == 'cuda':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets_2d.cuda(), input_lengths.cuda(), target_lengths.cuda(), BLANK]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cuda_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'cudnn':
    lpcu = log_probs.float().cuda().detach().requires_grad_()
    args = [lpcu, targets.int(), input_lengths.int(), target_lengths.int(), BLANK, True]
    grout = lpcu.new_ones((batch_size,))
    torch.cuda.synchronize()
    print(timeit.repeat("time_cudnn_ctc_loss(grout, *args)", number=1000, globals=globals()))
elif sys.argv[1] == 'warpctc':
    import warpctc
    activations = activations.cuda().detach().requires_grad_()
    args = [activations, input_lengths.int(), targets.int(), target_lengths.int()]
    grout = activations.new_ones((batch_size,), device='cpu')
    torch.cuda.synchronize()

    print(timeit.repeat("time_warp_ctc_loss(grout, *args)", number=1000, globals=globals()))
```
I'll also link to a notebook that I used for writing up the algorithm in simple form and then test the against implementations against it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9628

Differential Revision: D8952453

Pulled By: ezyang

fbshipit-source-id: 18e073f40c2d01a7c96c1cdd41f6c70a06e35860
2018-07-31 11:09:48 -07:00
430e44480f Delete some obsolete steps in the ROCm build. (#10005)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10005

Differential Revision: D9066107

Pulled By: ezyang

fbshipit-source-id: 346f654214cff1c956a4022173347d95657ee9d4
2018-07-31 11:09:46 -07:00
f779202711 Correctly set CAFFE2_DISABLE_NUMA when USE_NUMA=OFF in cmake (#10061)
Summary:
Previously, https://github.com/pytorch/pytorch/blob/master/caffe2/core/numa.cc still got compiled even when USE_NUMA=OFF.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10061

Reviewed By: houseroad

Differential Revision: D9081385

Pulled By: bddppq

fbshipit-source-id: ad28b647e0033727839770b1da0fba341b1b7787
2018-07-31 11:01:51 -07:00
cba03e2ebe Handle dynamic repeats in onnx symbolic (#10052)
Summary:
ONNX Tile can take `repeats` as a dynamic input.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10052

Differential Revision: D9076841

Pulled By: bddppq

fbshipit-source-id: ddd692c5f5846c8fdba019baa9fad83ef9638da4
2018-07-31 10:39:50 -07:00
0c11101eca Prepare THNN/THCUNN for first class scalars. (#10023)
Summary:
I previously did some transformations, e.g. _nDimension,_dim -> nDimensionLegacyAll, nDimension -> nDimensionLegacyNoScalars.
But this didn't touch dim(), which needs to be updated to support scalars.  Instead of doing an (ugly) move, I audited the call sites and updated the cases that could be size 1.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10023

Differential Revision: D9068996

Pulled By: gchanan

fbshipit-source-id: c63820767dd1496e908a5a96c34968482193f2c5
2018-07-31 10:39:48 -07:00
c2d9d2888b Fix typo in tensors.rst (#10073)
Summary:
An tensor -> A tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10073

Differential Revision: D9087421

Pulled By: soumith

fbshipit-source-id: 6713f5a5e11fb11dff0ab5d2d6274f7837c6625f
2018-07-31 10:13:40 -07:00
68cbe37c6a fix the reference link path
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9240

Reviewed By: SsnL

Differential Revision: D8764196

Pulled By: ezyang

fbshipit-source-id: 3efc70714406d801ed74f52313beca61129593c7
2018-07-31 09:09:46 -07:00
5e5c15dd42 Add (constant size) TensorLists to JIT, use them in cat and stack nodes (#9948)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9948

Reviewed By: ezyang

Differential Revision: D9033666

Pulled By: apaszke

fbshipit-source-id: 02d75e391ed6dee62500842df50f0b6ee5e38846
2018-07-31 07:39:52 -07:00
6fb9acfc16 Revert empty n-dim and ATen in C2 integration builds
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10064

Differential Revision: D9082082

Pulled By: gchanan

fbshipit-source-id: ae49470f5b4c89b13beb55fd825de1ba05b6a4fa
2018-07-31 07:25:56 -07:00
78b806c861 Fix the onnx symbolic for upsample (#10001)
Summary:
We missed the upsample symbolic when bumping up the opset to 7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10001

Reviewed By: bddppq

Differential Revision: D9067212

Pulled By: houseroad

fbshipit-source-id: 3e285d2800a32cb04fa82f8e7f261bdd010a8883
2018-07-30 21:39:48 -07:00
37a226de63 When BUILD_ATEN=OFF, use ATen/core directly (#10019)
Summary:
ATenCore.h is a dummy header to just test that this is working at all.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10019

Reviewed By: smessmer

Differential Revision: D9067262

Pulled By: ezyang

fbshipit-source-id: 58bab9c0aa83b56335e36b719b9b6505400d8dee
2018-07-30 21:09:55 -07:00
aa36a5d01c Add typing into caffe2 requirements.txt for USE_ATEN (#10047)
Summary:
I was dumb lol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10047

Differential Revision: D9076023

Pulled By: bddppq

fbshipit-source-id: 10587875d04ac2aed2e015846fc73ce9e4717a4f
2018-07-30 20:09:21 -07:00
51539fa383 Add pyyaml into caffe2 requirements.txt for USE_ATEN
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10039

Reviewed By: houseroad

Differential Revision: D9074261

Pulled By: bddppq

fbshipit-source-id: 26df516633d5a4ec539a03a62cf9e7839e1e1964
2018-07-30 18:11:25 -07:00
8f0a229078 Fix HPTT path for 0-sized inputs.
Reviewed By: Maratyszcza

Differential Revision: D9068091

fbshipit-source-id: 4aeac45f9732a86979a08488637bf0ba6cc79b34
2018-07-30 17:54:57 -07:00
788b2e996d nomnigraph - minor cleanup of Graph.h (#9890)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9890

Minor cleanups for Graph.h to make it more consistent with our style guide

Also fix opt/device.cc and binary_match_test.cc to not access subgraph.nodes_ which is now private

Reviewed By: bwasti

Differential Revision: D9017108

fbshipit-source-id: 9f5cba4a2cd2a452a955005f4704f6c120bbc1d5
2018-07-30 16:24:03 -07:00
e0a0234018 Remove C++14 feature (#10022)
Summary:
Which test should I look at, bddppq?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10022

Reviewed By: bddppq

Differential Revision: D9068732

Pulled By: yinghai

fbshipit-source-id: 241ef72c7fac0ed0b8c58ecdffbb5e24eb956217
2018-07-30 16:24:02 -07:00
3e3f40aeeb Update onnx to latest master (#10024)
Summary:
df01dbc005
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10024

Reviewed By: houseroad

Differential Revision: D9069464

Pulled By: bddppq

fbshipit-source-id: 751328352cd495e27b6bd533f4632d3d6d06c4a6
2018-07-30 15:54:34 -07:00
e57cb4a1b2 Add a Constant Propagation Pass to the JIT (#8808)
Summary:
Adding a constant propagation pass to the JIT. I have added examples to the expect files.

There are a couple of special cases which have not been implemented here. IF nodes with constant conditions can be inlined with the correct block. WHILE nodes can be removed if the condition is false.  I have added a test for each case in test_jit.py file as expected failures.

To be consistent with DCE, python ops & CPP ops are treated as not having side-effects.
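As a minimal illustration (a sketch, not taken from the PR's tests), the pass folds constant arithmetic inside a scripted function:

```python
import torch

@torch.jit.script
def f(x):
    a = 1 + 2   # folded by constant propagation into a single constant 3
    return x + a

# Inspecting the graph should show the folded constant rather than an add
# of two constant nodes:
print(f.graph)
```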
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8808

Reviewed By: wanchaol

Differential Revision: D8906770

Pulled By: eellison

fbshipit-source-id: 10ad796d89f80b843566c9ddad6a0abd1f3dc74c
2018-07-30 15:54:31 -07:00
db96a0951f Add SIMD version to GFTRL optimizer (#9698)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9698

Add SIMD version to GFTRL optimizer

Differential Revision: D8949723

fbshipit-source-id: 835ce2ce49630ae43fc6bac63c545c14b25f5a26
2018-07-30 15:27:24 -07:00
9987282134 Use Retainable as base class for StorageImpl
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9956

Reviewed By: gchanan

Differential Revision: D9066103

Pulled By: cpuhrsch

fbshipit-source-id: 1a5a2ace306308707add3d0e0c1fc861f5c79705
2018-07-30 15:08:56 -07:00
7214754663 Check and return when numel() == 0 in Loops.cuh.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/10031

Reviewed By: colesbury

Differential Revision: D9070346

Pulled By: gchanan

fbshipit-source-id: d6ad4e6ca43d334f5be42fea35915270dd8f405e
2018-07-30 15:01:28 -07:00
57750bd638 Enable ATen in C2 in integration builds to test ONNX ATen conversions (#10014)
Summary:
zrphercule
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10014

Reviewed By: houseroad

Differential Revision: D9061842

Pulled By: bddppq

fbshipit-source-id: 1e1c2aeae62dd2cc5c6a8d5e1d395ea5cf882734
2018-07-30 15:01:13 -07:00
6c7fb1582f Introduce __array_priority__ on torch.Tensor (#9651)
Summary:
This causes numpy to yield to the torch functions,
e.g. instead of numpy array/scalar __mul__ converting the tensor to
an array, it will now arrange for the Tensor __rmul__ to be called.

Fixes case 2 of #9468
It also makes cases 3 and 4 equivalent but does not fix them.
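A minimal sketch of the new behavior for the scalar case (illustrative only):

```python
import numpy as np
import torch

# With __array_priority__ set on torch.Tensor, NumPy yields to torch:
# a NumPy scalar times a Tensor dispatches to Tensor.__rmul__, so the
# result stays a torch.Tensor instead of being converted to an ndarray.
t = torch.ones(3)
out = np.float64(2.0) * t
assert isinstance(out, torch.Tensor)
```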
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9651

Differential Revision: D8948079

Pulled By: ezyang

fbshipit-source-id: bd42c04e96783da0bd340f37f4ac3559e9bbf8db
2018-07-30 14:39:43 -07:00
ea3c36b822 NumPy Scalar to PyTorch Scalar (#9225)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/4985 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9225

Differential Revision: D8769317

Pulled By: ezyang

fbshipit-source-id: eeaeaf0749c9dc9e372634da68b4bd23e6e3ad28
2018-07-30 14:39:40 -07:00
c9eab34e63 Fix Caffe2 with ATen conda build failure (#10020)
Summary:
Extracted from 627624627e and in support of https://github.com/pytorch/pytorch/pull/10019

cc pjh5 mingzhe09088 ezyang smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10020

Reviewed By: pjh5

Differential Revision: D9068124

Pulled By: orionr

fbshipit-source-id: 4dd4910136a312b6517c65ce8802837108475f89
2018-07-30 14:10:02 -07:00
04939a4745 Match parameter names and = default (#9737)
Summary:
More clang tidy cleanups in `torch/csrc`. This time:

1. `hicpp-use-equals-default` recommends `= default` instead of `{}` for constructors/destructors. This is better practice because it expresses the intent better (https://stackoverflow.com/questions/6502828/what-does-default-mean-after-a-class-function-declaration)
2. `readability-inconsistent-declaration-parameter-name` enforces that parameter names in the declaration match parameter names in the definition. This is just generally useful and can prevent confusion and bugs.

Also updated my script a little bit.

apaszke ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9737

Differential Revision: D9069069

Pulled By: goldsborough

fbshipit-source-id: f7b3f3a4eb4c9fadc30425a153566d3b613a41ae
2018-07-30 14:10:00 -07:00
40a8239984 Fix a bug in argument spec (#9958)
Summary:
Non-tensor types did not set the running total_dims count, causing corrupted data.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9958

Reviewed By: jamesr66a

Differential Revision: D9065621

Pulled By: zdevito

fbshipit-source-id: 0ac1fcdf6da076a9c9ebd5d70ce9126e3f8e722e
2018-07-30 13:08:59 -07:00
faa96c1c47 Deal with spaces in einsum equation string (#9994)
Summary:
Fixes #9930
Thank you, vadimkantorov for the report.
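A minimal usage sketch of the fixed behavior (operand-passing conventions have varied across versions):

```python
import torch

a = torch.randn(2, 3)
b = torch.randn(3, 4)
# Equation strings containing spaces no longer raise:
c = torch.einsum('ij, jk -> ik', (a, b))
assert c.shape == (2, 4)
```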
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9994

Differential Revision: D9042876

Pulled By: ezyang

fbshipit-source-id: 3bbd1aaaf1b432be40a7652b6a746d80934a216b
2018-07-30 12:57:56 -07:00
ce5f0d40b6 Enable n-dimensional empty tensors. (#9947)
Summary:
These could use some autograd tests, which are coming in a later PR, but using them in autograd is probably pretty rare.
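A minimal sketch of what this enables (illustrative only):

```python
import torch

# Any dimension may now be zero, not just the 1-d empty tensor:
x = torch.empty(0, 3)
assert x.shape == (0, 3) and x.numel() == 0
# Concatenation with a non-empty tensor of matching trailing shape works:
y = torch.cat([x, torch.ones(2, 3)], dim=0)
assert y.shape == (2, 3)
```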
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9947

Reviewed By: ezyang

Differential Revision: D9032778

Pulled By: gchanan

fbshipit-source-id: fa5a6509d3bac31ea4fae25143e82de62daabfbd
2018-07-30 12:33:17 -07:00
73a60efccc Fix Caffe2CTScan error (#9962)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9962

att

Reviewed By: hlu1

Differential Revision: D9036869

fbshipit-source-id: 3155af00c62d489f998cbfba07121c4fd20e1c6f
2018-07-30 12:33:15 -07:00
b4f8c60931 Don't use the XML reporter for Catch2. (#10012)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/10012

Differential Revision: D9057766

Pulled By: ezyang

fbshipit-source-id: 12148a8cf3061423c61b3e7b36864dfcdb1138a1
2018-07-30 11:25:09 -07:00
9a9a7325c6 Remove the generation of storage files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9954

Reviewed By: gchanan

Differential Revision: D9035947

Pulled By: cpuhrsch

fbshipit-source-id: 9b56c7a68e3f562ea11b9265a5fa234838f2b4e0
2018-07-30 09:53:57 -07:00
432ca747b0 Don't seed GPUs if there are none available. (#9931)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9931

Differential Revision: D9051375

Pulled By: ezyang

fbshipit-source-id: 1721f6217e07f80adc107d95e897cd7dd488659a
2018-07-30 08:23:53 -07:00
3609977d7f Update onnx to onnx/onnx@c761845 (#9964)
Summary:
c761845c7f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9964

Reviewed By: houseroad

Differential Revision: D9038133

Pulled By: bddppq

fbshipit-source-id: 6ce740944e636175d2de4602edb92cc4d7e8e5ac
2018-07-29 23:10:12 -07:00
5ff1551eb9 ATen's emscripten support (#9803)
Summary:
Not sure if anybody is interested, but I managed to run `GRU` inference fine in `wasm` using ATen compiled with Emscripten. It was quite trivial to fix the configuration.
It also passes most of the tests, especially all scalar tensor tests.

The command line used to configure was as follows (it could likely be simplified):
```
emconfigure cmake -DAT_LINK_STYLE=STATIC -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DCMAKE_C_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_CXX_FLAGS="-Wno-implicit-function-declaration -DEMSCRIPTEN -s DISABLE_EXCEPTION_CATCHING=0" -DCMAKE_INSTALL_PREFIX=/home/sugar/aten-wasm ../
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9803

Differential Revision: D9004610

Pulled By: ezyang

fbshipit-source-id: db26c59f27162ed80f6aee2973c4cb9252d3d1e4
2018-07-29 20:39:00 -07:00
3d6015db0e Add essential PATH for the Windows PyTorch loading process (#9920)
Summary:
Fixes #9818.
It seems the original Python doesn't add `[PYTHONPATH]\Library\bin` to `PATH`. We try to add it before the DLL loading process.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9920

Differential Revision: D9040825

Pulled By: soumith

fbshipit-source-id: c07fff71b2aea254a396042ab677696f6829aac7
2018-07-29 08:23:59 -07:00
56974a06b5 Revert D8909766: [caffe2] Simplify order switch operators
Differential Revision:
D8909766

Original commit changeset: 17a302d5bf4a

fbshipit-source-id: 56c75a8ce27873ed1d5f194b9d6bf0049d8f21ba
2018-07-28 18:40:13 -07:00
eee01731a5 Adds the default value for the amsgrad arg to the Adam docstring (#9971)
Summary:
Minor addition to the docstring of `torch.optim.Adam`, adding the default-value description for the `amsgrad` argument for consistency.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9971

Differential Revision: D9040820

Pulled By: soumith

fbshipit-source-id: 168744a6bb0d1422331beffd7e694b9d6f61900c
2018-07-28 09:23:45 -07:00
b99492a507 Fix BlobStatRegistry HIP BlobStatGetter registration issue (#9973)
Summary:
This was introduced in #9826, following the corresponding cuda file context_gpu.cu; tests passed in the PR, at which point master was at 94439d7df. However, during the long landing process a new master commit, aebf3b4, came in that removed the `CAFFE_KNOWN_TYPE(Tensor<HIPContext>)` in context_hip.cc, which broke the HIP BlobStatGetter. We did NOT run tests again during the merge, so when #9826 later landed to master the ROCm tests started breaking.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9973

Differential Revision: D9040671

Pulled By: bddppq

fbshipit-source-id: f3b16cabaf681fc0535ca733db0b48430868f922
2018-07-28 02:23:40 -07:00
46d8002800 Fix bug that always uses the same blob when repeating poolings
Reviewed By: houseroad

Differential Revision: D9027902

fbshipit-source-id: 957702ad9736812ec5aa32066d286c2c3adffc49
2018-07-28 00:09:16 -07:00
47c1badf90 Fix the clamp special case and gradient problem on None, add None to JIT (#9596)
Summary:
Supersedes #8925

This PR fixes #8502: it fixes the gradient problem for clamp when passing None to the function, and adds support for NoneLiteral and NoneType in script to enable the clamp tests. Now we can have corner cases like:

```python
@torch.jit.script
def func():
    x = torch.randn(3, 3, requires_grad=True)
    y = torch.clamp(x, None, 0) # max = 0
    y = torch.clamp(x, min=None, max=0)
```

In both JIT and ATen, we use Scalar(NAN) as a sentinel value when None is passed to clamp; this is the current way we support the None type in JIT and solve the gradient problem when the user explicitly passes None into clamp.

On the JIT side, we create a tensor(NAN) or an undefined tensor if we encounter None when matching the function schema, and later, in the interpreter, it is translated to Scalar(NAN) if needed.

Ideally we wouldn't need clamp_min and clamp_max in ATenNative/Autograd and could support only clamp after this change, but since a bunch of other operators (e.g. Activation.cpp, Loss.cpp) use clamp_min in several places, the functions will remain available; all Python invocations, however, will only call clamp instead of clamp_min/max (with clamp calling the underlying th_max/th_min).
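A minimal sketch of the fixed gradient behavior (not taken from the PR's test suite):

```python
import torch

x = torch.randn(3, requires_grad=True)
y = torch.clamp(x, min=None, max=0.0)
y.sum().backward()   # gradients now flow despite the None bound
assert x.grad is not None
```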

zdevito jamesr66a
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9596

Reviewed By: zdevito

Differential Revision: D8940839

Pulled By: wanchaol

fbshipit-source-id: c543a867b82e0ab8c99384773b173fdde2605d28
2018-07-27 22:54:33 -07:00
851c18dd20 PyTorch File Format API (#9900)
Summary:
This is a follow-up to https://github.com/pytorch/pytorch/pull/9794 that contains only the serialization library and exposes a cleaner API. This should later be incorporated into the module export code
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9900

Reviewed By: zdevito

Differential Revision: D9021057

Pulled By: jamesr66a

fbshipit-source-id: 01af74a7fdd1b90b2f5484644c3121d8ba9eb3b3
2018-07-27 22:24:57 -07:00
d913db70f2 Handle the "spatial" attribute in onnx BatchNormalization op (#9492)
Summary:
If we have this "spatial" attribute and its value equals 1, we can just remove the attribute and convert the op to Caffe2's SpatialBN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9492

Differential Revision: D8988165

Pulled By: houseroad

fbshipit-source-id: a9218dc9cd5fab43deb371f290f81285f5283231
2018-07-27 22:09:15 -07:00
bcba5a50d1 Fix EnforceFiniteOp
Summary: att

Reviewed By: kennyhorror

Differential Revision: D9040248

fbshipit-source-id: 0da0f3b1ce51375731098cc86c92f35953be0861
2018-07-27 22:01:23 -07:00
ab4e209007 Back out "[caffe2][nomnigraph] Allow multiple ops.def and clean up code gen in general"
Summary: Original commit changeset: 6fdf915c057a

Reviewed By: yinghai

Differential Revision: D9040008

fbshipit-source-id: 33fd5d4ddc0ec8cae56cf86f6d63b6f666e51a3e
2018-07-27 20:09:14 -07:00
607688e928 Adding reciprocal operator and a test
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9908

Differential Revision: D9035809

Pulled By: virtan

fbshipit-source-id: bce1db46fd55faeeab18a3b266d25c8beeb08df7
2018-07-27 18:24:43 -07:00
ee827f6ba3 Fix a testcase in logsoftmax onnx export (#9660)
Summary:
We only support the special case; the original dim is not supported by ONNX.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9660

Reviewed By: bddppq

Differential Revision: D8965507

Pulled By: houseroad

fbshipit-source-id: 021dffdf0489c2d3a50bfd1e0c4cfd00d4a3d776
2018-07-27 17:54:32 -07:00
12a1af3731 Adding conv tests with explicit algo definition
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9798

Differential Revision: D9034663

Pulled By: virtan

fbshipit-source-id: d722f25f1dd00231ccc3ad5960bbbef63af02c2d
2018-07-27 17:39:17 -07:00
9eeb4e17af Split gather op for easier smaller code size (#9916)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9916

att

Differential Revision: D8961085

fbshipit-source-id: 39a9838647dc97611e77beb0607c4655de727ada
2018-07-27 17:15:33 -07:00
c3fe071483 Update hip files (#9826)
Summary:
The goal of this PR is to update the hip files to reflect relevant changes in cuda source files.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9826

Differential Revision: D9032840

Pulled By: bddppq

fbshipit-source-id: 504e55c46308eebfee3c9a7beea1f294fe03470f
2018-07-27 16:54:39 -07:00
a532c1a48c Fix default argument value for CTCGreedyDecoder op (#9747)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9747

Currently the ctc_greedy_decoder op initializes the `merge_repeated` argument only if it has been provided by the user. Change to initialize in all cases.

Reviewed By: houseroad

Differential Revision: D8963635

fbshipit-source-id: 18955c7c26a77d9d7f5137e4dec085252ffabfeb
2018-07-27 16:33:07 -07:00
eb9bb1f09a Travis CI: Run flake on Python 2.7 and 3.7 (#9953)
Summary:
Flake8 will produce different results on Python 2 and 3. Python 3.7 has __async__ as a reserved word: https://github.com/pytorch/pytorch/pull/4999.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9953

Differential Revision: D9035415

Pulled By: soumith

fbshipit-source-id: 8a46e028a2e20a7e3f6d90137020268d65a7cc64
2018-07-27 14:43:26 -07:00
829d763c69 Implement add, sub, mul, div using TensorIterator (#8919)
Summary:
```
This adds TensorIterator, a helper class for computing element-wise
operations that's intended to replace the CPU and CUDA apply utils
functions.

CPU kernels are implemented as functions that operate on strided 1-d
tensors compared to CPUApplyUtils which operated individual elements. This
allows the kernels to handle vectorization, while TensorIterator handles
parallelization and non-coalesced dimensions.

GPU kernels continue to operate on elements, but the number of
specializations is reduced. The contiguous case remains the same. The
non-contiguous case uses a single (reduced) shape for all operands and
the fast integer division from THCIntegerDivider. To avoid extra
specializations for indexing with 64-bits, large operations are split
into smaller operations that can be indexed with 32-bits.

Major semantic changes:

 - No more s_add, s_mul, s_div, or s_sub. Broadcasting is handled by
   TensorIterator. The autograd engine performs the reduction assuming
   standard broadcasting if the gradient shape does not match the
   expected shape. Functions that do not use standard broadcasting rules
   should either continue to trace the expand calls or handle the
   reduction in their derivative formula.

 - Use ONNX v7, which supports broadcasting ops.

Performance impact:

 - Small increased fixed overhead (~0.5 us)
 - Larger overhead for wrapped numbers (~2.5 us)
 - No significant change for ops on contiguous tensors
 - Much faster worst-case performance for non-contiguous GPU tensors
 - Faster CPU bias addition (~2x)
 - Faster GPU bias addition (~30% faster)

Future work:

 - Decrease overhead, especially for wrapping numbers in Tensors
 - Handle general inter-type operations
 - Extend to unary ops and reductions
 - Use buffering for compute-bound operations on non-contiguous tensors
   (pull in from CPUApplyUtils)
```
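A minimal sketch of the user-visible broadcasting behavior (the TensorIterator machinery itself is internal C++):

```python
import torch

# Elementwise broadcasting is handled by TensorIterator, so there is no
# separate s_add variant; the usual broadcasting rules apply directly:
a = torch.randn(3, 1)
b = torch.randn(4)
c = a + b            # broadcasts to shape (3, 4)
assert c.shape == (3, 4)
```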
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8919

Differential Revision: D8677600

Pulled By: colesbury

fbshipit-source-id: 61bc9cc2a36931dfd00eb7153501003fe0584afd
2018-07-27 14:43:24 -07:00
e3c4057b6c Eliminate an extra lookup in the hashtable during CSE. (#9668)
Summary:
zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9668

Differential Revision: D8955185

Pulled By: resistor

fbshipit-source-id: f3f929efc11be63850bd863679cc7b297c98d679
2018-07-27 14:43:22 -07:00
ef9801f32c Merge THStorage into at::Storage
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9772

Reviewed By: ezyang

Differential Revision: D9019375

Pulled By: cpuhrsch

fbshipit-source-id: d5185e29747929d648e4260db4967452cd40f563
2018-07-27 13:53:55 -07:00
6ed41adb04 Use round-to-negative division when computing output sizes for convolutions involving striding and dilation.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9640

Differential Revision: D8948081

Pulled By: resistor

fbshipit-source-id: 06f2e3ad1bdb448be6f36577cb9bd27c884df595
2018-07-27 13:22:54 -07:00
8c0355c90d convert lambd directly to scalar_t at hardshrink (#9919)
Summary:
- convert lambd directly to scalar_t instead of creating a tensor
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9919

Differential Revision: D9026708

Pulled By: weiyangfb

fbshipit-source-id: d20ab06ecc12aa972ee9d1323ee2f84abf8d5ffd
2018-07-27 13:22:52 -07:00
ce0b895a0c Fix UBSAN error in ONNX peephole pass, make it more robust.
Summary: Minor fix for a bug introduced by D9004285

Reviewed By: anderspapitto

Differential Revision: D9028762

fbshipit-source-id: 9b9c5eef30e61d7ae19784e0418fa29bad2b5564
2018-07-27 12:38:56 -07:00
c77e4bc4d5 export tensor(ArrayRef, options) on Windows (#9904)
Summary:
I hope this helps with the Windows build failure in #9628.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9904

Differential Revision: D9026715

Pulled By: soumith

fbshipit-source-id: bb97d41d060823f5a37bfc9a1659815b8b9f4eab
2018-07-27 12:14:52 -07:00
aebf3b47ae Remove template parameter from Tensor (#9939)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9939

Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before this change, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: ezyang, houseroad

Differential Revision: D9024330

fbshipit-source-id: e0b8295d2dc6ebe2963383ded5af799ad17164ba
2018-07-27 10:56:39 -07:00
94439d7df4 Suppress the vptr warning in ubsan (#9909)
Summary:
Unblock https://github.com/pytorch/pytorch/pull/8469
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9909

Differential Revision: D9023650

Pulled By: houseroad

fbshipit-source-id: 7682a9cd7905e98c802b820ad59745672b32970d
2018-07-27 10:28:07 -07:00
c0bacc6284 Guard test_lapack_empty with has_magma. (#9936)
Summary:
CUDA lapack functions generally don't work unless has_magma is true.
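A hedged sketch of the guard pattern (illustrative, not the test's actual code):

```python
import torch

# CUDA LAPACK routines need MAGMA, so check before exercising them:
if torch.cuda.is_available() and torch.cuda.has_magma:
    a = torch.randn(3, 3, device='cuda')
    q, r = torch.qr(a)   # a LAPACK-backed op, safe under the guard
```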
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9936

Differential Revision: D9028579

Pulled By: gchanan

fbshipit-source-id: 9b77e3b05253fd49bcabf604d0924ffa0e116055
2018-07-27 10:09:00 -07:00
bf32ea8094 Fix dimension check in 1D instance norm, allowing 2D tensors alongside 3D. (#9924)
Summary:
Fixes #9776.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9924

Differential Revision: D9028328

Pulled By: soumith

fbshipit-source-id: d5f22abb2be83b34aee95ebe144c97519a6854f8
2018-07-27 09:24:07 -07:00
d3ba9a173e Handle case where THC btrifact doesn't zero info. (#9907)
Summary:
This was showing up in the n-dimensional empty tests as flaky because it's reading uninitialized cuda memory.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9907

Differential Revision: D9021413

Pulled By: gchanan

fbshipit-source-id: 31542b7597919df9afd6e528bb108a4a3e8eaf60
2018-07-27 09:11:44 -07:00
1af1b0c2a5 Remove THTensor::_dim, temporarily remove THTensor_nDimension. (#9895)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9895

The primary goal here was to remove THTensor::_dim, which isn't part of the API moving forward.
Instead, we provide 3 options for getting the dimensionality (this is temporary although non-trivial to remove!):
```
nDimension                 corresponds to the "true" ATen dimension. TODO: implement.
nDimensionLegacyNoScalars  corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors.
nDimensionLegacyAll        corresponds to the ATen dimension, except scalars are viewed as 1-dimensional tensors
                           and tensors with a dimension of size zero are collapsed to 0-dimensional tensors.
```
So in this patch, nDimension -> nDimensionLegacyNoScalars and _dim/_nDimension goes to nDimensionLegacyAll.
These are just codemods.
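To make the three accessors concrete, some worked examples implied by the definitions above (the real accessors are C functions on THTensor; the values here are illustrative only):

```python
# shape -> (nDimension, nDimensionLegacyNoScalars, nDimensionLegacyAll)
examples = {
    "()":     (0, 1, 1),  # scalar: the legacy variants view it as 1-d
    "(3,)":   (1, 1, 1),
    "(0, 3)": (2, 2, 0),  # size-zero dim collapses to 0-d under LegacyAll
}
```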
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9835

Reviewed By: ezyang

Differential Revision: D8999338

Pulled By: gchanan

fbshipit-source-id: a4d676ac728f6f36ca09604a41e888d545ae9311
2018-07-27 08:56:38 -07:00
bc66d98248 Fix narrow on empty tensors after negative size support.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9838

Differential Revision: D9002345

Pulled By: gchanan

fbshipit-source-id: 13f4bacff94d9d0ea31a3b73a75b9b3e774eabf5
2018-07-27 07:55:20 -07:00
7b375ed362 fix ParameterDict doc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9918

Differential Revision: D9026402

Pulled By: soumith

fbshipit-source-id: d0459dcda631e8921ab39725b9045e03960da5c9
2018-07-27 01:10:50 -07:00
a709f23225 revise a little spelling mistake in tensor.py (#9868)
Summary:
Hello! I just found a small spelling mistake while reading this source code. Just PRing it, thanks!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9868

Reviewed By: gchanan, ezyang

Differential Revision: D9016030

Pulled By: soumith

fbshipit-source-id: fc3877177be080adbdbda99a169e401691292ebb
2018-07-27 00:55:03 -07:00
4a192bcc3d Rename onnx integration tests file to avoid confusion
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9913

Differential Revision: D9026787

Pulled By: bddppq

fbshipit-source-id: a3e7e79973abc4f5fe163f3e86b24382a1efd082
2018-07-26 23:40:41 -07:00
8cb1eef7b9 Unify IR operator representation (stop using attributes in the JIT) (#9807)
Summary:
Based on top of #9763 (first 3 commits belong to that PR). The first commits from this PR are "Stop using attributes ..."

I tried to separate the changes into fairly meaningful commits. I can't split them up into smaller PRs, because everything starts working and all tests pass only after the whole sequence, but hopefully this will make reviewing somewhat easier.

Known issues/regressions/future tasks:
- `aten::lerp` and `aten::clamp` are no longer fusable
- `CreateAutodiffSubgraphs` needs a rewrite
  - It is much more strict now, and will miss a lot of opportunities, especially when viewing ops are involved. Our previous approach was "ignore the assumption on shape availability in gradient formulas to determine differentiability, and hope that shape prop will be robust enough to actually deliver them before we differentiate", which obviously doesn't scale well to more complex cases. We should either work on reducing the size dependency of grad formulas (feasible e.g. for `view`/`reshape`, unfeasible for `squeeze`/`unsqueeze`), or make `CreateAutodiffSubgraphs` integrate some kind of "I could integrate this node into an AD subgraph, but will I be able to infer the shape of its input" reasoning (kind of like a limited shape prop, that doesn't infer anything, and only tells if it *could* infer something).
  - It sometimes creates constant-only (or constants + one node) graphs, which is useless
- Broken `aten::add` in auto-batching, because it gained a non-tensor input. I changed the test for pointwise operations to use `aten::mul` instead, but I needed to disable the LSTM cell test. I'm not sure how scalar constants should be implemented in this case, because I don't fully understand our format. cc: ChunliF
- Graph import does some hacks to recover type of constants. This code should be removed once we'll gain the ability to export the IR along with value types.
- There's still a fair amount of dead code that can be removed. I didn't want to make this diff any bigger, and removing it is an easy task.
- Graph fuser could be improved to use signature matching (possibly using `OperatorSet`) instead of basing on node kinds.
- Manual constant propagation for the `ListConstruct` node in `torch/onnx/utils.py` should be replaced with a proper constant propagation pass (or we should ensure that the one we have handles at least this case before we remove this code).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9807

Reviewed By: ezyang

Differential Revision: D9004285

Pulled By: apaszke

fbshipit-source-id: fe88026a765f6b687354add034c86402362508b7
2018-07-26 22:11:50 -07:00
2c1d9e09b8 Support UINT8 for addition data in ImageInputOp (#9901)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9901

Added support for UINT8 datatype for additional data (prefetching and
output) by ImageInputOp

Reviewed By: ashwinb

Differential Revision: D9018964

fbshipit-source-id: f938a8a072c15c0ee521b2f16788c024b08cd37f
2018-07-26 22:11:46 -07:00
aa671ddefa Support production models with predictor benchmark (#9855)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9855

Support production models with predictor benchmark
Two new flags are added:
`--update_prod`: pull production data (netdef, input types, input dims) from Hive and store locally
`--use_prod`: run benchmark with local production data with the same workload as in production.
By default, 300 models will be loaded.

production vs benchmark
avg net run time:
(collected by prod: https://fburl.com/scuba/6lb91zfx and bench: https://fburl.com/ngjj1dc8)
**prod: `408us` vs bench: `543us`**
(With prod data distribution, this should be even closer)

framework overhead (as of 2018-07-22):
prod:
```
9.111%    BlackBoxPredictor::Run
4.602%    SimpleNet::Run
2.377%    Operator::Run
1.786%    BlackBoxPredictor::AllocateMemory
1.372%    Observable::StartAllObservers
1.358%    Observable::StartObserver
1.206%    Blob::GetMutable
```

bench:
```
8.577%    BlackBoxPredictor::operator()
3.276%    SimpleNet::Run
1.954%    Operator::Run
1.697%    BlackBoxPredictor::AllocateMemory
1.477%    Tensor::ShareData
1.230%    Blob::GetMutable
1.034%    Observable::StartObserver
```

Reviewed By: yinghai

Differential Revision: D8942996

fbshipit-source-id: 27355d7bb5a9fd8d0a40195261d13a97fa24ce17
2018-07-26 21:39:29 -07:00
eb33887816 Addressed issue identified by static code analysis: potential buffer … (#9889)
Summary:
…overrun
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9889

Differential Revision: D9026278

Pulled By: soumith

fbshipit-source-id: ee2ee255f34731ddc581261984c3caf56faa0e12
2018-07-26 21:09:51 -07:00
e41eb43327 Remove deprecated masked_copy (#9819)
Summary:
No tests are affected by this removal.

Closes https://github.com/pytorch/pytorch/issues/1885 and closes #9817

While I was at it, I also fixed #9876 .
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9819

Differential Revision: D9018126

Pulled By: SsnL

fbshipit-source-id: a9142bf4e2403bef05779a097f61fa8b7db04b71
2018-07-26 20:55:18 -07:00
a841006353 Simplify some code by directly constructing unordered_set from nodes.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9675

Differential Revision: D8952196

Pulled By: resistor

fbshipit-source-id: 5ef2308fed9f702021f650cf2d241a83d880d359
2018-07-26 19:54:38 -07:00
dfa0af093d Move predictor into caffe2/caffe2/predictor (#9548)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9548

Pull Request resolved: https://github.com/pytorch/translate/pull/157

One part of refactor predictor. Move all the files into predictor dir.

Reviewed By: highker

Differential Revision: D8845276

fbshipit-source-id: 1e917464b0c8a042f025128a082c784eaa3b7013
2018-07-26 19:03:40 -07:00
c045e969b6 Use qualified name at::Half in Dispatch.h (#9848)
Summary:
This makes AT_DISPATCH_ALL_TYPES_AND_HALF valid outside of the at
namespace.

See https://github.com/pytorch/extension-cpp/issues/15
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9848

Differential Revision: D9006921

Pulled By: colesbury

fbshipit-source-id: a6e4f097a9d6fb85c921e1c9b9ea25d0f2db06dc
2018-07-26 19:03:24 -07:00
e7ab093d93 Simplify order switch operators (#9581)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9581

Mostly to simplify code. Should also improve performance but order switch ops
don't take much time anyway.

Reviewed By: viswanathgs

Differential Revision: D8909766

fbshipit-source-id: 17a302d5bf4aba2755d88223fc01a41fd72c5919
2018-07-26 18:24:29 -07:00
b7b61a8eb4 Change expect, cast on Type to return shared pointers, make isSubtypeOf accept TypePtr (#9786)
Summary:
Follow up task of #9584.

Commit 1:

- change expect/cast to return shared pointers instead of raw pointer
- isSubtypeOf accept TypePtr instead. Use `x->isSubtypeOf(NumberType::get())` rather than `x->isSubtypeOf(*NumberType::get())`

Commit 2:

- to address enable_shared_from_this pitfalls, we make the constructor private and expose the factory method to make sure user can only create it using our factory method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9786

Reviewed By: zdevito

Differential Revision: D8980441

Pulled By: wanchaol

fbshipit-source-id: e5c923fc57a701014310e77cf29985b43bb25364
2018-07-26 18:09:45 -07:00
9df9c46992 fix loading 1dim tensor from 0.3.* to 0dim tensor (#9781)
Summary:
This PR fixes #9743 .

Adding backward-compatibility support for loading a checkpoint from 0.3.* with a 1-dim tensor; these are now 0-dim tensors in 0.4+.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9781

Differential Revision: D8988196

Pulled By: ailzhang

fbshipit-source-id: a7a1bc771d597394208430575d5a4d23b9653fef
2018-07-26 17:09:41 -07:00
d65c667f28 Avoid divide-by-zero when hamming_window window length is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9896

Reviewed By: ezyang

Differential Revision: D9018572

Pulled By: gchanan

fbshipit-source-id: fa314687973124165bffb3084932d8ab6d872a93
2018-07-26 15:56:44 -07:00
d1260d26fe Sleep before run (#9891)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9891

Add an argument to benchmark binary to specify the seconds to sleep before the run and after the warmup.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9880

Reviewed By: llyfacebook

Differential Revision: D9014254

Pulled By: sf-wind

fbshipit-source-id: d5566186c8ed768f1e170e9266c5f2d6077391e0
2018-07-26 14:39:17 -07:00
18a6541b82 Create IDEEP fallback operators for ctc decoder ops (#9847)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9847

CTCBeamSearchDecoder and CTCGreedyDecoder do not currently support IDEEP
execution. Add fallback operators to allow IDEEP execution of models that use
these operators.

Reviewed By: yinghai

Differential Revision: D9006234

fbshipit-source-id: fc539ba67b07d1f960d28564d8adde0be8690649
2018-07-26 14:09:11 -07:00
969b62f276 Revert D8121878: Remove template parameter from Tensor
Differential Revision:
D8121878

Original commit changeset: 4a5e9a677ba4

fbshipit-source-id: d8e2c0bb145b52fbcca323b22d1d3346f0b3249e
2018-07-26 14:02:04 -07:00
456f41301c Disable unique ops test on rocm (#9892)
Summary:
Somehow we have Unique operator tests in two places: test_unique_ops.py and hypothesis_test.py.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9892

Reviewed By: houseroad

Differential Revision: D9017631

Pulled By: bddppq

fbshipit-source-id: 1f9e40e4953afca26141ef4581202b9b9fce0ae9
2018-07-26 13:10:23 -07:00
1dc708493e Add html-stable target to docs Makefile (#9884)
Summary:
This lets one build docs for the release easier. All of the unstable
warnings are removed in `make html-stable`.

cc soumith SsnL

Sample build:
![image](https://user-images.githubusercontent.com/5652049/43277115-05e2f720-90d5-11e8-9977-b0b4a6ee4b8e.png)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9884

Reviewed By: SsnL

Differential Revision: D9016001

Pulled By: zou3519

fbshipit-source-id: 5cf2dfbf886de993242db28cdac5d0c5fadbdc4d
2018-07-26 12:09:06 -07:00
0c84a5c27e Pass shape infos to ONNX -> Caffe2 C++ conversion backend (#9870)
Summary:
And let Gemm conversion to inspect the input `C` to try converting to FC.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9870

Reviewed By: houseroad

Differential Revision: D9013198

Pulled By: bddppq

fbshipit-source-id: b4c509cfccca238262e1c406b004e66cef256321
2018-07-26 12:00:32 -07:00
e39c8043dc Make GraphExecutors work on Stacks instead of variable_tensor_lists (#9763)
Summary:
This is blocking the IR operator unification, because I need to be able to pass scalars to backward functions.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9763

Reviewed By: zou3519

Differential Revision: D8978457

Pulled By: apaszke

fbshipit-source-id: 570b4c3409322459cb0f2592069730a7d586ab20
2018-07-26 12:00:27 -07:00
6f10944f88 Re-enable rocm tests that have been fixed in rocm 1.8.2 (#9862)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9862

Differential Revision: D9012520

Pulled By: bddppq

fbshipit-source-id: cdcc184e23befa8dbd1bc44d59bd25766aac33d0
2018-07-26 10:54:57 -07:00
716f7d657d Remove Broadcast.py. (#9843)
Summary:
I don't think this file is used anywhere, I guess we'll find out!

(Weirdly this failed lint on one of my PRs even though it shouldn't).
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9843

Differential Revision: D9003949

Pulled By: gchanan

fbshipit-source-id: 26d580d1e7cdd30e82e5f4176244e51fd7cd616d
2018-07-26 10:44:24 -07:00
cd5adc7b5f Remove template parameter from Tensor (#13)
Summary:
Pull Request resolved: https://github.com/facebookresearch/weakly-supervised-action-detection/pull/13

Pull Request resolved: https://github.com/pytorch/translate/pull/166

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9125

Closes https://github.com/pytorch/pytorch/pull/9125

Use inheritance for polymorphism, and remove template parameter
This is to change the templating in call sites, the core implementations will change later

Before this change, the Caffe2 Tensor class was compile-time fixed to bind to a particular device/context. With this change, we're making it a runtime property (stored inside the tensor), but preserving the same semantics. For example, one has to specify a device type in order to create a Tensor - there are no uninitialized tensors. More specifically the changes are:

1. We added an extra argument *DeviceType* to most of the constructors of the tensor, e.g. Tensor(DeviceType type),
2. The semantics of the constructor Tensor(const Tensor<SrcContext>& src, ContextForCopy* context) have changed: the second context is passed in to enable us to call the templated Copy function. Previously it could be in a different context than the source and target; now we enforce that the context, if provided, has the same device type as src.
3. To preserve 'get-or-construct' semantics of Blob, we added specialized getter Blob::GetMutableTensor that verifies both that Blob contains a Tensor and that it's of a correct type
4. Specifically, Tensor type is not default-constructible any more (as we don't have unknown device tensors) and thus some of the code handling STL containers needs to change

Note: Some changes are postponed just to keep this diff a bit smaller. Please see `TODO`s.

Reviewed By: xw285cornell

Differential Revision: D8121878

fbshipit-source-id: 4a5e9a677ba4ac82095df959851a054c81eccf81
2018-07-26 10:25:23 -07:00
2c7e7e37a6 Corrected doc in class RNNCell (#9866)
Summary:
fixes #9642
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9866

Differential Revision: D9012131

Pulled By: weiyangfb

fbshipit-source-id: d2849b1a50234dbdb335dffab4835c9de85183c3
2018-07-26 09:27:05 -07:00
bdbbcf068a Temporarily disable test_unique on rocm since it keeps running into segfault (#9872)
Summary:
petrex

https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3758/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3757/console
https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-clang3.8-rocm1.7.1-ubuntu16.04-test/3752/console
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9872

Reviewed By: ezyang

Differential Revision: D9013335

Pulled By: bddppq

fbshipit-source-id: 80490a0fd4a86aa9c8454378c0edddc57d135c4e
2018-07-26 08:34:00 -07:00
e70fc145a9 MIOpen fixes for Caffe2 (#9842)
Summary:
The PR contains:
Fixes for running MIOpen conv operator in a multi worker scenario, along with a performance fix
Fixing a typo in MIOpen pool op and adding some extra checks for MIOpen spatial BN op

bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9842

Differential Revision: D9012512

Pulled By: bddppq

fbshipit-source-id: 270e1323c20fbfbc4b725f9a4ff34cd073ddaaa8
2018-07-26 02:42:26 -07:00
3be8e4db51 Do not run ONNX integration tests in parallel
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9861

Differential Revision: D9011458

Pulled By: bddppq

fbshipit-source-id: 7ab1b1763d56f1290ade7a99682ad461c97f807b
2018-07-25 21:54:29 -07:00
997f46d1e1 Disable "filter too much" health check for fc operator tests (#9865)
Summary:
makes the CI flaky
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9865

Differential Revision: D9011882

Pulled By: bddppq

fbshipit-source-id: 5124ab97d258eed7585734d64fb01e5df98abd0d
2018-07-25 21:41:14 -07:00
ba062e7da9 Update OnnxifiOp according to onnx/onnx#1224
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9844

Reviewed By: yinghai

Differential Revision: D9004222

Pulled By: bddppq

fbshipit-source-id: 1bdcefc0dfbd5e3422217b5254b2462e5a568d2a
2018-07-25 19:29:38 -07:00
5e4de0821a Set ROCm MAX_JOBS=4 (#9856)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9856

Differential Revision: D9009100

Pulled By: ezyang

fbshipit-source-id: 28f34128fcb7c3d6a115884bf28dc2a6bde5aed6
2018-07-25 19:09:41 -07:00
6cd0174ff5 Reimplement localScalar as a native function. (#9762)
Summary:
I split it into two parts, _local_scalar and _local_scalar_dense (unchecked)
so I could reuse the sparse logic in both paths.

_local_scalar became a method on Tensor to work around a circular
include problem.

This is resurrected copy of #9652
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9762

Differential Revision: D8972348

Pulled By: ezyang

fbshipit-source-id: 2232dbfc8e1286b8a4a1c67d285c13a7771aad4c
2018-07-25 19:09:39 -07:00
ad47228020 Test pinning Hypothesis 3.59.0 (#9830)
Summary:
We think this will band-aid some of the new Caffe2 test failures.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9830

Differential Revision: D9008052

Pulled By: ezyang

fbshipit-source-id: 84f1c0faea429d758d760965d6cbfe9e4c72eb19
2018-07-25 18:11:10 -07:00
b84b78a69d Fix the ROCM build, and enable sccache for it
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9841

Differential Revision: D9008030

Pulled By: ezyang

fbshipit-source-id: 51cac3c75fc52658b22a10a6bf8a479bcf803fb2
2018-07-25 17:55:47 -07:00
0b16b03b98 Plumb type annotations through script compilation (new) (#9547)
Summary:
Supersedes https://github.com/pytorch/pytorch/pull/9405
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9547

Reviewed By: zdevito

Differential Revision: D8900327

Pulled By: jamesr66a

fbshipit-source-id: a00a94615af4fbaec98ee3ede0cb54bcfd9108dd
2018-07-25 17:10:14 -07:00
445c17d492 Update CopyMatrix in math (#9792)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9792

Update CopyMatrix in math

Reviewed By: houseroad

Differential Revision: D8982421

fbshipit-source-id: da2056306cde3300124b21eba7a6c2d113111002
2018-07-25 16:10:52 -07:00
74ac5265d1 nomnigraph - make use of nodeIterator (#9831)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9831

Follow up to D8980903 - replace dataIterator with nodeIterator where the data isn't used.

Reviewed By: pjh5

Differential Revision: D8998351

fbshipit-source-id: c333847ecd8b6d8075352322845839b94a63aecc
2018-07-25 15:40:44 -07:00
302adb7cc8 added torch.rot90() to ATen (#8628)
Summary:
1. fixes #6271
2. implemented torch.rot90() following [numpy.rot90()](6a58e25703/numpy/lib/function_base.py (L54-L138))
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8628

Reviewed By: ezyang

Differential Revision: D8987860

Pulled By: weiyangfb

fbshipit-source-id: 8dac3b2a1f6d3288672977aba8b547706ce97fe9
2018-07-25 15:11:44 -07:00
2f5c0c30cd Make logsumexp work with empty tensors again. (#9825)
Summary:
https://github.com/pytorch/pytorch/pull/9755 broke this, but it was only tested if size zero dims were turned on (it can still happen even if that isn't turned on, because we support size [0] tensors).
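A minimal sketch of the restored behavior (illustrative only):

```python
import torch

x = torch.randn(0, 4)            # a size-[0, 4] tensor
y = torch.logsumexp(x, dim=1)    # works again on empty tensors
assert y.shape == (0,)
```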
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9825

Differential Revision: D8997303

Pulled By: gchanan

fbshipit-source-id: 911dce112f73fad0f3980a7f4f9423df0f2d923d
2018-07-25 13:41:24 -07:00
4b0098f3ae Add --allow-change-held-packages to make nccl2 install in docker work (#9828)
Summary:
This was used to build Caffe2 Docker version 170.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9828

Differential Revision: D8997808

Pulled By: ezyang

fbshipit-source-id: f48938b2b71bc86578c9d9b46c281ed05478724e
2018-07-25 11:56:40 -07:00
279b836675 Add some user-friendly checks in pack padded symbolic to ensure thing… (#9731)
Summary:
…s are the right type
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9731

Reviewed By: soumith

Differential Revision: D8958693

Pulled By: jamesr66a

fbshipit-source-id: 7db1f86a85188fd2c84d0edaaaac6a096d64ba52
2018-07-25 11:25:42 -07:00
be163f50a3 Avoid divide-by-zero when bartlett_window size is 0.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9788

Differential Revision: D8980951

Pulled By: gchanan

fbshipit-source-id: 429b341ac687afe4f1429bb141ef070bf315519c
2018-07-25 10:40:39 -07:00
56fbfee872 Remove ifdef __cplusplus from THTensor.h, have cpp self-contained in … (#9775)
Summary:
…THTensor.hpp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9775

Differential Revision: D8977140

Pulled By: gchanan

fbshipit-source-id: d6d2461f7cb0511ee1def52ac1032a86349a7105
2018-07-25 10:25:17 -07:00
a7f183f971 Revert "Fix dataloader hang when it is not completely iterated (#9655)" (#9804)
Summary:
This reverts commit 9ee513365121cd387e11987c66db6599ac53ded7.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9804

Reviewed By: ezyang

Differential Revision: D8987780

Pulled By: SsnL

fbshipit-source-id: 75ad70b0b8d672d0b35235fa248b187be64b68e5
2018-07-25 10:10:30 -07:00
c14e17eced Co-distillation with different archs and/or feature sets (#9793)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9793

Enable co-distillation with different archs

Reviewed By: pjh5

Differential Revision: D8888479

fbshipit-source-id: eac14d3d9bb6d8e7362bc91e8200bab237d86754
2018-07-25 10:10:27 -07:00
ea67a2bd11 Allows negative index to tensor.narrow (Fixes: #9546)
Summary:
Fixes #9546
Test cases added
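A minimal usage sketch (illustrative only):

```python
import torch

x = torch.arange(6).reshape(2, 3)
# A negative start now counts from the end of the dimension:
y = x.narrow(1, -2, 2)   # equivalent to x.narrow(1, 1, 2), i.e. x[:, 1:3]
assert torch.equal(y, x[:, 1:3])
```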

Reviewed By: ezyang

Differential Revision: D8974842

Pulled By: zou3519

fbshipit-source-id: a7707406c2a21e8e14f9c2a8ad4d64c8b08156df
2018-07-25 09:25:45 -07:00
0853d13f86 Move scalar boolean to THTensor, rename scalar in this context to zer… (#9783)
Summary:
…o dim.

Manifest:
1) The scalar boolean is now in THTensor, although it isn't hooked up at the TH level yet.
2) setScalar is gone, everything now goes through the maybeScalar equivalent (which is renamed)
3) all "scalars" in this context now refer to "zero_dim" in order to differentiate this concept from the "Scalar" class.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9783

Differential Revision: D8978911

Pulled By: gchanan

fbshipit-source-id: f09254be4bebad0e4c510fefe4158b4f7e92efe1
2018-07-25 09:25:41 -07:00
8825e323b5 nomnigraph - Add way to check if a NodeRef is in a graph, and make a graph node iterator (#9790)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9790

- Add way to check if a NodeRef is in a graph
- Make a nodeIterator (similar to dataIterator) but only iterate through nodes.

Reviewed By: bwasti

Differential Revision: D8980903

fbshipit-source-id: b20504a46715858752e25242303125a15a709b88
2018-07-25 09:02:13 -07:00
42a4747389 Temporarily need this to prevent sccache from breaking. (#9810)
Summary:
Temporarily need this to prevent sccache from breaking when I move sccache install to the DockerFile.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9810

Differential Revision: D8991684

Pulled By: Jorghi12

fbshipit-source-id: 14cd0278f53a72372f9bbe27b228980f8d3c1d4a
2018-07-25 09:01:58 -07:00
a74a3fdeb6 typo fix, tutorials url with http protocol is not valid (#9812)
Summary:
The tutorials URL with the http protocol is not valid; replacing it with https.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9812

Differential Revision: D8991344

Pulled By: ezyang

fbshipit-source-id: c12faa57905b50eadc320f9938c39c4139bd093b
2018-07-25 07:54:26 -07:00
3ef521e98a Implement backward for torch.symeig (#8586)
Summary:
Partially fixes https://github.com/pytorch/pytorch/issues/6890. (backward pass for non-symmetric eigen-decomposition is not implemented in other packages, e.g. autograd, mxnet, tensorflow, presumably because the eigenvalues can be imaginary for the general case, and AFAIK we cannot support complex numbers).

This patch adds a backward function for the symmetric eigen-decomposition function `torch.symeig`. The formula used is taken from [here](http://eprints.maths.ox.ac.uk/1079/1/NA-08-01.pdf). Unit tests are added to verify correctness.

There is still one outstanding issue, which is how to handle the case where the `symeig` is called with `eigenvectors=False`. In this case, the eigenvectors are returned as a zero tensor, but the backward computation for the eigenvalues depends on the eigenvectors. There was a previous attempt to implement this in https://github.com/pytorch/pytorch/pull/2026, where apaszke mentioned that the `eigenvectors` argument should be overridden so that they are saved for the backwards pass. The forward code is autogenerated, though, and it isn't clear to me how that would be done. I'd appreciate any guidance. For now, there is a unit test that will fail until that issue is resolved.
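A minimal sketch of the newly differentiable path, using the torch.symeig API as it existed at the time (illustrative only):

```python
import torch

a = torch.randn(3, 3)
a = (a + a.t()) / 2              # symmetrize the input
a.requires_grad_()
w, v = torch.symeig(a, eigenvectors=True)
w.sum().backward()               # backward through symeig now works
assert a.grad is not None
```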
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8586

Reviewed By: ezyang

Differential Revision: D8872760

Pulled By: SsnL

fbshipit-source-id: 76614495d0f9c118fec163a428f32e5480b4d115
2018-07-25 07:16:10 -07:00
0262fd0f91 Delete Tensor::typeString() (#9764)
Summary:
The primary use-site of typeString was checked_cast_tensor.
I did a little more than I needed in this patch, to set
the stage for actually deleting the tensor type.

Specifically, I modified checked_cast_tensor to explicitly
take Backend and ScalarType, the idea being that once we
remove the tensor subclasses, we will delete the T template
parameter.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9764

Differential Revision: D8969196

Pulled By: ezyang

fbshipit-source-id: 9de92b974b2c28f12ddad13429917515810f24c6
2018-07-24 22:26:15 -07:00
723a600ebd Update for new incremental build instructions
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9773

Differential Revision: D8988285

Pulled By: ezyang

fbshipit-source-id: c2c3b7cefb54e4e18602b180281f22939293a383
2018-07-24 22:26:13 -07:00
bca10ad706 Implementation of Weibull distribution (#9454)
Summary:
This implements the two-parameter Weibull distribution, with scale $\lambda$ and shape $k$ parameters as described on [Wikipedia](https://en.wikipedia.org/wiki/Weibull_distribution).

**Details**
- We implement as a transformed exponential distribution, as described [here](https://en.wikipedia.org/wiki/Weibull_distribution#Related_distributions).
- The `weibull_min` variance function in scipy does not yet support a vector of distributions, so our unit test uses a scalar distribution instead of a vector.

Example of the bug:

```
>>> sp.stats.expon(np.array([0.5, 1, 2])).var() # fine
array([1., 1., 1.])
>>> sp.stats.weibull_min(c=np.array([0.5, 1, 2])).var() # buggy
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 490, in var
    return self.dist.var(*self.args, **self.kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1242, in var
    res = self.stats(*args, **kwds)
  File "/usr/local/lib/python3.7/site-packages/scipy/stats/_distn_infrastructure.py", line 1038, in stats
    if np.isinf(mu):
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
```
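For reference, a minimal usage sketch of the new distribution (illustrative only):

```python
import torch
from torch.distributions import Weibull

# scale (lambda) and concentration (k), matching the Wikipedia parametrization:
d = Weibull(scale=torch.tensor([1.0, 2.0]),
            concentration=torch.tensor([0.5, 1.5]))
samples = d.sample((10,))        # shape (10, 2)
log_probs = d.log_prob(samples)
```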
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9454

Differential Revision: D8863574

Pulled By: SsnL

fbshipit-source-id: 1ad3e175b469eee2b6af98e7b379ea170d3d9787
2018-07-24 20:40:15 -07:00
4b61760738 Add Adadelta optimizer to caffe2 (#9088)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9088

Closes https://github.com/pytorch/pytorch/pull/9088

- Added CPU/GPU implementations of Adadelta and SparseAdadelta.
- Added corresponding Python unittests

Reviewed By: BIT-silence

Differential Revision: D8712169

fbshipit-source-id: 544e99e13b230a919672a7341b3715d64597c0be
2018-07-24 20:09:21 -07:00
620952117e remove unnecessary -Wno= flags
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9608

Differential Revision: D8946664

Pulled By: anderspapitto

fbshipit-source-id: b05f10af58da25b2a2588f7153f393bb3637f29a
2018-07-24 18:40:42 -07:00
9cf76cfb4c Changing conda build script to use current python version
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9780

Reviewed By: ml7

Differential Revision: D8983501

Pulled By: pjh5

fbshipit-source-id: 79208796247433cbe271a2d06f66254587d96f80
2018-07-24 18:40:40 -07:00
f62bc01dfe Remove TORCH_ASSERT (#9575)
Summary:
I got some tensor->variable conversion exceptions from `torch/csrc/autograd/variable.h`, which used the `TORCH_ASSERTM` macros instead of `AT_CHECK`, so they didn't have backtraces. This was such a substantial loss for debuggability that I decided to update the whole codebase to use the backtrace-enabled ATen macros instead of `TORCH_ASSERT` and `JIT_ASSERT`, the latter having been an alias of the former.

ezyang apaszke zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9575

Differential Revision: D8924566

Pulled By: goldsborough

fbshipit-source-id: 7a4013b13eec9dbf024cef94cf49fca72f61d441
2018-07-24 18:10:06 -07:00
d2610fb379 Constexpr Type Ids -> 6.5% caffe2 perf improvement (#9603)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9603

Using constexpr for some heavily queried type ids gives us a 6.5% perf improvement for caffe2.

Benchmark results: P59829647

Also added canaries (but they don't show a significant difference):
- adfinder:
  - https://our.intern.facebook.com/intern/ads/canary/411346509423301481
  - https://our.intern.facebook.com/intern/ads/canary/411346563021753557
- adindexer:
  - https://our.intern.facebook.com/intern/ads/canary/411346517006038367
  - https://our.intern.facebook.com/intern/ads/canary/411346571387258927
- multifeed_predictor:
  - https://our.intern.facebook.com/intern/ads/canary/411346526631282941
  - https://our.intern.facebook.com/intern/ads/canary/411346583141009531

Reviewed By: dzhulgakov

Differential Revision: D8841577

fbshipit-source-id: 1a0ce7f2bee1ae54b723caefe5bc7f85a20935b4
2018-07-24 17:24:55 -07:00
6c6a353a66 Fix speedbenchmark bug (#9770)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9770

Add zero ops to operators that do not have a valid schema

Reviewed By: hlu1

Differential Revision: D8957472

fbshipit-source-id: d8d0a351183e88ace2e050a87c1e1c363af67e33
2018-07-24 17:10:37 -07:00
d7d673b68d Update onnx to latest master (#9782)
Summary:
52d40befa7
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9782

Reviewed By: yinghai, houseroad

Differential Revision: D8978668

Pulled By: bddppq

fbshipit-source-id: 238f76a36784c12cc5655a2ee059f7e0169c0bb6
2018-07-24 14:42:01 -07:00
e5fe66d7ea Add support for specifying device_option in Functional (#9619)
Summary:
e.g.
```
Functional.Add(x, y, device_option=DeviceOption(HIP, 0))

```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9619

Differential Revision: D8966599

Pulled By: bddppq

fbshipit-source-id: 22235e42f19278e79802642798bf0ee70a1202f6
2018-07-24 14:41:59 -07:00
37fc58f1d3 Use torch::empty before random_ on seed gen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9769

Reviewed By: goldsborough

Differential Revision: D8977636

Pulled By: SsnL

fbshipit-source-id: c2437d5ef53dc74e1b17eb16e728e1d67ae314c7
2018-07-24 14:41:58 -07:00
f393df774b Test case for c10d DDP (#9670)
Summary:
Before I can rewrite portions of the c10d DDP in C++ I need proper tests in place to make sure I am not breaking anything as I port code. There were no tests for the c10d DDP in place so I wrote some.

I refactored the c10d tests to derive some test cases from a general `MultiGPUTestCase` and followed lots of patterns from `test_distributed.py` w.r.t. how tests are skipped (such that the main process doesn't initialize CUDA, which I found is a super important detail!!!).

I am largely unfamiliar with this code so feel free to scrutinize. The DDP test code itself is also largely taken from `test_distributed.py` but more inlined which I find easier to read.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9670

Differential Revision: D8977724

Pulled By: goldsborough

fbshipit-source-id: 186eab38a72384d7992a2ec5c89f304ad42d5944
2018-07-24 14:10:24 -07:00
e26d584445 Remove isScalar() from TensorImpl.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9765

Differential Revision: D8969474

Pulled By: gchanan

fbshipit-source-id: 42002b129488179affc919dba877de5a4e8f9fb5
2018-07-24 12:55:06 -07:00
7050d83dd7 Make logsumexp_out inplace (#9755)
Summary:
Fixes: #9754

Maybe this could also make its way into 0.4.1; it is a severe debugging headache if you hit this...
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9755

Reviewed By: ezyang

Differential Revision: D8967178

Pulled By: zou3519

fbshipit-source-id: 151ed24e3a15a0c67014e411ac808fb893929a42
2018-07-24 12:40:48 -07:00
360c1bbd5b Add multivariate log-gamma (mvlgamma) (#9451)
Summary:
1. Add tests in test_cuda, test_torch
2. Add doc strings
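
For context, the function being added is the standard multivariate log-gamma of order p; a sketch of the textbook definition (my addition, not taken from this commit message):

```latex
\log\Gamma_p(a) = \frac{p(p-1)}{4}\,\log\pi + \sum_{j=1}^{p} \log\Gamma\!\left(a + \frac{1-j}{2}\right)
```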

Closes https://github.com/pytorch/pytorch/issues/9378 .

Differential Revision: D8859746

Pulled By: ezyang

fbshipit-source-id: 939c309d90940a7aa08f53004c9e7b3b1c9cf54e
2018-07-24 12:10:10 -07:00
6885b3fd62 Delete dead IsVariable enum. (#9768)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9768

Differential Revision: D8975802

Pulled By: ezyang

fbshipit-source-id: f85844872a1eb13e782aba0c168a3a1c1ac0313d
2018-07-24 11:58:11 -07:00
f9a99d5504 Specify default initialization schemes for modules in docs (#9038)
Summary: This closes #6906 .

Reviewed By: ezyang

Differential Revision: D8698632

Pulled By: weiyangfb

fbshipit-source-id: 259c1dbdc264a8e9f83e196fa72d135babd97d48
2018-07-24 11:58:08 -07:00
2b134c72e6 Add interface to provide blob types to shape&type inference (#9643)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9643

Current map interface assumes float data type, which is not always correct.

Reviewed By: kennyhorror

Differential Revision: D8455784

fbshipit-source-id: b94a31267760f7f97c15aa4b03008affc347fd10
2018-07-24 11:58:05 -07:00
7af5883860 Enable python tests on ROCm (#9616)
Summary:
petrex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9616

Differential Revision: D8960623

Pulled By: bddppq

fbshipit-source-id: bde93bda6230094e6bf4badd8ee79f0688ae1993
2018-07-24 11:37:58 -07:00
6ab5e697b9 Small fixups for enabling zero size dims. (#9724)
Summary:
1) Properly test cpu for alpha/beta addmm cases.
2) Unsqueeze on empty no longer throws an exception.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9724

Reviewed By: ezyang

Differential Revision: D8958513

Pulled By: gchanan

fbshipit-source-id: 6ce2ec4a47201f9b225b8c52354144ace43e9e09
2018-07-24 11:11:39 -07:00
675d80841a Small fixups for n-dimensional empty tensors in CUDA non-reduction di… (#9722)
Summary:
…m ops.

Continuation of https://github.com/pytorch/pytorch/pull/9658.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9722

Differential Revision: D8956321

Pulled By: gchanan

fbshipit-source-id: 116fcaa1be5b1373f03217911556a28125cc860d
2018-07-24 11:11:37 -07:00
f6496229a5 Fixes xcode 10 beta 4 compile error (#9748)
Summary:
When building iOS apps with a caffe2 dependency, we were seeing the `caffe2/caffe2/mobile/contrib/ios/mpscnn/mpscnn.mm:33:17: error: method 'copyWithZone:' in protocol 'NSCopying' not implemented [-Werror,-Wprotocol]`. This fixes it by implementing a shallow copy with that method.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9748

Reviewed By: jerryzh168

Differential Revision: D8954332

Pulled By: williamtwilson

fbshipit-source-id: 0cd44408257c0bd3f4ffb80312ea9d13d13e5ff3
2018-07-24 11:11:35 -07:00
1283834600 Devirtualize TensorImpl::toString (#9758)
Summary:
This can hardly be called an improvement (we now print
CPUFloatType instead of CPUFloatTensor) but it was the
simplest way I could think of devirtualizing this function in
the short term.  Probably need some sort of native function
that gives string information about a tensor.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

Approved in #9710
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9758

Differential Revision: D8966935

Pulled By: ezyang

fbshipit-source-id: a4641affe0a6153f90cdd9f4f2a1100e46d1a2db
2018-07-24 11:11:33 -07:00
679d397f28 Fix scalar_tensor_test for squeeze/unsqueeze with zero sized dimensions.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9766

Differential Revision: D8971173

Pulled By: gchanan

fbshipit-source-id: 50bf7778eee7c60f51e1660ad834e161fa40f563
2018-07-24 10:42:39 -07:00
a7afba7308 Remove duplicated functions (#9601)
Summary:
Found by the linter; the duplication was likely introduced in a previous code sync.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9601

Differential Revision: D8922379

Pulled By: bddppq

fbshipit-source-id: 1f61bd7f539d823e62920615674a532ec0149623
2018-07-24 10:23:46 -07:00
adda789770 Skip maxpool_with_indices onnx tests (#9751)
Summary:
Not in the same format. Skip at the moment.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9751

Reviewed By: yinghai

Differential Revision: D8965636

Pulled By: houseroad

fbshipit-source-id: 81d39c2f5625c14c0e1ee11408b5f7267b53798f
2018-07-24 10:23:43 -07:00
ba634c11df Move strides to base class. (#9749)
Summary:
Approved in #9644
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9749

Differential Revision: D8965336

Pulled By: ezyang

fbshipit-source-id: d1b0763e592f298395621cfd684715dc0a550cd6
2018-07-23 22:27:48 -07:00
9bf72b2087 Add missing windows exports
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9738

Reviewed By: apaszke

Differential Revision: D8961728

Pulled By: zdevito

fbshipit-source-id: aacba8c03d0d8dfe1e87585d1c2b26703d2ed103
2018-07-23 19:55:19 -07:00
5df3eae89e Add 1x1 specialization for conv with NCHW order (#9671)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9671

Add 1x1 specialization for conv with NCHW order

Reviewed By: houseroad

Differential Revision: D8944686

fbshipit-source-id: 94bf44f69498b1934b7dfff4c0e989342c7bb61c
2018-07-23 18:54:58 -07:00
a387331e54 Re-enable test_segfault after recent dataloder changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9700

Differential Revision: D8953615

Pulled By: SsnL

fbshipit-source-id: c6aa3c07dd2857dd54889d47e537a6b1e9198c60
2018-07-23 18:38:42 -07:00
099b5ba9d1 Tensor merge PRs from July 20 (#9713)
Summary:
Constituent PRs:

- [x] #9553 Remove unnecessary functions from StorageDerived.h (by cpuhrsch, reviewed by ezyang)
- [x] #9588 Use THTensor/Storage for THVoidTensor/Storage (by cpuhrsch , reviewed by gchanan)
- [x] #9627 Delete context from tensor (by ezyang, reviewed by gchanan)
- [x] #9641 Tensor reorganization (by ezyang, reviewed by gchanan )
- [x] #9647 Remove dim_ from THTensor (by cpuhrsch, reviewed by ezyang)
- [x] #9650 Remove context (by cpuhrsch, reviewed by gchanan and ezyang)
- [x] #9715 Fix Windows build in tensor merge PR (by ezyang, reviewed by gchanan and SsnL)

Upcoming PRs which didn't make this cut:

- [x] #9644 Stride move to TensorImpl, and nits (by ezyang, reviewed by gchanan)
- [ ] #9652 Native localScalar  (by ezyang, **UNREVIEWED AND FAILING TESTS**)
- [x] #9710 Devirtualize TensorImpl::toString (by ezyang, reviewed by gchanan)
- [ ] #9654 Use int64_t instead of ptrdiff_t for size / Rename flag to resizable_  (by cpuhrsch, **CHANGES REQUESTED AND FAILING TESTS**)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9713

Reviewed By: gchanan

Differential Revision: D8960882

Pulled By: ezyang

fbshipit-source-id: 99747b2c5462c7ff6809b67aacb4197626408204
2018-07-23 18:00:41 -07:00
e3fb9088d5 Allow multiple ops.def and clean up code gen in general
Summary: Basic cleanup, refactoring some ops out to closed-source fb

Reviewed By: yinghai

Differential Revision: D8720722

fbshipit-source-id: 6fdf915c057a5749656d9f34a57fc142de6b076b
2018-07-23 15:44:04 -07:00
5849354aa1 Add operator<< overloads for TensorOptions (#9606)
Summary:
Added `operator<<` overloads for `at::TensorOptions` on request of ebetica

Example output:

```
TensorOptions(dtype=Double, device=cpu, layout=Strided, requires_grad=false)
```
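
A minimal usage sketch (assuming the overload is visible through the usual C++ frontend headers):

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
  auto options = torch::TensorOptions()
                     .dtype(torch::kDouble)
                     .requires_grad(false);
  std::cout << options << std::endl;  // streams a summary like the one above
}
```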

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9606

Differential Revision: D8925191

Pulled By: goldsborough

fbshipit-source-id: 0503bc2851268276e9561d918290bc723e437c9c
2018-07-23 15:11:33 -07:00
d05a8145c5 Change behavior of clone to clone to a device (#9609)
Summary:
ebetica made me aware that `nn::Module::clone()` always clones to the current device (usually CPU) instead of preserving the device of each parameter. This PR changes the signature of `clone` from

`shared_ptr<Module> clone()`

to

`shared_ptr<Module> clone(optional<Device> device = nullopt)`

with semantics of:

1. If a `device` is given, all parameters/buffers are moved to that device,
2. If no `device` is supplied (default), parameters/buffers retain their device.
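
A short sketch of both cases (`Net` stands in for any user-defined module, so this is a snippet rather than a full program):

```cpp
auto net = std::make_shared<Net>();

// Case 1: clone and move every parameter/buffer to CUDA device 0.
auto gpu_clone = net->clone(torch::Device(torch::kCUDA, 0));

// Case 2: clone while each parameter/buffer keeps its current device.
auto faithful_clone = net->clone();
```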

ezyang apaszke ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9609

Differential Revision: D8957367

Pulled By: goldsborough

fbshipit-source-id: 0d409ae645ed2b8d97d6fc060240de2f3d4bc6c8
2018-07-23 14:55:25 -07:00
31ba2f15e1 Rename embedding variable to weight (#9720)
Summary:
I renamed the variable in the `Embedding` module from `weight` to `table` a few months ago, because it seemed like a more meaningful name. Turns out it's not such a good idea because it deviates from PyTorch, which unnecessarily breaks C++->Python translated code.

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9720

Differential Revision: D8955647

Pulled By: goldsborough

fbshipit-source-id: 77228b07d2b733866e8cdecaa6d0686eef4cc3ea
2018-07-23 14:55:24 -07:00
431415adc4 quick patch for PackPadded removal to propagate the correct size. (#9657)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9657

Differential Revision: D8940824

Pulled By: anderspapitto

fbshipit-source-id: ea827a24c85447fe4ae470336a746329598eee84
2018-07-23 14:25:39 -07:00
a949245a86 Switch interpreter to use IValue's primitive int/floats (#9718)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9718

This patch switches the interpreter to use IValue's primitive numbers rather than tensors for computing on integers and floats. In addition to preparing the interpreter for first-class support of other types, this cleans up the handling of primitive numbers, making it possible to just use the normal operator overloading dispatch to find the right implementation for numbers. As a result of this change, a lot of other functionality needed to be updated since this was the first time we used non-tensors in a lot of places in the code base.

Notes:
* Fixes code_template.py so that multi-line strings are indented correctly when used on a standalone line
* Cast operators (`int(x)`) are now functional. Some tests have additional conversions to integers because
we no longer allow implicit tensor -> integer conversions, following the same convention as in python
* prim::ListConstruct/createList has been added to the interpreter for creating lists and this has
replaced aten::stack for integers lists
* gen_jit_dispatch.py has been refactored so that non-tensor types use operators on IValues to extract
the primitives
* IValue gains a .to<T> method that is the equivalent of tensor_as but for IValue instead of at::Tensor
* `constant_as<T>` is switched over to using IValue's `.to<T>` method, to make conversion from constant->IValue->C++ type
more consistent. This functionality combined with `toIValue(Value*)` replaces the `tensor_as` and `as_tensor` family of functions.
* conditional expressions (if, loop) and operators related to them are now computed on integers rather than tensors
* IValue gains constructors for constructing from at::Scalar and converting to it. However, IValue itself will always store
the scalars as a double or int64.
* To align with python 3 syntax, TK_INT, TK_FLOAT, and TK_BOOL have been removed from the parser, and int/float/bool are just treated as special identifiers in the compiler,
along with print. These are represented as special sugared values with a `call` method implemented. For int/float/bool this implements casting behavior.
* Dropped shared_from_this from Type/Module. They were not needed and made debugging harder because they internally throw/catch exceptions.
* Shape propagation has been updated to support running nodes that include floating point primitive types, this required some refactoring of internal functions.
* TensorToNum and NumToTensor have actual implementations as operators now
* register_prim_ops now contains implementations of math operators for float/int primitive types, and for mixed (prim <+> tensor) versions. This removes the need for special handling in compiler.cpp
* Primitive math is now entirely handled by letting the compiler choose the right overloads. This removes tons of special casing in the compiler.
* incorporates eellison's change to allow casting from return values. Due to the addition of primitive support, the code needed slight modifications, so I just pre-merged it here.
* stack.h gains generic vararg versions of push/pop that know how to convert to/from C++ types:

```
at::Tensor a;
at::Scalar b;
pop(stack, a, b);
at::Tensor c = a + b;
push(stack, c);
```
apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9584

Reviewed By: apaszke

Differential Revision: D8910546

Pulled By: zdevito

fbshipit-source-id: 0f3e60d4d22217f196a8f606549430e43b7e7e30
2018-07-23 14:11:11 -07:00
a9742e1a27 Add fallback to TensorCPU if there are unsupported types for IDEEP Tensor (#9667)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9667

MKL-DNN doesn't support 64-bit integers (cfee61bf81/include/mkldnn_types.h (L62-L75)), so force-converting from `TensorCPU<long>` to an `s32` Ideep tensor will cause memory issues. This diff gives an alternative solution, where we simply fall back to TensorCPU. The reasoning is that since MKL-DNN doesn't support 64-bit integer tensors, downstream ops have to run in CPUContext anyway, so there is no reason to force-convert to an ideep tensor and back.

Reviewed By: pjh5

Differential Revision: D8943544

fbshipit-source-id: f514903cda27e34b8887271c9df56c8220895116
2018-07-23 13:54:57 -07:00
ee2cc68259 Add ctc_beam_search_decoder op for caffe2 (#9622)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9622

Implement a ctc_beam_search_decoder operator based on ctc_greedy_decoder.

Differential Revision: D8903100

fbshipit-source-id: 38973632cb437e5cfcb9ed3a48ed6b901c10efa3
2018-07-23 13:40:24 -07:00
aa8a9fa5fc Extend DispatchStub to support CUDA dispatch (#9664)
Summary:
This is a modification of the strategy from https://github.com/pytorch/pytorch/pull/8919 and https://github.com/pytorch/pytorch/pull/9579.

```
Previously, the CPU architecture-specific kernels self-registered with
the DispatchStub. When linking as part of a static library, this requires
the flag --whole-archive to be passed to the linker to ensure that the
object files for the kernels are included. Caffe2 and TensorFlow use that
strategy.

We ran into some issues with --whole-archive blowing up the binary size
of some downstream projects in Facebook. This PR avoids --whole-archive
for CPU kernels. The downside is that the generic code needs to be aware
of whether kernels are compiled with AVX and with AVX2 (via
HAVE_AVX_CPU_DEFINITION and HAVE_AVX2_CPU_DEFINITION).

The CUDA kernels still self-register with DispatchStub because the CPU
library is not aware of whether the CUDA library will be available at
runtime.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
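
A standalone sketch of the pattern described above, with toy types rather than the actual ATen code:

```cpp
#include <cstdlib>
#include <cstring>

using kernel_fn = void (*)(float* out, const float* in, long n);

// The generic code owns the stub and references per-architecture kernels
// explicitly (when compiled in), so CPU kernels no longer self-register and
// --whole-archive is not needed.
struct DispatchStubSketch {
  kernel_fn default_kernel = nullptr;  // always-available baseline
  kernel_fn avx2_kernel = nullptr;     // non-null only under HAVE_AVX2_CPU_DEFINITION

  kernel_fn choose() const {
    // ATEN_CPU_CAPABILITY overrides the runtime capability detection.
    const char* cap = std::getenv("ATEN_CPU_CAPABILITY");
    if (cap && std::strcmp(cap, "default") == 0) {
      return default_kernel;
    }
    return avx2_kernel ? avx2_kernel : default_kernel;
  }
};
```
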
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9664

Differential Revision: D8943350

Pulled By: colesbury

fbshipit-source-id: 329229b0ee9ff94fc001b960287814bd734096ef
2018-07-23 13:40:23 -07:00
3e9e3ef383 Improving diagnose RF NE with Cali (#9550)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9550

as titled

Differential Revision: D8899226

fbshipit-source-id: 3c7cf026e8cbc0e95770e5a35b213a97bebba385
2018-07-23 13:40:21 -07:00
88d6b6e6cd Fix D8722560 (#9717)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9717

D8722560 was landed with some build errors; unfortunately, the c10 code isn't part of contbuild yet.
Fixing them.

Differential Revision: D8954141

fbshipit-source-id: 2a082fb8041626e45ccd609f37a8ef807f6dad8a
2018-07-23 12:55:20 -07:00
5094684238 Create torch::from_blob for variables (#9605)
Summary:
Need an overload of `at::from_blob` for Variables.
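
For context, `from_blob` wraps caller-owned memory as a tensor without copying; a minimal sketch of the variable-returning flavor being requested:

```cpp
#include <torch/torch.h>

int main() {
  // `data` stays owned by the caller; the tensor is a non-owning view over it.
  float data[] = {1.f, 2.f, 3.f, 4.f, 5.f, 6.f};
  auto t = torch::from_blob(data, /*sizes=*/{2, 3});
  // `t` is now a 2x3 tensor aliasing `data`.
}
```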

ezyang colesbury ebetica
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9605

Differential Revision: D8926226

Pulled By: goldsborough

fbshipit-source-id: e377c0d019d4377f3fc124614c7dcc562aa69990
2018-07-23 12:40:12 -07:00
14d4bdb406 Reformat output data format to make it more general for other binaries (#9555)
Summary:
This is to simplify the data format during benchmarking. After this change, we can use the same benchmarking harness data conversion method to parse data from multiple binaries.

This change should be coordinated with the PR: https://github.com/facebook/FAI-PEP/pull/63
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9555

Reviewed By: pjh5

Differential Revision: D8903024

Pulled By: sf-wind

fbshipit-source-id: 61cabcff99f0873729142ec6cb6dc230c685d13a
2018-07-23 11:11:26 -07:00
029cf1d78a Improve error messages of wrong dimensions (#9694)
Summary:
Updated the error message terms _matrices_ and _vectors_ to _2D tensors_ and _1D tensors_ respectively.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9694

Differential Revision: D8949589

Pulled By: ezyang

fbshipit-source-id: 2cdcd72e0e9a4459f3691c133bb16ef218b5cf3f
2018-07-23 10:10:55 -07:00
9525925119 Low rank multivariate normal (#8635)
Summary:
This pull request implements the low-rank multivariate normal distribution, where the covariance matrix has the form `W @ W.T + D`. Here D is a diagonal matrix and W has shape n x m where m << n. It uses the "matrix determinant lemma" and the "Woodbury matrix identity" to save computational cost.
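
For reference, the two identities mentioned above, written out for the covariance `C = W @ W.T + D` (the standard formulas, not the exact code paths); both replace n x n determinants/inverses with m x m ones:

```latex
\det\!\left(W W^\top + D\right) = \det\!\left(I_m + W^\top D^{-1} W\right)\,\det(D)
\quad \text{(matrix determinant lemma)}

\left(W W^\top + D\right)^{-1} = D^{-1} - D^{-1} W \left(I_m + W^\top D^{-1} W\right)^{-1} W^\top D^{-1}
\quad \text{(Woodbury matrix identity)}
```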

Along the way, I also revised the MultivariateNormal distribution a bit. Here are the other changes:
+ `torch.trtrs` works with cuda tensors, so I tried to use it instead of `torch.inverse`.
+ Use `torch.matmul` instead of `torch.bmm` in `_batch_mv`. The former is faster and simpler.
+ Use `torch.diagonal` for `_batch_diag`
+ Reimplement `_batch_mahalanobis` based on `_batch_trtrs_lower`.
+ Use trtrs to compute term2 of KL.
+ `variance` relies on `scale_tril` instead of `covariance_matrix`

TODO:
- [x] Resolve the fail at `_gradcheck_log_prob`
- [x] Add test for KL

cc fritzo stepelu apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8635

Differential Revision: D8951893

Pulled By: ezyang

fbshipit-source-id: 488ee3db6071150c33a1fb6624f3cfd9b52760c3
2018-07-23 10:10:53 -07:00
9d6521c3a0 Support n-dimensional empty tensors in CUDA non-reduction dimension f… (#9658)
Summary:
…unctions.

This also unifies the error checking between scatter/scatterAdd on CUDA.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9658

Differential Revision: D8941527

Pulled By: gchanan

fbshipit-source-id: 750bbac568f607985088211887c4167b67be11ea
2018-07-23 08:40:12 -07:00
53083b8353 Remove CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS and fix CUDA 8 build on Windows (#9491) (#9491)
Summary:
Fixes #9092.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9491
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9693

Differential Revision: D8946850

Pulled By: ezyang

fbshipit-source-id: bd816f459ab70f6b4a0983305a1ce341bb633707
2018-07-23 06:40:39 -07:00
9ee5133651 Fix dataloader hang when it is not completely iterated (#9655)
Summary:
Second attempt at https://github.com/pytorch/pytorch/pull/7140

cc csarofeen Let's see if this works. It passes everything locally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9655

Differential Revision: D8940177

Pulled By: SsnL

fbshipit-source-id: 8d6340fc9f7355c71e1e26b262da166402faa158
2018-07-22 20:38:27 -07:00
1afdc57ed8 Hide all other fields in THTensor (#9683)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9683

This pops off `refcount_`, `storage_`, `storage_offset_`; there are now no more direct accesses to these fields and we can make them private (with appropriate friending).

Stacked on #9561
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9591

Reviewed By: SsnL

Differential Revision: D8922246

Pulled By: ezyang

fbshipit-source-id: dfae023d790e29ce652e2eab9a1628bbe97b318d
2018-07-22 09:09:34 -07:00
f3d72b2101 Modify barrier net to allow better control over its initialization and execution in DPM (#9665)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9665

In data_parallel_model, we isolate the synchronizing barrier init net into its own net, separate from the param_init_net, so that we have finer-grained control over the barrier net.

Reviewed By: andrewwdye

Differential Revision: D8375389

fbshipit-source-id: ce0c8c1c8e4bd82b7078a1b07abaced3f149d578
2018-07-22 00:23:47 -07:00
769cb5a640 Add new ways of matching nodes with schemas in the JIT (#9567)
Summary:
**REVIEW LAST COMMIT ONLY**

As discussed in our yesterday's meeting. Nodes can be now matched to particular overloads using the `matches(...)` function:
```cpp
n->matches("aten::type_as(Tensor self, Tensor other) -> Tensor")
```

This also changes the shape prop and peephole passes to use those functions for matching. This fixes a few bugs, makes them much more robust, and prepares us for removal of attributes.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9567

Reviewed By: zdevito

Differential Revision: D8938482

Pulled By: apaszke

fbshipit-source-id: eb2382eeeae99692aada2d78d5d0c87c8ef1545e
2018-07-21 21:39:07 -07:00
a01d6f01b5 Update channel_shuffle_op and transpose 2d to speed up ShuffleNet (#9525)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9525

Update channel_shuffle_op and transpose 2d to speed up ShuffleNet

Reviewed By: houseroad

Differential Revision: D8889361

fbshipit-source-id: 60196e819b6842becc53b4859b62d4419a0e2c6e
2018-07-21 12:54:33 -07:00
3bb8c5eab1 Allow MKLDNN on macOS, and any other OS where CMake is able to detect it.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9638

Reviewed By: soumith

Differential Revision: D8946130

Pulled By: resistor

fbshipit-source-id: 87bd9cb12608467b05bd4998fdb00bfdbd038ca2
2018-07-20 22:27:02 -07:00
b5c8d59451 Add a CUDAContext header include
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9662

Differential Revision: D8945581

Pulled By: ezyang

fbshipit-source-id: 2fe0adc96456788579f7d6f1c4513fe45360c030
2018-07-20 20:39:09 -07:00
23ed26a0c3 Guard include of cuda-only header comm.h (#9656)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9656

Reviewed By: colesbury

Differential Revision: D8941361

Pulled By: ezyang

fbshipit-source-id: c18cb0e606ae0608e5892040192b8792ae542b74
2018-07-20 19:46:36 -07:00
5e84403d5f Fix for half conversion for ROCm 1.8.2 (#9663)
Summary:
This PR contains the change for explicit conversion between ushort and __half required for ROCm 1.8.2 support
bddppq
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9663

Differential Revision: D8943937

Pulled By: bddppq

fbshipit-source-id: 16102f9dbc68ed4ece2e8fc244825c3992c24901
2018-07-20 17:11:30 -07:00
3efdece9da Support n-dimensional empty tensors in take/put.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9635

Differential Revision: D8935119

Pulled By: gchanan

fbshipit-source-id: 5035583e7322b1a1720d961945dd0eefb4cb28ef
2018-07-20 15:40:49 -07:00
45e5c17ecf ONNXIFI transform (#9569)
Summary:
Cut out the runnable subgraph and off-load it to the ONNXIFI backend
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9569

Reviewed By: Maratyszcza

Differential Revision: D8930408

Pulled By: yinghai

fbshipit-source-id: 2b494f7f8dc10c00e58cf0fed5c4a9434be6155b
2018-07-20 15:09:59 -07:00
01581037dc Add workspace.RunPlanInBackground (#9637)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9637

Adding a method to run a plan in the background. The intended use is to run BlueWhale's data reading & preprocessing net in the background while the GPU is training.

Reviewed By: MisterTea

Differential Revision: D8906439

fbshipit-source-id: b1c73ca7327e2d87a8f873924e05ab3d161a3f1e
2018-07-20 14:56:12 -07:00
1003ccfa15 Creates CUDAContext (#9435)
Summary:
ezyang noticed that the CUDAStream files lived under ATen/ despite being CUDA-specific, and suggested porting them to ATen/cuda and exposing them with a new CUDAContext. This PR does that. It also:

- Moves ATen's CUDA-specific exceptions for ATen/cudnn to ATen/cuda for consistency
- Moves getDeviceProperties() and getCurrentCUDASparseHandle() to CUDAContext from CUDAHooks

The separation between CUDAContext and CUDAHooks is straightforward. Files that are in CUDA-only builds should rely on CUDAContext, while CUDAHooks is for runtime dispatch in files that can be included in CPU-only builds. A comment in CUDAContext.h explains this pattern. Acquiring device properties and CUDA-specific handles is something only done in builds with CUDA, for example, so I moved them from CUDAHooks to CUDAContext.

This PR will conflict with #9277 and I will merge with master after #9277 goes in.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9435

Reviewed By: soumith

Differential Revision: D8917236

Pulled By: ezyang

fbshipit-source-id: 219718864234fdd21a2baff1dd3932ff289b5751
2018-07-20 12:56:15 -07:00
8a0fe0a588 set_input_record() should always add external input (#9636)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9636

Make sure that the blobs are registered to the net

Reviewed By: pjh5

Differential Revision: D8924883

fbshipit-source-id: f09422a2d4d5ba8bf6cfbfd00172097b5ab1fcd6
2018-07-20 11:55:37 -07:00
bae156a481 Support (some) CUDA Lapack on n-dimensional empty tensors.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9631

Reviewed By: ezyang

Differential Revision: D8933202

Pulled By: gchanan

fbshipit-source-id: 1ade4ca439bf26aa921df1da83a827d860f8f48f
2018-07-20 11:40:25 -07:00
d3688861ec Fixed a missing '=' in LPPoolNd repr function (#9629)
Summary:
In the repr function of the LPPoolNd(..) class, there was a missing '=' (`kernel_size{kernel_size}`).

Link to line in the code: https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/pooling.py#L694

Original:

       return 'norm_type={norm_type}, kernel_size{kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)

Fixed:

       return 'norm_type={norm_type}, kernel_size={kernel_size}, stride={stride}, ' \
              'ceil_mode={ceil_mode}'.format(**self.__dict__)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9629

Differential Revision: D8932913

Pulled By: soumith

fbshipit-source-id: 9030dff6b14659b5c7b6992d87ef53ec8891f674
2018-07-20 11:24:42 -07:00
a3a6ab60cd Fix the error in UnpackSegmentsOp when calculating the gradient with "max_length" argument (#9598)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9598

The "max_length" should be passed to UnPackSegmentsOp if "max_length" is given when calling PackSegmentsOp.

Reviewed By: jerryzh168

Differential Revision: D8919799

fbshipit-source-id: 8c97aa717b69177b8a5d5d56892817d488853840
2018-07-20 11:09:34 -07:00
1d4d9fc7da Prepare to stop using attributes in the JIT (#9505)
Summary:
This PR adds machinery to cache the schema in an IR node, and allows lookups of (possibly) constant inputs by their names (instead of position). The new methods are:

- `at::optional<T> get<T>(Symbol name)` - if the argument called name is a constant, then casts it to type `T` and returns it. If it's not constant returns `nullopt`. Raises an error if there's no argument with that name.
- `at::optional<IValue> get(Symbol name)` - like above, but packs the result in an IValue
- `Value* getValue(Symbol name)` - retrieves a `Value*` for an argument (no need to know its position).

All above functions currently inspect the attributes as well, but that's only so that I could start using them in other places in the JIT without disrupting our current functionality. I wanted this diff to be a preparation that doesn't change the semantics too much, and so both the tracer and script create nodes with attributes. The next PR will put that to a stop, and hopefully the changes we need to make to other components will be simpler thanks to what I did here.

One more thing I'd like to do before actually stopping creating the non-attributed nodes is to have a convenient way of creating a schema programmatically, matching nodes against it, and creating them without having to pack inputs into flat argument lists (which is quite error prone).

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9505

Reviewed By: ezyang

Differential Revision: D8915496

Pulled By: apaszke

fbshipit-source-id: 39d14fc9a9d73d8494f128367bf70357dbba83f5
2018-07-20 10:56:00 -07:00
b9e89cf9fd Revert "Extend DispatchStub to support CUDA dispatch (#9579)" (#9614)
Summary:
This reverts commit bcf0bf42a1727c8ee788f733c28579d0e36a387c.

The commit was causing issues for some internal FB projects.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9614

Reviewed By: Yangqing

Differential Revision: D8929552

Pulled By: colesbury

fbshipit-source-id: ae9026ad8762a4c5de401273694b4c878fc241a6
2018-07-20 10:25:11 -07:00
bbb30ad4ab Use THTensor/Storage for THVoidTensor/Storage (#9588)
Summary:
Change akin to change for THVoidStorage.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9588

Reviewed By: gchanan

Differential Revision: D8915559

Pulled By: cpuhrsch

fbshipit-source-id: 6cc69df0e29942c62750f990903dfd8e4d344581
2018-07-20 09:54:44 -07:00
f84fdc7866 Remove unnecessary functions from StorageDerived.h
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9553

Reviewed By: ezyang

Differential Revision: D8915526

Pulled By: cpuhrsch

fbshipit-source-id: 32013d3aa58a1a68637f99ee619d06e27fadaad6
2018-07-20 09:41:36 -07:00
7b9d8916e5 Fix integral type dispatch error message (#9625)
Summary:
This fix will prevent errors like (found in `bincount`)
```
RuntimeError: %s not implemented for '%s'bincounttorch.FloatTensor
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9625

Differential Revision: D8932945

Pulled By: soumith

fbshipit-source-id: 794e3b58d662779402ab318e274661826a5db8b2
2018-07-20 09:24:27 -07:00
2a0018f2a8 Add scatter_add_ doc (#9630)
Summary:
fixes #4176 cc vishwakftw

I didn't do `:math:` and `\neg` because I am using double ticks so they render more similarly to `:attr:`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9630

Differential Revision: D8933022

Pulled By: SsnL

fbshipit-source-id: 31d8551f415b624c2ff66b25d886f20789846508
2018-07-20 08:41:05 -07:00
bfe2aa093e docs fixes (#9607)
Summary:
fixes #9589 #9507 #9502 #9390
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9607

Reviewed By: ezyang, soumith

Differential Revision: D8923575

Pulled By: SsnL

fbshipit-source-id: cb61d990333b700d813ce781040c3d0325999b8c
2018-07-20 07:55:25 -07:00
4028ff6c3a Revert "quick patch for PackPadded removal to propagate the correct s… (#9613)
Summary:
…ize. (#9593)"

This reverts commit 85b28163584380bf4953f2ac2fa21df9715f12d5.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9613

Reviewed By: bddppq

Differential Revision: D8929322

Pulled By: anderspapitto

fbshipit-source-id: 3ae4d320e5407acc1fb63a26b7d1f2ff4059eba9
2018-07-20 00:39:29 -07:00
aa7af94656 Make JIT tracing a thread-local property (#9414)
Summary:
As in the title. Lets us simplify a lot of code.

Depends on #9363, so please review only the last commit.

zdevito
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9414

Reviewed By: zdevito

Differential Revision: D8836496

Pulled By: apaszke

fbshipit-source-id: 9b3c3d1f001a9dc522f8478abc005b6b86cfa3e3
2018-07-19 19:09:39 -07:00
5651b27458 Add CAFFE_STATIC_EVENT to Stats (#9501)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9501

Added a new stat value to log static states like CPU and memory usage.

Reviewed By: pjh5

Differential Revision: D8872254

fbshipit-source-id: 469e94cab99029a3da55f8986dddeadac076e2a8
2018-07-19 16:25:59 -07:00
b770156a7a Functional DataParallel (#9234)
Summary:
This PR adds the functional version of `DataParallel` (i.e. `data_parallel`) to the C++ frontend.

For this, I had to:
1. Add "differentiable" versions of scatter and gather, which perform their inverse operation in the backward pass, to C++. I've added them under `torch/csrc/autograd/functions/comm.{h,cpp}`. I had to move some utilities from `VariableType.cpp` into `torch/csrc/autograd/functions/utils.h`, and changed them a bit to fix the `const_cast`s for which there were `TODO`s,
2. Implement the `replicate`, `parallel_apply` and the combining `data_parallel` functions in C++.

`replicate` is implemented based on our existing `clone()` interface, along with the ability to set the current device via `at::OptionsGuard` (so nice).

`parallel_apply` is implemented using `at::parallel_for` (CC cpuhrsch) and [follows the code from PyTorch](https://github.com/pytorch/pytorch/blob/master/torch/nn/parallel/parallel_apply.py).
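
A rough sketch of the functional entry point (the namespace and signature here are my assumption, not taken from this message; `Net` is a hypothetical module):

```cpp
auto module = std::make_shared<Net>();
auto input = torch::randn({8, 3, 224, 224});

// Hypothetical call: replicate `module`, scatter `input` across the devices,
// apply the replicas in parallel, then gather the outputs back.
auto output = torch::nn::parallel::data_parallel(
    module,
    input,
    std::vector<torch::Device>{torch::Device(torch::kCUDA, 0),
                               torch::Device(torch::kCUDA, 1)});
```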

Added lots of tests for these things.

apaszke ezyang ebetica colesbury
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9234

Differential Revision: D8865182

Pulled By: goldsborough

fbshipit-source-id: 4f1fecf2b3f3bc1540c071dfb2d23dd45de433e4
2018-07-19 16:12:04 -07:00
7e78e80d94 Make error message for empty module friendlier (#9565)
Summary:
In our pimpl system, default constructing a module holder default constructs the contained module. This means `Linear linear;` is ill-formed, since `Linear` doesn't have a default constructor. Instead we require `Linear linear = nullptr;` to get the empty state of the `Linear`. This PR makes the error message for the ill-formed case nicer.

I had to change the forwarding constructors of most of our modules for this, but that's a minor adjustment.

E.g.

```
Linear linear;

In file included from /home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/module.h:5:0,
                 from /home/psag/pytorch/pytorch/test/cpp/api/module.cpp:3:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h: In instantiation of ‘torch::nn::ModuleHolder<Contained>::ModuleHolder() [with Contained = torch::nn::LinearImpl]’:
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/modules/dropout.h:45:1:   required from here
/home/psag/pytorch/pytorch/torch/csrc/api/include/torch/nn/pimpl.h:46:5: error: static assertion failed: You are trying to default construct a module which has no default constructor. Use = nullptr to give it the empty state (like an empty std::shared_ptr).
     static_assert(
```

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9565

Differential Revision: D8903666

Pulled By: goldsborough

fbshipit-source-id: 5e6b788921a27a44359db89afdc2b057facc5cec
2018-07-19 15:56:54 -07:00
bcf0bf42a1 Extend DispatchStub to support CUDA dispatch (#9579)
Summary:
This is a few files taken from https://github.com/pytorch/pytorch/pull/8919. They're unchanged from the latest versions of that PR.

```
This is part of https://github.com/pytorch/pytorch/pull/8919. It's
separated to make it easier to merge the PR in pieces.

There are a few major changes to DispatchStub

 - The environment variable ATEN_CPU_CAPABILITY overrides the CPU
   capability detection code (Previous ATEN_DISABLE_AVX/AVX2)

 - DispatchStub is defined in the generic native code instead of the
   CPU_CAPABILITY_DEFAULT kernel.
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9579

Differential Revision: D8909000

Pulled By: colesbury

fbshipit-source-id: fdeb606270b06acdab3c01dba97ec9d81584ecc0
2018-07-19 14:25:40 -07:00
a08119afc2 Eliminate direct access to size/strides of THTensor; replace them with std::vector (#9561)
Summary:
* THTensor now stores `sizes_` and `strides_` which is a `std::vector<int64_t>`
* Anywhere a "public" API function made use of an int64_t* of sizes, I opted to just finagle it out of the tensor using THTensor_getSizePtr rather than try to rewrite all of these sites to use ArrayRef. They should use ArrayRef eventually, but not yet.
* There are new utility functions for resizing sizes/strides in one go (THTensor_resizeDim), or replacing sizes and strides with completely new values (THTensor_setSizesAndStrides)
* Anywhere you said `t->size[n] = 0`, we now say `THTensor_setSizeAt(t, n, 0)`, ditto for strides
* Anywhere you said `t->size[n]`, we now say `t->size(n)` (coming soon: ditto for strides)
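
A toy mirror of the new layout described in the list above (names are suffixed `Sketch` to avoid implying the real API):

```cpp
#include <cstdint>
#include <vector>

// sizes_ is now a std::vector<int64_t> behind accessors, instead of a raw
// int64_t* field that callers poke at directly.
struct THTensorSketch {
  std::vector<int64_t> sizes_;
  int64_t size(int dim) const { return sizes_.at(dim); }  // t->size(n)
};

inline void THTensorSketch_setSizeAt(THTensorSketch* t, int dim, int64_t v) {
  t->sizes_.at(dim) = v;  // replaces t->size[n] = v
}

inline int64_t* THTensorSketch_getSizePtr(THTensorSketch* t) {
  return t->sizes_.data();  // for "public" APIs that still want an int64_t*
}
```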

Previous review of just the `std::vector` change in #9518, but I'm planning to merge this all in one go.

Note for gchanan: review from commit "ci" and after
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9561

Reviewed By: cpuhrsch

Differential Revision: D8901926

Pulled By: ezyang

fbshipit-source-id: 483cf275060ab0a13845cba1ece39dd127142510
2018-07-19 14:10:06 -07:00
f521823b7b Do not always set broadcast argument when exporting new onnx add and sub to caffe2
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9597

Reviewed By: colesbury

Differential Revision: D8920575

Pulled By: bddppq

fbshipit-source-id: 97423e1bf6a20559d466d2ac56c9e74e10bfc129
2018-07-19 14:10:05 -07:00
6557856671 Fix l2 normalization when handling zero vector (#9594)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9594

When the input vector is a zero vector, the previous GPU code gives NaN in the backward pass. We fix this.
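
For intuition, the standard Jacobian of L2 normalization (a sketch of why an unguarded kernel produces NaN, not the exact kernel code):

```latex
f(x) = \frac{x}{\lVert x \rVert_2}, \qquad
\frac{\partial f}{\partial x} = \frac{1}{\lVert x \rVert_2}\left(I - \frac{x x^\top}{\lVert x \rVert_2^2}\right)
```

Every term divides by the norm, which is 0 for the zero vector, so the backward must special-case that input.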

Reviewed By: pjh5

Differential Revision: D8849732

fbshipit-source-id: 87b1fb1ee05dfdb0d43bcbe67e36f15896fe1706
2018-07-19 14:10:03 -07:00
85b2816358 quick patch for PackPadded removal to propagate the correct size. (#9593)
Summary:
The underlying reason why this is even an issue is that the conversion
into and out of the 'fictional' onnx operators is done in an unhygienic
order. This doesn't address that, but it does fix the one observable
case where this produces an incorrect result, and unblocks some other
work being done.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9593

Differential Revision: D8919125

Pulled By: anderspapitto

fbshipit-source-id: a88ca979c3b9d439863e223717d3697180c26121
2018-07-19 14:10:02 -07:00
f33cd36c9b Use int64_t for im2col and col2im (#9590)
Summary:
Fixes #9404
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9590

Differential Revision: D8916020

Pulled By: SsnL

fbshipit-source-id: ac6758326bbb09b48642b149f4eb8f466ef7044e
2018-07-19 11:29:24 -07:00
f180373d68 Support n-dimensional empty tensors in CUDA BLAS and fix a btrifact bug. (#9573)
Summary:
This is mainly straightforward, with two exceptions:
1) cublasSgemv and cublasDgemv appear to have a bug where (x,0).mv(0) does not handle beta, whereas cublasSgemm and cublasDgemm do for the case where (x,0).mm(0,y). This is handled by manually calling zero / mul.

2) I fixed a bug in btrifact that was broken even when dealing with non-empty tensors.  Basically, if out.stride(0) was 1, because the underlying BLAS call expects column-major matrices, to get a column-major tensor, out.transpose_(0, 1) would be called.  But this is just wrong, as if the batch dimension (0) doesn't match the size of the columns (1), you don't even have a tensor of the correct shape.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9573

Reviewed By: ezyang

Differential Revision: D8906144

Pulled By: gchanan

fbshipit-source-id: de44d239a58afdd74d874db02f2022850dea9a56
2018-07-19 09:50:27 -07:00
aee9e90abd Fix TestAutograd.test_as_strided (#9538)
Summary:
0. Fixes #9479
1. rewrites `as_strided` as a native function. This is fine because `set_` does the scalar check.
2. allow using `self` in `python_default_init`. Previously `python_variable_methods.cpp` had `self` as an input `PyObject *` and used `self_` as the unpacked tensor, but `python_torch_functions.cpp` just used `self` as the unpacked tensor, making it impossible to use `self` in `python_default_init`.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9538

Differential Revision: D8894556

Pulled By: SsnL

fbshipit-source-id: ca7877b488e12557b7fb94e781346dcb55d3b299
2018-07-19 09:11:13 -07:00
e0446fcfa9 Pass dtype to tensor contructor in test_neg (#9558)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9554.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9558

Differential Revision: D8901085

Pulled By: yf225

fbshipit-source-id: 0edb176fcb18e0c0bcfc6f209343b9097767c9b8
2018-07-19 08:54:39 -07:00
54db14e390 HIP Operators Generator--> HipOpG (#9322)
Summary:
The goal of this PR is to add an infrastructure to convert (hipify) CUDA ops into [HIP](https://github.com/ROCm-Developer-Tools/HIP) ops at **compile** time.

Note that HIP ops, which are portable c++ code, can run on both AMD and NVIDIA platforms.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9322

Differential Revision: D8884707

Pulled By: bddppq

fbshipit-source-id: dabc6319546002c308c10528238e6684f7aef0f8
2018-07-19 00:26:06 -07:00
45f0d05202 Adapt OnnxifiOp to removed suffix handling in ONNXIFI loader (#9571)
Summary:
Adapt to changes in onnx/onnx#1203
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9571

Reviewed By: yinghai

Differential Revision: D8907892

Pulled By: bddppq

fbshipit-source-id: 9f88471639dbe9050194e84340f335bece834d5d
2018-07-18 19:26:23 -07:00
604f7e98c3 Expose CAFFE2_USE_OPENCV preprocessor flag (#9509)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9509

generate_proposals_op_util_nms.h conditionally requires OpenCV in some cases,
and earlier this was checking just CV_MAJOR_VERSION macro, but that is
undefined unless opencv.hpp is included. Adding `-DCAFFE2_USE_OPENCV` to
TARGETS when opencv is included in external_deps to check for this correctly.
Thanks jinghuang for flagging this issue!

Differential Revision: D8880401

fbshipit-source-id: 65abbcf4ffe3feffc0ee2560882cb8eb0b7476f9
2018-07-18 18:56:49 -07:00
b3e141e84c Add predictor config into Predictor (#9434)
Summary:
This is the first step of refactoring the Predictor. In this diff the config struct
is introduced and the internal data structure of Predictor has been updated.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9434

Differential Revision: D8843262

Pulled By: fishbone

fbshipit-source-id: 23f5e4751614e3fedc9a04060d69331bfdecf864
2018-07-18 16:39:56 -07:00
04b33b7231 Add byte_weight_dequant_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9541

Reviewed By: hlu1

Differential Revision: D8882964

fbshipit-source-id: 06d2e0d227ea6a4a8dc5ef1ea9dd1d449c149b47
2018-07-18 16:27:21 -07:00
c1ee8835b6 Constructors and member functions for THStorage (#9357)
Summary:
Added on top of ezyang's https://github.com/pytorch/pytorch/pull/9278
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9357

Reviewed By: ezyang

Differential Revision: D8863934

Pulled By: cpuhrsch

fbshipit-source-id: a45c955c0b1e9e0866749b3a7e8a36de931bdff1
2018-07-18 15:56:26 -07:00
4c615b1796 Introduce libtorch to setup.py build (#8792)
Summary:
Prior to this diff, there were two ways of compiling the bulk of the torch codebase. There was no interaction between them - you had to pick one or the other.

1) with setup.py. This method
- used the setuptools C extension functionality
- worked on all platforms
- did not build test_jit/test_api binaries
- did not include the C++ api
- always included python functionality
- produced _C.so

2) with cpp_build. This method
- used CMake
- did not support Windows or ROCM
- was capable of building the test binaries
- included the C++ api
- did not build the python functionality
- produced libtorch.so

This diff combines the two.

1) cpp_build/CMakeLists.txt has become torch/CMakeLists.txt. This build
- is CMake-based
- works on all platforms
- builds the test binaries
- includes the C++ api
- does not include the python functionality
- produces libtorch.so

2) the setup.py build
- compiles the python functionality
- calls into the CMake build to build libtorch.so
- produces _C.so, which has a dependency on libtorch.so

In terms of code changes, this mostly means extending the cmake build to support the full variety of environments and platforms. There are also a small number of changes related to the fact that there are now two shared objects - in particular, windows requires annotating some symbols with dllimport/dllexport, and doesn't allow exposing thread_local globals directly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8792

Reviewed By: ezyang

Differential Revision: D8764181

Pulled By: anderspapitto

fbshipit-source-id: abec43834f739049da25f4583a0794b38eb0a94f
2018-07-18 14:59:33 -07:00
3b886500a0 Add CUDAGuard to ATen (#9277)
Summary:
THCStream was recently moved to ATen by mruberry: https://github.com/pytorch/pytorch/pull/8997. This PR now introduces a guard class that replaces `AutoStream` from `torch/csrc/` and also uses this new stream interface.

I had to extend the `CUDAStream` interface with unchecked calls, so that we can reset the stream without throwing an exception in the guard's destructor.
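
The guard itself is the usual RAII save/restore pattern; a self-contained toy sketch (not ATen's actual class):

```cpp
using StreamId = int;

// Toy stand-in for the thread's current-stream state.
inline StreamId& currentStream() {
  static StreamId s = 0;
  return s;
}

struct CUDAStreamGuardSketch {
  explicit CUDAStreamGuardSketch(StreamId s) : prev_(currentStream()) {
    currentStream() = s;  // set on entry (the real checked API may throw here)
  }
  ~CUDAStreamGuardSketch() noexcept {
    currentStream() = prev_;  // "unchecked" reset: a destructor must not throw
  }
  StreamId prev_;
};
```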

colesbury apaszke ezyang

Fixes https://github.com/pytorch/pytorch/issues/7800
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9277

Differential Revision: D8865183

Pulled By: goldsborough

fbshipit-source-id: 67c9bc09629d92fa5660286b5eec08fde9108cd7
2018-07-18 14:40:31 -07:00
8769fec03f Move clamp into ATen (#9506)
Summary:
Glue component of https://github.com/pytorch/pytorch/pull/9319

Important to unblock wanchaol
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9506

Reviewed By: wanchaol

Differential Revision: D8879437

Pulled By: cpuhrsch

fbshipit-source-id: 16ea8a93f3f5df2695180b3a30a583834b7004f1
2018-07-18 13:40:11 -07:00
c506ff97c8 Disable py2-clang3.8-rocmnightly-ubuntu16.04-test in disabled-configs… (#9543)
Summary:
….txt setting

In the ROCm branches we will experiment with turning this on.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9543

Differential Revision: D8897990

Pulled By: ezyang

fbshipit-source-id: ae9d25d1b79ee421d49436593edf8c7e49b3a4e5
2018-07-18 12:58:56 -07:00
ca3b36aa6a Add implementation for batch_moments_op (#9510)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9510

Add implementation for batch_moments_op

Reviewed By: houseroad

Differential Revision: D8587654

fbshipit-source-id: d20f52cc8e900716c1057e68c147258dfda5245b
2018-07-18 11:59:54 -07:00
8c741b7c4f Add transformation from caffe2::resizeop to onnx::upsample
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9511

Reviewed By: hlu1

Differential Revision: D8876692

fbshipit-source-id: 9ba346e225cfbc686d370134fe41a28333b933cc
2018-07-18 11:59:52 -07:00
b6b6e1b39f Fix core.Plan.create_from_proto (#9438)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9438

The current implementation of create_from_proto doesn't work as expected: it duplicates networks and execution steps by copying the original PlanDef first and then adding each step one by one.

Reviewed By: pjh5

Differential Revision: D8850316

fbshipit-source-id: 9b02836d6e6ee1c91cfdd3b4c4804f14137dc22b
2018-07-18 10:55:55 -07:00
27455e9c78 Use _six for inf and nan (#9500)
Summary:
Things like `float('inf')` are actually quite expensive.
```py
In [1]: import math

In [2]: %timeit -n 200 math.inf
49.3 ns ± 1.42 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)

In [3]: %timeit -n 200 float('inf')
194 ns ± 39.1 ns per loop (mean ± std. dev. of 7 runs, 200 loops each)
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9500

Reviewed By: soumith

Differential Revision: D8876229

Pulled By: SsnL

fbshipit-source-id: 78602b76bb53d5588910b58270930c0bd413d2d7
2018-07-18 10:40:29 -07:00
35f7925aad fix small literals being flushed to 0 by std::to_string
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9478

Differential Revision: D8872083

Pulled By: soumith

fbshipit-source-id: 90083b6047f59466949ace249193094131a30cd5
2018-07-18 09:25:06 -07:00
d6e124e9a5 Dummy CircleCI config. (#9537)
Summary:
The purpose of this config is to make sure that CircleCI builds
don't fail when I turn them on for pytorch/pytorch.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9537

Differential Revision: D8894497

Pulled By: ezyang

fbshipit-source-id: 22f43c84a9b8a54cd47a6572ba068f70a73f043a
2018-07-18 09:25:05 -07:00
28954b9e68 Fix RoIAlignOp GPU implementation for RoIs without batch index (#9230)
Summary:
Fix RoIAlignOp GPU implementation for RoIs without batch index
According to https://caffe2.ai/docs/operators-catalogue.html#roialign, RoIs is "2D input of shape (R, 4 or 5)"
Pass the RoIs' 2nd dimension as a kernel parameter and adjust the kernel accordingly
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9230

Reviewed By: houseroad

Differential Revision: D8886798

Pulled By: malfet

fbshipit-source-id: 52a8b4df85f7e350e36c842ee4428f3a1cba2588
2018-07-18 08:39:50 -07:00
8fe2622090 Fix gatherTopK template (#9231)
Summary:
Fix gatherTopK template
This change makes it possible to instantiate gatherTopK() with an IndecesType other than caffe2::TIndex
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9231

Reviewed By: houseroad

Differential Revision: D8886778

Pulled By: malfet

fbshipit-source-id: d5fb1f8814710cd81bc0cf65e0f96fd9fd8317da
2018-07-18 08:25:23 -07:00
f277645968 Support N-dimensional empty tensors in CPU BLAS and (a selection of) … (#9522)
Summary:
…CPU LAPACK routines.

Note that the LAPACK functions in general require a different approach, because direct calls with size zero dims do not work.
Here I just selected a reasonable subset of LAPACK routines to support.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9522

Reviewed By: ezyang

Differential Revision: D8888180

Pulled By: gchanan

fbshipit-source-id: 16b9013937806d375d83d1c406815765fda00602
2018-07-18 08:25:21 -07:00
5eaed750c2 Implementing torch.isfinite (#9487)
Summary:
fixes #9132
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9487

Reviewed By: soumith

Differential Revision: D8875529

Pulled By: SsnL

fbshipit-source-id: d1b8aa825d202cfbdca27897da6a8bc1b714f856
2018-07-18 08:25:20 -07:00
57608214d4 Make squeeze doc consistent with it's behaviour (#9529)
Summary:
A 0-dimensional tensor is now returned when squeezing a tensor with a single element.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9529

Differential Revision: D8893103

Pulled By: soumith

fbshipit-source-id: 658189ecfff283b2b7281feb16a397692d6dbd8f
2018-07-18 08:25:18 -07:00
3eb3f03776 ROCm contributions week 28 (#9432)
Summary:
This PR contains the ROCm contributions of last week:
* documentation of the pyHIPIFY data format, originating from review comments by ezyang on #8812
* removal of most patch files from the `amd_build` directory and integration into the code base
* enabling of previously disabled_features that do compile now
* improvement to the static_cast feature in pyHIPIFY (it will only apply static_cast to kernel arguments, not launch arguments)
* addition of two workarounds to pyHIPIFY for ROCm/HIP shortcomings: a) `__forceinline__` does not imply `static`, hence change to `__inline__`, b) `std::[exp,log,pow]` math functions cannot be selected in device code, use `::[exp,log,pow]` instead. Both of these workarounds will be removed once the issues are fixed upstream. Neither of these issues have surfaced on the CI but were reproduced internally.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9432

Differential Revision: D8887441

Pulled By: ezyang

fbshipit-source-id: 71cf5c6b13772a66d10be369a45ebf06e4e268e1
2018-07-18 07:54:58 -07:00
73225e4a1d add docs for using python setup.py clean in developing mode (#9524)
Summary:
This command (suggested by albanD when I raised a related question in pytorch slack) is super useful to me. I have used it several times and it worked like a charm (without it, I had to delete the entire pytorch folder and clone things again). So I guess it is nice to have in the CONTRIBUTING doc.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9524

Differential Revision: D8890126

Pulled By: soumith

fbshipit-source-id: c1798ff1ab2423627fcd8e0662a66c4e85cb2413
2018-07-18 05:23:41 -07:00
89db578e66 Fixed a typo
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9523

Differential Revision: D8890124

Pulled By: soumith

fbshipit-source-id: dea8d153fc352c36b219298c52f2c97caf9999f4
2018-07-18 05:09:22 -07:00
6de038286a Add random data filler to predictor bench to support production nets (#9520)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9520

Add random data filler to predictor bench to support production nets

Reviewed By: salexspb

Differential Revision: D8712757

fbshipit-source-id: 2c732b2ba71ab210f9222adf94d08442ca71dc03
2018-07-18 00:46:02 -07:00
543d4af79f Be strict prototypes clean. (#9516)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9516

Differential Revision: D8886493

Pulled By: ezyang

fbshipit-source-id: fea974fd96c7d81126a129eb5b8b06eb1b028526
2018-07-17 20:25:53 -07:00
aa73348d75 added reminder of args naming rules to readme (#9504)
Summary:
- I ran into this a couple of days ago and thought it might be useful to take note of it
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9504

Reviewed By: soumith

Differential Revision: D8887396

Pulled By: weiyangfb

fbshipit-source-id: d2061cf379ce140d6e43ef6c18241f7ce00dbab6
2018-07-17 19:40:38 -07:00
004d924807 Give THTensor a constructor, use new/free. (#9496)
Summary:
Stacked on #9495
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9496

Differential Revision: D8875528

Pulled By: ezyang

fbshipit-source-id: 6419d2ffb07aaf49c1462e7b64737019abbb7f61
2018-07-17 19:25:37 -07:00
c33d2c0b04 Thread-safe dispatcher table (#9126)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9126

Closes https://github.com/pytorch/pytorch/pull/9126

Allow concurrent reads and writes in the dispatcher table
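
One common way to get concurrent reads with exclusive writes is a reader/writer lock; a generic sketch (the diff's actual mechanism is not described in this message):

```cpp
#include <shared_mutex>
#include <string>
#include <unordered_map>

class DispatchTableSketch {
 public:
  void registerKernel(const std::string& key, void* fn) {
    std::unique_lock<std::shared_mutex> lock(mutex_);  // exclusive: one writer
    table_[key] = fn;
  }
  void* lookup(const std::string& key) const {
    std::shared_lock<std::shared_mutex> lock(mutex_);  // shared: many readers
    auto it = table_.find(key);
    return it == table_.end() ? nullptr : it->second;
  }

 private:
  mutable std::shared_mutex mutex_;
  std::unordered_map<std::string, void*> table_;
};
```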

Reviewed By: smessmer

Differential Revision: D8722560

fbshipit-source-id: e376bcd59f1b9f6b0e6fd3dd376a55561ea3c9c3
2018-07-17 17:41:53 -07:00
13e0c9295d Add Support for count_include_pad in AveragePool in Caffe2 ONNX Backend (#9458)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9458

The goal is to support count_include_pad in the Caffe2 ONNX backend. This commit contains the first step: supporting 4-D tensor cases.
AveragePool with count_include_pad can be expressed as PadImage + AveragePool.

Reviewed By: houseroad

Differential Revision: D8852180

fbshipit-source-id: 4db00e9771be7a000a2d92850dfd066d9c9c38bf
2018-07-17 17:41:52 -07:00
1c3580b6fe Added hash for device (#9246)
Summary:
If this is good, I could write some tests to ensure collisions don't occur within a given range.
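
For illustration, a minimal sketch (not from this PR) of what a device hash enables: using `torch.device` objects as dictionary or set keys.

```
import torch

# Equal devices must hash equally; distinct devices should rarely collide.
d0 = torch.device('cuda', 0)
d1 = torch.device('cuda', 1)
per_device_buffers = {d0: [], d1: []}
assert hash(d0) == hash(torch.device('cuda', 0))
assert d0 != d1
```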

Closes #7228
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9246

Differential Revision: D8872608

Pulled By: ezyang

fbshipit-source-id: 0ed29a73188f4167b42756f59a5c9a3d5cb37326
2018-07-17 17:10:17 -07:00
5c695e3a60 Implement 2D and 3D alpha_dropout (#9073)
Summary:
It implements per-channel alpha_dropout. It also creates the corresponding function classes and unifies the handling of dropout and alpha_dropout.
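
For reference, a minimal sketch of the per-channel behavior, assuming the variant exposed today as `feature_alpha_dropout`:

```
import torch
import torch.nn.functional as F

# Per-channel alpha dropout: whole channels are dropped, and the output is
# rescaled/shifted to preserve SELU's self-normalizing statistics.
x = torch.randn(8, 16, 32, 32)  # (N, C, H, W)
y = F.feature_alpha_dropout(x, p=0.2, training=True)
assert y.shape == x.shape
```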
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9073

Differential Revision: D8727008

Pulled By: ezyang

fbshipit-source-id: 9d509f9c5db4e98f7b698cdfc4443505a4d2b331
2018-07-17 17:10:16 -07:00
6116954e97 oss heatmap_max_keypoint_op
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9470

Reviewed By: pjh5

Differential Revision: D8826713

fbshipit-source-id: 47674af86b3a5ae0752056faf3b93f0d96e38fc2
2018-07-17 16:55:47 -07:00
0fe980c748 Memory usage measurement -- Caffe2 (#9017)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9017

Closes https://github.com/pytorch/pytorch/pull/9017

Added "get_blob_size_bytes" to "pybind_state.cc" in Caffe2 to expose the size of a blob in bytes.

Reviewed By: kuttas

Differential Revision: D8685696

fbshipit-source-id: 9a9d38f207c8c59ef534217181e8ce1514617628
2018-07-17 16:40:23 -07:00
9b0c53ac22 Deduplicate THTensor and THCTensor. (#9495)
Summary:
This is enabled by the allocator patch; previously we could not
deduplicate THStorage_free/THCStorage_free; now we can.

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9495

Reviewed By: SsnL

Differential Revision: D8875497

Pulled By: ezyang

fbshipit-source-id: 387198dff446eb9f84d2d6187066fae1d595dea7
2018-07-17 15:41:15 -07:00
2249751422 Add OptimizerBase::add_parameters (#9472)
Summary:
ebetica asked for a way to add parameters to `Optimizer`s after they are created.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9472

Differential Revision: D8872176

Pulled By: goldsborough

fbshipit-source-id: 39a4032c519a6d3b458dd3596361b04afea10365
2018-07-17 14:10:22 -07:00
890037eaaf Fix (non-reduction) ops over a dimension for n-dimensional empty tens… (#9482)
Summary:
This fixes (non-reduction) ops over a dimension for n-dimensional empty tensors (CPU).

This includes (mainly) CPU fixes; CUDA fixes are a little more involved because you can't use an empty grid.
This also includes a fix for index_copy, which checked that self.size(dim) == src.size(0), which isn't correct (the same dimension should be compared).
Finally, also includes a fix for CUDA flip (although it's not tested yet), to get the stride using multiplication rather than division to avoid divide-by-0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9482

Reviewed By: ezyang

Differential Revision: D8873047

Pulled By: gchanan

fbshipit-source-id: 86523afd3d50277834f654cd559dfbc7875cdffe
2018-07-17 13:11:04 -07:00
8be4657871 Add ideep copy for TensorCPU<long> in IDEEPFallbackOp (#9480)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9480

Ops like Reshape sometimes take a second input tensor of longs with the new
shape (it can also be specified as an arg). If this input tensor is passed in via
an external input (which ONNX does sometimes), LoadOp fails with an exception.

Such ops are executed by IDEEPFallbackOp anyway, so this should be fine.

Reviewed By: yinghai

Differential Revision: D8872671

fbshipit-source-id: 659a02416c374e373ce041a7d65a174be828702d
2018-07-17 11:55:23 -07:00
30f849cdc5 Correct model name in caffe2 onnx backend tests
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9485

Reviewed By: houseroad

Differential Revision: D8873733

Pulled By: bddppq

fbshipit-source-id: 3a3cc351834cbbedce360760504ea16f5fa0ea06
2018-07-17 11:41:01 -07:00
d2d43824cd Delete flag from THTensor. (#9494)
Summary:
It was only used to toggle refcounting, but we ALWAYS
refcount tensors.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9494

Differential Revision: D8875169

Pulled By: ezyang

fbshipit-source-id: 3a8618fb288334e62942bbaf388f3c9e473e7524
2018-07-17 11:25:41 -07:00
e5678794ed Reenable multiprocessing preserve sharing tests on ASAN. (#9498)
Summary:
This issue was fixed in 976f9253a5425918eda7cf865b097cf42b5da8d7

Fixes #5311.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9498

Differential Revision: D8875605

Pulled By: ezyang

fbshipit-source-id: 449ffe975d35c959f92874437ba9be37d4d3a1f2
2018-07-17 11:10:21 -07:00
050a2588b5 change stft to have consistent signature with librosa (#9497)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9497

Fixes #7883 by using `rfft`.

It's worth noting that this is BC-breaking. It's also impossible to detect the change because the signatures before and after this change support a common subset of calling patterns, e.g., `stft(Tensor, int, int)` (some other calling patterns will raise errors).

soumith and I plan to change the current `stft` interface because it is a bit messy and non-standard. rafaelvalle suggested that `librosa` is a good reference API to align with. After discussing with soumith and ezyang, and given that `stft` has only been out for one release, I decided to go with directly changing the signature. Also, my understanding is that most researchers in this field will welcome this change, as `librosa` seems to be the gold standard here. (It doesn't yet support all `pad_mode` values, but those will become available if added to `F.pad`.)
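
As a rough sketch of the librosa-style signature this lands (argument names per the post-change `torch.stft`; later releases additionally require `return_complex`):

```
import torch

signal = torch.randn(2, 16000)  # (batch, samples)
spec = torch.stft(signal, n_fft=400, hop_length=160,
                  window=torch.hann_window(400),
                  return_complex=True)
print(spec.shape)  # (batch, n_fft // 2 + 1, n_frames)
```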
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9308

Reviewed By: ezyang

Differential Revision: D8806148

Pulled By: SsnL

fbshipit-source-id: f6e8777d0c34d4a4d7024e638dc9c63242e8bb58
2018-07-17 10:55:43 -07:00
7d2a17876f test_cuda: ensure tests use float and adjust HalfTensor tolerances (#9475)
Summary:
test_cuda.py uses routine 'number' to prepare many testscases.
number should return a floating point value for float-type tensor
types, or integer otherwise. But number's test to classify the type
is incorrect, so it always returns the integer value.
(type(t).__name__ is always 'torch.tensortype' so never matches
'Double', 'Float', or 'Half'.)

Update number to use the existing is_floating() helper to make the
check.

The change to number causes a few tests to fail for HalfTensor. Relax
the tolerance for those in line with other HalfTensor testcases. The
failing tests--for addcdiv and fill--were not previously relaxed for
HalfTensor so are held to the over-strict 1e-5 default tolerance.

Finally, update a couple other tests for HalfTensor type to use the
existing is_half() helper.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9475

Reviewed By: yf225

Differential Revision: D8872112

Pulled By: ezyang

fbshipit-source-id: 016e3e15adb23f6606bd4c08218954c1396699db
2018-07-17 10:25:17 -07:00
52cc073212 Implement reshape_as (#9452)
Summary:
1. Added tests
2. Added docstring
3. Removed the redundant view_as definition from tensor.py
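
A quick usage sketch of the new method:

```
import torch

a = torch.arange(6)   # shape (6,)
b = torch.empty(2, 3)
c = a.reshape_as(b)   # same as a.reshape(b.shape)
assert c.shape == b.shape
```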

Closes #9416
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9452

Differential Revision: D8851794

Pulled By: ezyang

fbshipit-source-id: 0aa0430dd0a174e1a5caddbc50a7e2c9eb7802bc
2018-07-17 08:54:42 -07:00
11fc16dc98 Remove HTML tags from README.md (#9296)
Summary:
This change makes README.md compatible with both the GitHub and VSTS markdown engines. Images can be reduced in size if necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9296

Differential Revision: D8874931

Pulled By: soumith

fbshipit-source-id: 0c530c1e00b06fc891301644c92c33007060bf27
2018-07-17 07:24:43 -07:00
4ff636a3fd Update onnx to onnx/onnx@b2817a6 (#9476)
Summary:
b2817a682f
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9476

Reviewed By: houseroad

Differential Revision: D8868253

Pulled By: bddppq

fbshipit-source-id: b1f14bab47f020f0bc0239da7e2bbf959a407d6a
2018-07-16 22:17:09 -07:00
ae44a6b5e3 Fix Sequential::clone() (#9372)
Summary:
I noticed that `Sequential::clone()` does not work. This is because `Sequential` does not use `reset()` which is normally where modules have to initialize and register its submodules. Further, this is because of the way `Sequential` allows its modules to be passed in the constructor, which doesn't work with `reset()` (since it does "late" initialization).

I've added some better error messages inside `Cloneable::clone()` which makes this kind of mistake clearer for other users, and tests for `Sequential::clone()`.

I also had to give `AnyModule` a deep `clone()` method.

ebetica ezyang
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9372

Differential Revision: D8865189

Pulled By: goldsborough

fbshipit-source-id: b81586e0d3157cd3c4265b19ac8dd87c5d8dcf94
2018-07-16 21:53:42 -07:00
e8b8c3895e Enable Conv fusion optimizations in optimizeForIdeep (#9255)
Summary:
Enable fusion for IDEEP in optimizeForIdeep,
including Conv+ReLU, Conv+Sum, Conv+Sum+ReLU, and Conv+BN.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9255

Reviewed By: bddppq

Differential Revision: D8809030

Pulled By: yinghai

fbshipit-source-id: af30bad3b96cb965bd26a4dfa810370faec4bb88
2018-07-16 21:28:50 -07:00
9235ff53f1 Clip horizontal bounding boxes during rotated detection for backward compatibility (#9403)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9403

In the BBoxTransform and GenerateProposal ops, clip_boxes makes sure the bbox fits
within the image. For rotated boxes, this doesn't always make sense, as there
can be multiple ways to clip a rotated box within an image boundary.
Moreover, clipping to a horizontal box means we potentially leave out pixels
of interest. Therefore, we clip only boxes with angle almost equal to 0
(within a specified `angle_thresh` tolerance).

Reviewed By: pjh5

Differential Revision: D8828588

fbshipit-source-id: 39c1eafdb5d39d383780faa0a47e76149145e50c
2018-07-16 20:24:49 -07:00
ad74006ffa Pass THDRequest as void* pointer to THDRequest_free (#9398)
Summary:
This fixes https://github.com/pytorch/pytorch/issues/9054.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9398

Reviewed By: ezyang

Differential Revision: D8827778

Pulled By: yf225

fbshipit-source-id: 862287802cb69c6ac71ff4df19cadb89b1face1d
2018-07-16 19:25:22 -07:00
c4bff25282 Additional operator information values (#9153)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9153

Closes https://github.com/pytorch/pytorch/pull/9153

Modified the values reported by the benchmarking platform to include tensor_shape and op_args. These values have a different naming scheme from values like flops and latency.

Reviewed By: sf-wind

Differential Revision: D8729791

fbshipit-source-id: f050200be01c6d0794bf5faaa6e8cef12a00affe
2018-07-16 17:40:44 -07:00
7df48d0444 Merge .cu and _gpu.cc files
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9473

Reviewed By: houseroad

Differential Revision: D8865754

Pulled By: bddppq

fbshipit-source-id: 406eda6c145f03a0ee35c4643ec1ec0092fbce88
2018-07-16 17:10:18 -07:00
45140368c3 Update onnx-tensorrt module to the latest (#9469)
Summary:
Update onnx-tensorrt to follow up on recent changes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9469

Reviewed By: Maratyszcza

Differential Revision: D8866704

Pulled By: yinghai

fbshipit-source-id: 3b96ec2fa28470f0d4b5a7c62ab332eeba4bdb12
2018-07-16 17:10:16 -07:00
5ff686651f move batchop import to init to avoid debugging confusions (#9425)
Summary:
fixes #9409
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9425

Reviewed By: ezyang

Differential Revision: D8842844

Pulled By: wanchaol

fbshipit-source-id: 3c6b26470d59d8d1fc5f79caa70252b9de7290e4
2018-07-16 15:40:28 -07:00
80160f6186 Skip PyTorch ROCm tests in the script. (#9467)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9467

Reviewed By: houseroad

Differential Revision: D8860794

Pulled By: ezyang

fbshipit-source-id: 9b11475d9bb4b3361973865d7f68e562bffbf9d8
2018-07-16 15:40:26 -07:00
976f9253a5 Eliminate storage views. (#9466)
Summary:
Storage views were previously used to implement CUDA IPC sharing,
but they weren't necessary.  The new strategy is described in
Note [CUDA IPC and the caching allocator].

This also fixes an unrelated bug, where we weren't actually using
the Tensor forking pickler, because we didn't register a pickler
for torch.Tensor.

Fixes #9447.  Fixes #46.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

CC apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9466

Reviewed By: apaszke

Differential Revision: D8859698

Pulled By: ezyang

fbshipit-source-id: 3362cb92f6ae4aa37084c57d79b31004bd0b4a97
2018-07-16 15:40:24 -07:00
9ed2190bdb Add a tagged union type that replaces tensor in the interpreter. (#9368)
Summary:
IValue is short for interpreter value. It is used frequently so a short name is important.
This will allow us to implement more non-tensor types in an efficient way and remove
many hacks from the compiler.

This PR is limited. It only introduces IValue and changes interpreter to use it.
Follow up PRs will:
* Change the way aten_ops consume non-tensor types so that integer lists,
  are no longer represented as Tensors.
* Introduce TensorList as a fundamental type and remove all vararg handling in gen_jit_dispatch
* Change the compiler to implement math on primitive numbers rather than converting to tensors.

jamesr66a  apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9368

Reviewed By: ezyang

Differential Revision: D8817598

Pulled By: zdevito

fbshipit-source-id: 29dce80611ce5f6384234de9d12a67861d2b112f
2018-07-16 15:40:22 -07:00
9ae77cc1f5 Implement tensor weak references (#9363)
Summary:
Add `WeakTensor` - a `Tensor` counterpart which doesn't keep the data (or any other expensive resources) alive. A `WeakTensor` can be `.lock()`ed and returns an `at::optional<Tensor>` containing the tensor if it's still alive.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9363

Reviewed By: ezyang

Differential Revision: D8815434

Pulled By: apaszke

fbshipit-source-id: 1b3e96503c1285d78ef124c585e65c7630f3253e
2018-07-16 13:10:29 -07:00
9413fabb0b Nuke TestCollectEnv (#9459)
Summary:
The tests were too flaky, and the procedure for legitimately
updating versions of software too onerous, to warrant continually
testing these.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9459

Reviewed By: zou3519

Differential Revision: D8852357

Pulled By: ezyang

fbshipit-source-id: 24e99cd00b4252cdeec2a1d9af92456b4a54912a
2018-07-16 13:10:28 -07:00
b0c5c86492 Add test case for segmentation fault fix in grad_fn (#9457)
Reviewed By: apaszke

Differential Revision: D8863572

Pulled By: ezyang

fbshipit-source-id: 13749f51320a4e403644674b0335aed4987fa887
2018-07-16 13:10:26 -07:00
66fe3b5c06 Add peephole optimization for type_as operators. (#9316)
Summary:
If the type_as operator takes in two values with the same type, remove that operator.
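
A sketch of the pattern the pass targets:

```
import torch

# x.type_as(y) is a no-op when x and y already share a type, so the
# peephole pass can drop the aten::type_as node from the graph.
def f(x, y):
    return x.type_as(y) + y

traced = torch.jit.trace(f, (torch.randn(3), torch.randn(3)))
print(traced.graph)
```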
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9316

Reviewed By: zdevito

Differential Revision: D8808355

fbshipit-source-id: 2d5710a6380b22f4568fc38a439061b5340c4eb1
2018-07-16 10:26:56 -07:00
52abcdd0dc Fix out-of-range error for test_neg (#9431)
Summary:
`test_neg` sometimes fails internally because `random_()` can generate an out-of-range value for CharTensor. This PR fixes it.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9431

Reviewed By: SsnL

Differential Revision: D8843284

Pulled By: yf225

fbshipit-source-id: bf516cceb8f780e133fa54f7364c77821eb7c013
2018-07-16 10:26:54 -07:00
e7f49d1444 add depthwise conv support for mkldnn (#8782)
Summary:
Change-Id: I3836dacc63afc1b5e31b1d706bba6bb13699ba41

beneficial for depthwise convolution on CPU, e.g. in MobileNet and similar networks.
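
For context, a minimal sketch of the depthwise pattern this accelerates:

```
import torch
import torch.nn as nn

# Depthwise convolution is a grouped conv with groups == in_channels (one
# filter per channel), usually followed by a 1x1 pointwise conv (MobileNet).
x = torch.randn(1, 32, 56, 56)
depthwise = nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32)
pointwise = nn.Conv2d(32, 64, kernel_size=1)
y = pointwise(depthwise(x))
print(y.shape)  # torch.Size([1, 64, 56, 56])
```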
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8782

Reviewed By: SsnL

Differential Revision: D8790869

Pulled By: ezyang

fbshipit-source-id: 29f410763ce403c2438fc527aa354ff02e1829bf
2018-07-15 17:40:55 -07:00
8766daeec9 Refactor _log_sum_exp (#9173)
Summary:
This PR removes `distributions.utils._log_sum_exp` in favor of `torch.logsumexp`. Also fixes some warnings with the `reduce` arg in `binary_cross_entropy_with_logits`.
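
A small sketch of why `torch.logsumexp` is preferred over the naive formula:

```
import torch

# logsumexp factors out the max before exponentiating, so it does not
# overflow where exp() would.
x = torch.tensor([1000.0, 1000.0])
print(torch.logsumexp(x, dim=0))      # tensor(1000.6931)
print(torch.log(torch.exp(x).sum()))  # tensor(inf): exp(1000) overflows
```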
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9173

Reviewed By: SsnL

Differential Revision: D8764174

Pulled By: ezyang

fbshipit-source-id: b9c4136dbf0182e8ae77082e6448d23a430d5cb6
2018-07-15 17:40:53 -07:00
97008a64a1 Add ModuleDict and ParameterDict containers (#8463)
Summary:
Addresses:

https://github.com/pytorch/pytorch/issues/4048 and https://github.com/pytorch/pytorch/pull/5297#issuecomment-394924139
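
A minimal usage sketch of the new containers:

```
import torch
import torch.nn as nn

# Submodules/parameters added via the dict containers are registered
# properly (they appear in .parameters() and in the state_dict).
class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.acts = nn.ModuleDict({'relu': nn.ReLU(), 'tanh': nn.Tanh()})
        self.scales = nn.ParameterDict({'a': nn.Parameter(torch.ones(1))})

    def forward(self, x, act='relu'):
        return self.acts[act](x) * self.scales['a']

net = Net()
print(sum(p.numel() for p in net.parameters()))  # 1
```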
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8463

Reviewed By: SsnL

Differential Revision: D8689291

Pulled By: ezyang

fbshipit-source-id: 47e67d9bae1b64ec10771a2c00c56229463b1598
2018-07-15 17:40:52 -07:00
cffca2926b Introduce SupervisedPtr, delete THAllocator and THCDeviceAllocator (#9358)
Summary:
See Note [Supervisor deleter] for how SupervisedPtr works.
This design is not the obvious one, but there were a lot of
constraints feeding into it:

- It must support the reallocation usage-pattern, where, given
  an existing Storage, we allocate a new region of memory,
  copy the existing data to it, and then deallocate the old
  region of memory.

- Creation of a deleter for memory MUST avoid dynamic allocations
  in the common case.  We've done some benchmarking in Caffe2
  where dynamic allocation for deleters is ruinously expensive,
  and it's really hard to avoid these performance tarpits in
  very general function wrappers like std::function or
  folly::Function (while benchmarking this, we discovered that
  folly::Function's move constructor was way more expensive
  than it should be).

- We need to be able to deallocate data that comes from external
  sources, e.g., dlpack and numpy tensors.  Most notably,
  you often cannot deallocate these with merely the void*
  data pointer; you need some extra, out-of-band information
  (e.g., the managing struct) to deallocate it.  Sometimes,
  you may even want to resize data living in an external source!

- The "core" allocators need to support being wrapped in a Thrust
  allocator, so you need to be implement the following two functions:

    char* allocate(size_t);
    void deallocate(char*, size_t);

- We need to support tensors which contain non-POD, non-trivially
  copyable data; specifically tensors of std::string.  This is
  an upcoming requirement from Caffe2.  It's dirty AF, but
  it's really useful.

- It should use C++ standard library types like std::unique_ptr
  (which is hugely problematic because std::unique_ptr doesn't
  call the deleter when the pointer is null.)

Here is the billing of changes:

- Built-in support for realloc() has been DROPPED ENTIRELY.
  Instead, you're expected to allocate and then copy from
  the old memory to the new memory if you want to do a
  reallocation.  This is what you'd generally have expected
  to occur; and axing realloc() from the design lets us avoid
  some tricky correctness issues with std::realloc(), namely
  the fact that we must refuse the realloc if the type of the
  elements are not trivially copyeable.  If it really matters,
  we can add this back, but there really needs to be a good
  explanation WHY you need fast resizing reallocations (by in
  large, people don't resize their storages, and it should
  be acceptable to have a performance degradation when they
  do).

- TH_STORAGE_FREEMEM is no more; instead, if you want a
  storage which doesn't free its result, you just give it
  an empty deleter.

- What we used to call an "allocator" (really, a combined
  object for allocating/deleting) has been split into two
  concepts, an allocator, and a smart pointer (SupervisedPtr)
  which knows how to delete data.

    - Unlike previously, where THAllocator/THCDeviceAllocator
      could have a per-tensor context storing extra information
      (e.g., a pointer to the metadata you need to actually
      free the tensor), there is no context in the allocator or
      the deleter of the smart pointer; instead, the smart
      pointer directly holds an owning reference to the
      metadata necessary to free the data.  This metadata
      is *freshly manufactured* upon every allocation, which
      permits us to resize tensors even in the absence of
      built-in support for realloc().

    - By default, allocators don't support "raw" allocations
      and deallocations with raw pointers.  This is because
      some allocations may return a different context every
      time, in which case you need to reconstruct the context
      at delete time (because all you got was a void*, not
      a unique_ptr that carries the deleter).

- The diff between at::Allocator and THCDeviceAllocator is a
  bit larger:

    - It used to return a cudaError_t.  Now, allocators
      are expected to check the error status immediately and throw
      an exception if there was an error.  It turns out that this
      is what was immediately done after all occurrences of
      allocate/release, so it wasn't a big deal (although some
      subsidiary interfaces had to themselves be converted to
      not return cudaError_t).

      There is one notable exception to this, and it is how
      we handle CUDA OOM: if this occurs, we attempt to return
      unused memory to the system and try again.  This is now
      handled by a catch-all try-catch block.  The cost of
      catching the exception is probably the least of your worries
      if you're about to OOM.

    - It used to take the CUDA stream to perform the allocation
      on as an argument.  However, it turned out that all call
      sites, this stream was the stream for the current device.
      So we can push this into the allocator (and the choice,
      in the future, could be made explicitly by twiddling
      thread local state.)

    - It held two extra methods, emptyCache and cacheInfo, specifically
      for interacting with some state in THCCachingAllocator.
      But this "generality" was a lie, since THCCachingAllocator
      was the only allocator that actually implemented these
      methods, and there is actually a bunch of code in THC
      which assumes that it is the caching allocator that is
      the underlying allocator for CUDA allocations.  So I
      folded these two methods into this interface as
      THCCachingAllocator_emptyCache and THCCachingAllocator_cacheInfo.

    - It held its context directly inside the THCDeviceAllocator
      struct.  This context has been moved out into whatever
      is holding the at::Allocator*.

- The APIs for getting at allocators/deleters is now a little different.

    - Previously there were a bunch of static variables you could get
      the address of (e.g., &THDefaultAllocator); now there is a
      function getTHDefaultAllocator().

    - Some "allocators" didn't actually know how to allocate (e.g.,
      the IPC "allocator").  These have been deleted; instead, you
      can wrap the produced pointers into SupervisedPtr using
      an appropriate makeSupervisedPtr() static method.

- Storage sharing was a lot of work to wrangle, but I think I've
  tamed the beast.

    - THMapAllocator and its "subclasses" have been refactored to
      be proper, honest to goodness C++ classes.  I used the enum
      argument trick to get "named" constructors.  We use inheritance
      to add refcounting and management (in libshm).  What we previously
      called the "Context" class (Context has been dropped from the name)
      is now the supervisor for the data.

    - Sometimes, we need to pull out the file descriptor from a
      tensor.  Previously, it was pulled out of the allocator context.
      Now, we pull it out of the supervisor of the SupervisorPtr,
      using the static method fromSupervisedPtr(), which uses the
      deleter as the typeid, and refines the type if it matches.

- I renamed the std::function deleter into
  InefficientStdFunctionSupervisor, to emphasize the fact that it does
  a dynamic allocation to save the std::function deleter.

TODO:

- Windows libshm is in shambles and needs to be fixed.

Perhaps for the future:

- newFromFd is now unconditionally calling cudaPointerGetAttributes
  even though this is unnecessary, because we know what the device
  is from higher up in the callstack.  We can fix this by making
  newWithDataAndAllocator also take an explicit device argument.

- Consider statically distinguishing between allocators that
  support raw_allocate/raw_deallocate, and those which don't.
  The Thrust constraint applies only to the CUDA device allocator;
  you never need to allocate CPU memory this way

- Really want to get rid of storage views. Ugh.

Nontrivial bugs I noticed when preparing this patch:

- I forgot to placement-new unique pointers and attempted to
  assign them directly on uninitialized memory; very bad!  Sam
  Gross has encouraged me to replace this with a proper constructor
  but I keep putting it off, because once everything goes in
  StorageImpl there really will be a proper constructor.

- I rewrote a number of APIs to use newWithDataAndAllocator
  instead of newWithAllocator, calling the allocator at the
  call site (because they required "allocation context" which
  we no longer give to "allocators").  When I did this, I forgot
  to insert the multiplication with sizeof(real) to scale from
  numels to number of bytes.

- The implementation of swap on storages was missing it for
  scalarType and backend.  It was benign (because the only case
  we call swap is when these are the same), but I fixed it anyway.

- I accidentally returned a nullptr unique_ptr with no deleter,
  even though there was a legitimate one.  This matters, because
  some code still shoves its hands in the deleter context to
  get extra metadata about the function.

- I used std::move() on a unique_ptr, and then did a boolean
  test on the pointer afterwards (always false!)

Pull Request resolved: https://github.com/pytorch/pytorch/pull/9358

Reviewed By: SsnL

Differential Revision: D8811822

Pulled By: ezyang

fbshipit-source-id: 4befe2d12c3e7fd62bad819ff52b054a9bf47c75
2018-07-15 15:11:18 -07:00
5eb9d40cc6 Introducing IsInf (#9169)
Summary:
torch.isinf checks elementwise for +/- inf; implements #9132.
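
Usage sketch:

```
import torch

# Returns an elementwise mask marking +inf and -inf (NaN is not flagged).
x = torch.tensor([1.0, float('inf'), -float('inf'), float('nan')])
print(torch.isinf(x))  # [False, True, True, False]
```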
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9169

Reviewed By: SsnL

Differential Revision: D8768614

Pulled By: zou3519

fbshipit-source-id: dd1b5f6c976deb421d626e22cdd25500ec04d796
2018-07-15 07:55:09 -07:00
fda03406cf add device to CUDAEvent (#9415)
Summary:
This PR adds a device_ member to CUDAEvent. This is necessary because if we create a cudaEvent on one device but destroy it from another, an additional context is created on that device. So this device information is needed to guard the cudaEventDestroy. (cc: ngimel is this expected behavior? I can provide a simple .cu script to repro this.)

c10d tests are probably not in CI yet; please let me know how the tests are run and I can double-check.

Thanks pietern apaszke for help debugging!
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9415

Reviewed By: apaszke

Differential Revision: D8839688

Pulled By: ailzhang

fbshipit-source-id: b950ba37d57b9e3c5fe71726ec92f6a9601c4d0e
2018-07-14 13:38:41 -07:00
a4f63576b6 Make localScalar error message more intuitive (#9443)
Summary:
Fixes: #9419

This assumes that anyone who knows localScalar can also grep for the
error message or get a traceback.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9443

Reviewed By: soumith

Differential Revision: D8850718

Pulled By: ezyang

fbshipit-source-id: a106fee718fef97064e861810a49ca05f536f27e
2018-07-14 12:24:56 -07:00
8444e1660b Only accept contiguous tensors in TopK for cuda (#9441)
Summary:
Fixes: #9421

I don't think it is easy to deal with non-contiguous arrays in CUDA topk, so I'm adding a check.
The argument number is a bit confusing when it shows up in PyTorch, but it is consistent with the other checks. (Not sure whether it would make sense to eliminate argument numbers from the TH/THC error messages given that they're probably off more than once...)

Do we need a test that it indeed refuses non-contiguous?
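
For reference, a sketch of the user-facing workaround once the check is in place: make the tensor contiguous before calling topk on CUDA.

```
import torch

if torch.cuda.is_available():
    x = torch.randn(64, 128, device='cuda').t()  # transposed: non-contiguous
    values, indices = x.contiguous().topk(5, dim=1)
    print(values.shape)  # torch.Size([128, 5])
```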
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9441

Reviewed By: soumith

Differential Revision: D8850719

Pulled By: ezyang

fbshipit-source-id: d50561bb37ed50ab97aeaf54d8e3fc6c765bdc7c
2018-07-14 12:24:52 -07:00
88146484b4 Add support for .norm() pytorch onnx export and ReduceL1/ReduceL2 caffe2 operators (#9299)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9299

ONNX has ReduceL1 and ReduceL2 operators that facilitate this, so allow PyTorch to export those and allow Caffe2 to run them.

I only implemented this on CPU so far.
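
A sketch of an export that exercises this path (a 1-norm reduction, which this change maps onto ONNX's ReduceL1):

```
import io
import torch

class L1Norm(torch.nn.Module):
    def forward(self, x):
        return x.norm(p=1, dim=1)

buf = io.BytesIO()
torch.onnx.export(L1Norm(), torch.randn(2, 3), buf)
```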

Reviewed By: pjh5

Differential Revision: D8757381

fbshipit-source-id: 68afc9e2f90042a70929b73ace05a499b5c670c7
2018-07-14 10:54:13 -07:00
7160846c81 Only view() rhs of index_put if we need to (#9424)
Summary:
During tracing (and export) we are now introducing an unnecessary hard-coded view on the RHS of indexed assignments such as `tensor[idxs] = rhs`. This caused a regression in the PyTorch translate models because these expressions appear with variable sizes in the RHS. This change makes it so we only call view if we indeed need to strip leading 1-dimensions.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9424

Reviewed By: colesbury

Differential Revision: D8838881

Pulled By: jamesr66a

fbshipit-source-id: 399e5daa7d021f4f59f6f92b9fae581f92bfc538
2018-07-14 00:10:21 -07:00
5ac8a80f8b Add BatchBucketizeOp in caffe2 (#9385)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9385

The operator transforms dense features into sparse features by bucketizing. Only the features in the indices tensor will be transformed and output.

Reviewed By: bddppq

Differential Revision: D8820351

fbshipit-source-id: a66cae546b870c6b2982ac20641f198334f2e853
2018-07-13 20:39:30 -07:00
099a6d5e08 Implementation of Wngrad optimizer caffe2 python wrapper and unit test on least square regression (#9001)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9001

Closes https://github.com/pytorch/pytorch/pull/9001

We added a Caffe2 Python wrapper and unit test for the Wngrad C++ operator.

Reviewed By: chocjy

Differential Revision: D8655724

fbshipit-source-id: fb259afd6fd50231691bd75c52852b20a1e1aec8
2018-07-13 18:54:52 -07:00
9e2f2cab94 Implementation and operator test for Wngrad optimizer (#8999)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8999

Closes https://github.com/pytorch/pytorch/pull/8999

Implemented the Wngrad optimizer operator for the dense case (the base case as well as the case with additional outputs for effective learning rate and update value) and the sparse case.

Reviewed By: pjh5

Differential Revision: D8627933

fbshipit-source-id: a63cde46c04bcc6b428ab5f77a4b3b2beb66c046
2018-07-13 18:11:41 -07:00
86eeeab758 Fix segmentation fault in grad_fn (#9292)
Summary: Fixes #8774 .

Reviewed By: soumith

Differential Revision: D8836478

Pulled By: apaszke

fbshipit-source-id: f113bf47fe493be9f095a5a5490caf08dbb44e38
2018-07-13 14:46:13 -07:00
bcd20f96e0 update docs (#9423)
Summary:
minor modification: fixed the incorrect comment format for ```split_size_or_sections``` (https://pytorch.org/docs/master/torch.html#torch.split)
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9423

Differential Revision: D8841367

Pulled By: soumith

fbshipit-source-id: 2d09a38ce8d278ac29b3864e8d09a91cd296196c
2018-07-13 13:55:35 -07:00
fd25a2a86c Remove virtual+override anti-pattern (#9335)
Summary:
I'm working through clang-tidy warnings. This PR addresses the `hicpp-use-override` check, which warns that `virtual` + `override` is redundant, since `override` already signifies that a function is overriding and thus virtual.

Where there was `virtual` + `override` I removed the `virtual`, where there was `virtual` and no `override` I removed `virtual` and added `override`.

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9335

Differential Revision: D8807082

Pulled By: goldsborough

fbshipit-source-id: e0a261053f6540a22cc56ec160a24aa285af6319
2018-07-13 11:25:01 -07:00
c6376cf999 A reasonable way to detect Python include dirs and library
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9361

Reviewed By: ml7

Differential Revision: D8837706

Pulled By: pjh5

fbshipit-source-id: 6979f9f37709c23e72b9169531787a60f3b37254
2018-07-13 11:25:00 -07:00
cc9dcdff16 Improving THCReduce.cuh's performance on latency-bound non-contiguous reductions (#9214)
Summary:
This PR improves performance of (formerly) latency-bound non-contig-dim reduction kernels by up to 20X, while maintaining determinism.

Currently, reducing across a non-contiguous dimension uses the parallelism exposed across the number of output elements.  This means that performance suffers if the number of output elements is small.  Example:
```
a = torch.cuda.FloatTensor(32768, 32)
a.sum(dim=0)
```
Before this PR, `a.sum`'s kernel (kernelReduceNoncontigDim_shared) took 138 microseconds on my machine.  The speed-of-light estimate (based on a bandwidth of 700 GB/s) should be around 6 microseconds.  After this PR's changes, `a.sum(dim=0)`'s kernel takes 6.9 microseconds on my machine.

Christian implemented some nice logic to squeeze out better performance for cases like `a.sum` using intra-block and instruction-level parallelism across the dimension being reduced, but his kernel still only launched one block for every 32 output elements.  This was insufficient to saturate the device in many cases, like `a.sum` here (where only one block is launched).

My PR adds block cooperation across the dimension being reduced.  Many blocks, instead of one block, help to reduce into each 32 output elements.  Internally, each block leverages all of Christian's nice logic to compute a partial reduction into a per-block staging buffer, then the last block to finish combines the results to compute the final output.

Block cooperation does require THCudaMalloc-ing staging and semaphore buffers, so it's not always worthwhile.  I included a set of rough heuristics to decide when the kernel should choose to use block cooperation.  These heuristics are based on Python-side timings of calling sum() many times in a loop, and comparing to the old implementation.

 I tested a wide range of sizes (to determine heuristics) and as long as the number of output elements is greater than 16ish, I don't think there are any remaining pathological sizes where users will encounter unexpectedly poor performance.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9214

Reviewed By: gchanan

Differential Revision: D8808127

Pulled By: colesbury

fbshipit-source-id: 139f310fc6ea6d187a7c983128f8eb8e1c9b4be3
2018-07-13 11:10:51 -07:00
06e47d88b5 Remove ScalarConvert and cast_wrapper in favor of static_cast (#9401)
Summary:
While talking to mruberry, I noticed a few places that use
special cast wrappers that are no longer necessary.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9401

Differential Revision: D8828874

Pulled By: colesbury

fbshipit-source-id: 2b7fe7ac3af3b71be26b43a9ad3949f8065a7bc9
2018-07-13 10:25:05 -07:00
57a05983be Move non-dimension reduction var/std to native wrappers. (#9400)
Summary:
This is to unify the handling of empty tensors in std/var between the dimension reduce and all reduce cases.
Also to avoid triggering ubsan errors around divide by 0.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9400

Reviewed By: ezyang

Differential Revision: D8828879

Pulled By: gchanan

fbshipit-source-id: 6b9306805c94251eec28bd12e234618338bff4e3
2018-07-13 08:25:41 -07:00
f09828ee0e Support n-dimensional empty tensors in TensorShape methods. (#9362)
Summary:
This includes either bug fixes or NumPy semantics changes for the following methods:
chunk, diagonal, unfold, repeat, flatten, reshape, split, unsqueeze.

The n-dimensional empty tensor feature is still hidden behind a feature flag.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9362

Reviewed By: ezyang

Differential Revision: D8817002

Pulled By: gchanan

fbshipit-source-id: 6ff704ec96375f00b4dd39ebcd976efac0607fb4
2018-07-13 08:25:40 -07:00
3799b10c44 various documentation formatting (#9359)
Summary:
This is a grab-bag of documentation formatting fixes.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9359

Differential Revision: D8831400

Pulled By: soumith

fbshipit-source-id: 8dac02303168b2ea365e23938ee528d8e8c9f9b7
2018-07-13 02:48:25 -07:00
bb9ff58c6d Add cudnn activation ops (#9379)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9379

Add cudnn activation ops

Reviewed By: houseroad

Differential Revision: D8818013

fbshipit-source-id: d3881c634a46578b9331da07f9fdf7e1f31d7e8a
2018-07-12 23:18:56 -07:00
b15a7d05ce Inference benchmark: NUMA-awareness + multi-model support
Summary:
A purely experimental addition to guide us on delivering this
into real production systems and their threadpools. The biggest limitation
right now is that we need to turn off the BlackBoxPredictor activation
deallocation logic to get to sane performance.

Reviewed By: highker

Differential Revision: D8798029

fbshipit-source-id: ec7962689d605fba62b2c9e0904309df567a25a4
2018-07-12 20:09:19 -07:00
cd3e067e46 Add reversed(torch.Tensor) (#9216)
Summary:
Closes https://github.com/pytorch/pytorch/issues/3376
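
A quick sketch of the behavior (it flips along the first dimension, equivalent to `t.flip(0)`):

```
import torch

t = torch.tensor([1, 2, 3])
print(reversed(t))  # tensor([3, 2, 1])
m = torch.arange(6).view(3, 2)
print(torch.equal(reversed(m), m.flip(0)))  # True
```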
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9216

Differential Revision: D8753933

Pulled By: soumith

fbshipit-source-id: 5dac9b8b11ff34a205b6478db99b02fda8bd9cce
2018-07-12 19:42:07 -07:00
04fce5eca6 Remove dummy c10 folder (#9367)
Summary:
This was previously meant to be used for c10 code, but that plan has since changed.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9367

Reviewed By: orionr

Differential Revision: D8814361

Pulled By: smessmer

fbshipit-source-id: 8e35fa74e160343a2bb8432013847677aa73695a
2018-07-12 19:14:55 -07:00
117a5c3cc0 fix the annotation
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9380

Differential Revision: D8821294

Pulled By: zou3519

fbshipit-source-id: b375cd0de9042bcaef1d22de104966fb704bd43e
2018-07-12 18:53:59 -07:00
4a796e4430 Initialization functions (#9295)
Summary:
To allow our C++  customers to use our initialization methods as well, this PR moves some of the code from `torch.nn.init` to ATen, calls it from Python, and adds equivalent code to the C++ frontend.

Notes:
1. Happy to hear thoughts on whether it's ok to have e.g. `torch.nn.init.dirac_` *and* `torch.dirac_` (the former has a `no_grad` guard). We have this for `ones_` and stuff too, so I don't mind it.
2. I left the exception checking in Python because it throws `ValueError`s while ATen errors show up as `RuntimeError`s. I imagine this would break users' error handling if someone had a `try`-`except` handler for `ValueError` (or maybe that's far-fetched).

EDIT: After discussions with zdevito, the PR now simply duplicates the code in C++ exclusively for the C++ API, and we leave the Python code as-is (to make it easier for people to read/modify).
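
For reference, a sketch of the Python-side initializers that the C++ API now mirrors (each ends in `_` because it mutates its argument in place under `no_grad`):

```
import torch
import torch.nn as nn

w = torch.empty(3, 3, 3)    # dirac_ expects a 3-, 4-, or 5-dim tensor
nn.init.dirac_(w)
lin = nn.Linear(4, 4)
nn.init.xavier_uniform_(lin.weight)
```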

ebetica ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9295

Differential Revision: D8813793

Pulled By: goldsborough

fbshipit-source-id: 4b969f3f75952c1be4e837e19e23b8098e5fbd4b
2018-07-12 18:53:57 -07:00
e90860780b Migrate PriorCorrectionCalibration to Dper3
Summary:
Migrated PriorCorrectionCalibration from Dper2 layer to Dper3 module.

A few notes:
1. Calibration operators need dynamic linking;
2. All calibration implementation and tests are located in /modules/calibration/
3. Added a type inference function in operator_schema.h/operator_schema.cc

Reviewed By: idning

Differential Revision: D8756832

fbshipit-source-id: 7e6300a3bb3d3feaaf3b82340ece2f35d71493fc
2018-07-12 18:40:07 -07:00
2ead3b0e54 Update include paths to use c10d prefix everywhere
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9397

Reviewed By: goldsborough

Differential Revision: D8825909

Pulled By: pietern

fbshipit-source-id: 25af272819e04eacbb6bd69e3f1c03c78f091d13
2018-07-12 17:55:22 -07:00
34554d6adb Enable standalone build of ATen (#9377)
Summary:
This PR changes the ATen `CMakeLists.txt` slightly, to enable standalone build of ATen inside PyTorch. Currently, the tests in ATen gets linked to `libcaffe.so libcaffe2.so`. As a result, ATen can't be built standalone without building from the root pytorch directory. I know that there is a big merge happening between caffe2 and pytorch and hence, the purpose of this PR is to really start a conversation on what would be the proper way of migrating the CMakeLists to enable clean builds. We should also follow up on this PR: https://github.com/pytorch/pytorch/pull/7275. For your reference, that PR has the explanation for why `-Wl --no-as-need` is needed. Moreover, without `set(ATen_CUDA_SRCS ${all_cuda_cpp})`, the standalone build will throw unresolved references.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9377

Reviewed By: smessmer

Differential Revision: D8825921

Pulled By: orionr

fbshipit-source-id: c521159b4885639fc7990a9819202051455d07db
2018-07-12 14:25:00 -07:00
43103af7a7 Use at::DeviceGuard everywhere (#9396)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9396

The custom and local CUDADevice RAII wrapper has been superseded by at::DeviceGuard so it doesn't make sense to keep it around.

Reviewed By: ailzhang

Differential Revision: D8824200

fbshipit-source-id: 39fa00ffab4f495606c8001446e976bbf603e866
2018-07-12 13:43:47 -07:00
99dbcd0451 set CMAKE_HIP_ARCHIVE_APPEND (#9394)
Summary:
petrex

To make `-DBUILD_SHARED_LIBS=OFF` work
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9394

Reviewed By: mingzhe09088

Differential Revision: D8822947

Pulled By: bddppq

fbshipit-source-id: 4fb213c723138804fb0fdb3b381e32623cf14468
2018-07-12 12:24:49 -07:00
feaee21968 Plotting embeddings norm being slow in distributed training. (#9325)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9325

As per the title. Fixed by calculating the norm on the same device.

Reviewed By: chocjy

Differential Revision: D8668136

fbshipit-source-id: 6671a1858da4b0a6f766f067b7fa648a072cd219
2018-07-12 11:51:23 -07:00
374fee4804 Minor cleanup to scripts
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9354

Reviewed By: orionr

Differential Revision: D8810415

Pulled By: pjh5

fbshipit-source-id: 792b0dc6f6a4fabde38e2ad4475963526204914c
2018-07-12 10:54:44 -07:00
d017e1798f add erfc
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9366
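
For reference, a quick sketch of why a dedicated erfc matters numerically:

```
import torch

# erfc(x) = 1 - erf(x), computed directly; the naive subtraction loses all
# precision once erf(x) rounds to 1.0 in float32.
x = torch.tensor([0.0, 1.0, 5.0])
print(torch.erfc(x))     # ~[1.0000e+00, 1.5730e-01, 1.5375e-12]
print(1 - torch.erf(x))  # [1.0000, 0.1573, 0.0000]: underflows at x = 5
```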

Differential Revision: D8816768

Pulled By: soumith

fbshipit-source-id: 7d709f932cf156a2e7ec71c710837beb7f647d66
2018-07-12 08:32:02 -07:00
b154761547 Guard nullptrs around memcpy.
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9370

Reviewed By: ezyang

Differential Revision: D8816996

Pulled By: gchanan

fbshipit-source-id: 8cad41a5259774d86e94807eb4a7f43f66fdf47f
2018-07-12 08:32:00 -07:00
483ae8cb5d Replaces const ref with && for apply (#9175)
Summary:
Addresses https://github.com/pytorch/pytorch/issues/5011
Tested with python test/test_autograd.py
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9175

Reviewed By: zdevito

Differential Revision: D8736377

Pulled By: marymcbreen

fbshipit-source-id: ff86f427f7b2cf0cab5912e7f32812bd0f49a712
2018-07-12 08:31:59 -07:00
e1863778e3 Guard gloo algorithm creation with DeviceGuard (#9371)
Summary:
Let us avoid creating a context on GPU0 unnecessarily.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9371

Reviewed By: pietern

Differential Revision: D8817343

Pulled By: apaszke

fbshipit-source-id: a6cc91a1dd127840486a42c64f97f117475b0d5f
2018-07-11 23:08:31 -07:00
aeccec755d In Gloo backend use ring reduction by default (#9309)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9309

This is faster when you're dealing with a small number of processes.
Around the 16-process mark, the halving/doubling algorithm is faster.

Reviewed By: apaszke

Differential Revision: D8785364

fbshipit-source-id: 4a03326266e473026d943787186e149d0cc489f0
2018-07-11 21:40:01 -07:00
00b4b4703e fix unsqueeze doc (#9374)
Summary:
fixes #9348
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9374

Differential Revision: D8817215

Pulled By: SsnL

fbshipit-source-id: 047661ae4556bb19e4cd125b01a3fd75ed6642f3
2018-07-11 21:25:44 -07:00
7f38ea4555 Remove unused feature: num PS tuning
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9293

Reviewed By: huitseeker

Differential Revision: D8778499

fbshipit-source-id: 0cf59e02cb37b3fe22885c1b5e10b5d2e7585382
2018-07-11 18:54:45 -07:00
a487b08c2e AutoBatching - IR transformation(basic operators) (#9198)
Summary:
Use the `torch.jit.batch` decorator to implement auto-batching (it calls the `to_batch` pass to do the IR transformation).
- `to_batch` pass: "to_batch.h/cpp" in csrc/jit/passes transforms a graph into a new batched graph.
- Write several basic operators for BatchTensor (add, mul, sigmoid, tanh, mm, matmul, select).
- Register the operators in a lookup table `<std::string, std::shared_ptr<Graph>>` (the Graph is used to replace the original node in the IR graph).

Move BatchTensor in Python from torch.BatchTensor to torch.jit.BatchTensor.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9198

Reviewed By: zdevito

Differential Revision: D8744466

Pulled By: ChunliF

fbshipit-source-id: 9ea56a30f55cb870f13a2069a47cc635419763ff
2018-07-11 18:25:07 -07:00
e30ff68410 Add Hardtanh Export (#8804)
Summary:
Added hardtanh CPU/GPU implementations and backend tests to Caffe2
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8804

Reviewed By: bddppq

Differential Revision: D8813987

Pulled By: houseroad

fbshipit-source-id: 2480296eab3373425b9e1734a10c009b4f5d3e26
2018-07-11 18:09:51 -07:00
1a8e826ed4 Skip the count_include_pad in average pool for now (#9365)
Summary:
Will create a bootcamp task.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9365

Reviewed By: bddppq

Differential Revision: D8813889

Pulled By: houseroad

fbshipit-source-id: bce1eaafd0efb3c27c0f71fcc40a8313e2b1c7b8
2018-07-11 18:09:50 -07:00
153e2e96d4 Make Sequential ref-counted (#9151)
Summary:
In the C++ API, `Sequential` was not itself refcounted, but stored `shared_ptr<AnyModule>` to get reference semantics. This is unfortunate because most modules in the API are accessed via `->`, e.g. `Linear l(1, 2); l->forward(...);`. `Sequential` was different in that it had value semantics itself, and thus was accessed via `.`.

This PR makes `Sequential` store `AnyModule` (without extra indirection), and uses the same pImpl mechanism we use for all other modules to make `Sequential` have reference semantics itself. This makes it consistent with the rest of the library. It also removes one level of indirection inside of `Sequential`, which is cool.

One thing I had to change was that the `ModuleHolder` with which the whole pImpl thing is implemented previously did some tricks to make `Linear(3, 4)` actually construct `Linear(LinearOptions(3, 4))`. This doesn't work well with `Sequential` since it takes a variadic parameter pack. Instead, I made `ModuleHolder` forward all arguments to the underlying module, and then further pushed the trick to forward parameters to modules' options types into the actual Modules. This adds one constructor per Module in the library. This is not something user modules have to do (unless they want this nice forwarding themselves). It makes the code simpler overall.

ezyang ebetica apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9151

Reviewed By: ezyang

Differential Revision: D8809298

Pulled By: goldsborough

fbshipit-source-id: da68452c3de912fbc67af330ba93b5220de6909f
2018-07-11 17:24:59 -07:00
94bc4c6091 Ensure pending tasks are finished in case of failure (#9290)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9290

Ensure pending tasks (e.g. network ops) are finished when net fails

Reviewed By: heslami

Differential Revision: D8777230

fbshipit-source-id: e57fcf1df6aa0ed8847923391502b666edb43674
2018-07-11 15:39:46 -07:00
8253947256 Make error message more informative (#9352)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9352

I am debugging a failed workflow (f61490672) and found the original error message to be uninformative.

Differential Revision: D8808181

fbshipit-source-id: 3f524ca092881186a492c5c0456124ce31d54751
2018-07-11 15:09:46 -07:00
7f33ec55b2 Fix Eigen issue on OS X with CUDA and nvcc compile (#9350)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9350

Re-apply #9270

Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. The fix is to isolate Eigen from headers included by .cu files and processed by nvcc. This was worked on with smessmer.

Reviewed By: mingzhe09088

Differential Revision: D8794431

fbshipit-source-id: de656334af46c697802073f8e8d9a6aeb9ca65a7
2018-07-11 14:00:05 -07:00
cbcf45274b Move tanh function to math (#9328)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9328

Move tanh function to math

Reviewed By: houseroad

Differential Revision: D8794745

fbshipit-source-id: ea525dedde6f53592b06c2caffd6426688dea5fc
2018-07-11 13:59:50 -07:00
7d8b532c1f Fix CUDA build failures (#9347)
Summary:
Breaking this out of #8338

This fixes some CUDA related build and runtime issues after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9347

Reviewed By: orionr

Differential Revision: D8806954

Pulled By: mingzhe09088

fbshipit-source-id: 9f8e3feee06478d1ac2deb30796939453352d388
2018-07-11 13:39:59 -07:00
80380f637c Fix to make ONNXIFI flow work (#9340)
Summary:
Small step to have Relu test work.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9340

Reviewed By: bddppq

Differential Revision: D8807018

Pulled By: yinghai

fbshipit-source-id: 429f3185e12afb12aaecfea8dd9595fdf838d356
2018-07-11 13:09:41 -07:00
18a975210d Add explicit to conversions (#9336)
Summary:
Another code-mod for clang-tidy: Conversion operators should be marked explicit so that they don't cause unwanted implicit conversions. This is especially important for `operator bool()`, see https://stackoverflow.com/questions/39995573/when-can-i-use-explicit-operator-bool-without-a-cast

ezyang apaszke
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9336

Reviewed By: apaszke

Differential Revision: D8807065

Pulled By: goldsborough

fbshipit-source-id: 0e9f4ebd0048a2a510c0d05fa410695d7e977eb1
2018-07-11 12:10:30 -07:00
c2dd90c40e Add angle normalization for rotated boxes (#9056)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9056

Closes https://github.com/pytorch/pytorch/pull/9056

Updates bbox_transform for rotated boxes with angle info to normalize the
predicted angle to be within the [angle_bound_lo, angle_bound_hi] range.
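
A hypothetical sketch (function name invented) of the normalization described, assuming the bounds span a full period such as [-90, 90):

```
def normalize_angle(angle, lo=-90.0, hi=90.0):
    # Wrap a predicted angle (degrees) into [lo, hi).
    period = hi - lo
    return (angle - lo) % period + lo

print(normalize_angle(135.0))   # -45.0
print(normalize_angle(-100.0))  # 80.0
```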

Reviewed By: pjh5

Differential Revision: D8706240

fbshipit-source-id: f3ee834cf362736136e285f0f8f0c063af94a879
2018-07-11 11:25:54 -07:00
9126f95ac3 GenerateProposals and BoxWithNMSLimit ops: Add support for rotated boxes (#8953)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8953

Closes https://github.com/pytorch/pytorch/pull/8953

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8655687

fbshipit-source-id: 4985739e585c07dd406b9386dc7f46ad93576798
2018-07-11 11:25:52 -07:00
491f317b24 NMS util for rotated boxes (#8954)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8954

Closes https://github.com/pytorch/pytorch/pull/8954

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8618673

fbshipit-source-id: 4c54297e3b3bf614de4d7c0146176a419518790a
2018-07-11 11:25:49 -07:00
8da936ab52 Fix the build break for python3.7 PyUnicode_AsUTF8AndSize() prototype changing (#9259)
Summary:
https://docs.python.org/3.7/c-api/unicode.html#c.PyUnicode_AsUTF8AndSize
The return type changes from "char*" to "const char*".
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9259

Reviewed By: orionr

Differential Revision: D8776219

Pulled By: pjh5

fbshipit-source-id: e5eadf71264002ba57cfb68dd39686a7ec074092
2018-07-11 10:39:43 -07:00
b9f575fc33 Remove legacy code from the JIT (#9323)
Summary:
In particular, get rid of backward tracing and CppOp.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9323

Reviewed By: ezyang

Differential Revision: D8795935

Pulled By: apaszke

fbshipit-source-id: fb7a7eeee41902da35f2a8efd77262ca60fd6bbe
2018-07-11 10:25:38 -07:00
05559b4071 Accumulate MSELoss reduce=True into accreal instead of real (#9287)
Summary:
THNN was accumulating the result of reduction loss functions
into real instead of accreal. This was causing precision issues with
MSELoss.

This patch only fixes MSELoss. Some of the other losses exhibit bad precision as well (because they accumulate into real instead of accreal) and require more investigation. I will open an issue for those (#9286)
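
An illustration (not the THNN code itself) of why the accumulator width matters; a narrow running sum stalls, while a wider one stays accurate:

```
import torch

acc16 = torch.tensor(0.0, dtype=torch.float16)
acc32 = torch.tensor(0.0, dtype=torch.float32)
for _ in range(3000):
    acc16 += 1.0  # stalls at 2048: the float16 ulp there is 2
    acc32 += 1.0
print(acc16.item(), acc32.item())  # 2048.0 vs 3000.0
```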

Fixes #8710

cc li-roy SsnL
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9287

Reviewed By: SsnL

Differential Revision: D8775708

Pulled By: zou3519

fbshipit-source-id: d1a1f159deee0cb90fd8e81e63b246115eea8e9e
2018-07-11 10:25:36 -07:00
748a90d05b BBoxTransform op: Add support for rotated boxes (#8952)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8952

Closes https://github.com/pytorch/pytorch/pull/8952

Based on RRPN paper: https://arxiv.org/abs/1703.01086

Reviewed By: pjh5

Differential Revision: D8598547

fbshipit-source-id: 3699379df9bf45ed5bdd395175a0e26a77e079f7
2018-07-11 10:25:34 -07:00
01cffaa7e8 fix extra output in generate_code.py (#9339)
Summary:
operator.cpp is not generated. Removing the line prevents generate_code.py from always thinking it is out of date and re-running.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9339

Reviewed By: ezyang

Differential Revision: D8798689

Pulled By: zdevito

fbshipit-source-id: f25a2e215fec29aa51571e6a31771f0f91e7a213
2018-07-11 10:25:31 -07:00
b2a74d17ad document torch.utils.dlpack (#9343)
Summary:
dlpacks deserve documentation. :)

I wonder whether it might make sense to merge the various small torch.utils pages (and include a link for the larger ones, e.g. data) to enhance the structure in the docs.
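
A minimal usage sketch for the docs being added:

```
import torch
from torch.utils.dlpack import to_dlpack, from_dlpack

# Round-trip through a DLPack capsule: the memory is shared, not copied,
# which is the point of the interchange format. A capsule is consumed once.
t = torch.arange(4.0)
u = from_dlpack(to_dlpack(t))
u[0] = 42.0
print(t)  # tensor([42., 1., 2., 3.]) -- same storage
```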
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9343

Differential Revision: D8801227

Pulled By: soumith

fbshipit-source-id: 2980d271971743b86f052bec5a2cb4d146a90d9b
2018-07-11 07:46:09 -07:00
04a7fc1dc4 Add Upsample support in C2 onnx backend for opset 1
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9327

Reviewed By: ailzhang

Differential Revision: D8798462

Pulled By: houseroad

fbshipit-source-id: d7d1127a853de6a7bb8fdef146f283487e1e5569
2018-07-10 22:43:25 -07:00
fb9f9c9ba2 Implement Sinh and Cosh (#9213)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9213

Closes https://github.com/pytorch/pytorch/pull/9213

Added hyperbolic trig functions Sinh and Cosh

Reviewed By: BIT-silence

Differential Revision: D8752566

fbshipit-source-id: 5a58336a5153ec804404b9ac7b10b5662ede3cb7
2018-07-10 18:55:31 -07:00
00aeb0b84b Privatize values for vec256 (#9321)
Summary:
Helps prevent calling functions of the base case on float/double/int subclasses that aren't supported.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9321

Reviewed By: colesbury

Differential Revision: D8793627

Pulled By: cpuhrsch

fbshipit-source-id: 7fde779ecd4b890dda406f3d1306b58bab40efe2
2018-07-10 18:11:16 -07:00
b4c66459c5 Add pyHIPIFY scripts needed for ROCm transpilation to PyTorch (#8812)
Summary:
As discussed on the call, this will allow us to keep this integral part of the effort to run PyTorch on ROCm in sync with the main code.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8812

Reviewed By: ezyang

Differential Revision: D8796245

Pulled By: bddppq

fbshipit-source-id: 8e12c2acf6a7e0740f31b21e50be74e10ed8b12c
2018-07-10 18:02:43 -07:00
a47a30b9ce Implement grid_sampler in aten (#8929)
Summary:
Partially addresses #8928.

Maybe #7273?
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8929

Reviewed By: ezyang

Differential Revision: D8668919

Pulled By: li-roy

fbshipit-source-id: 8ad07b224d2ab211c274c4c10f042501efaae32c
2018-07-10 15:10:24 -07:00
ea1869244f Change depthwise convolution bandwidth formula (#9317)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9317

Change depthwise convolution bandwidth formula

Reviewed By: hlu1

Differential Revision: D8786684

fbshipit-source-id: ba76fea94a6d2fda8d87f40dd626b3dfd90770ed
2018-07-10 14:24:10 -07:00
0a679105ff Fix missing accept file changes
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/9313

Reviewed By: ezyang

Differential Revision: D8789043

Pulled By: zdevito

fbshipit-source-id: 283607116c49a4f3a82658d9b4d45f5df3ae283b
2018-07-10 13:39:24 -07:00
e9e47ce8f1 Vectorize sigmoid (#8612)
Summary:
This PR ports the vectorization of sigmoid to also enable better performance for non-contiguous arrays. Detailed timings will follow shortly.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8612

Reviewed By: ezyang

Differential Revision: D8712298

Pulled By: cpuhrsch

fbshipit-source-id: 01a3d06af8d04513edd024ab1d01a6b753fc6f6a
2018-07-10 12:40:39 -07:00
efefd1d7cf Unify aten_dispatch and aten_schema into a single operator abstraction with human-readable schema. (#8885)
Summary:
This is a series of two commits that should probably be read separately. They are stacked on top of #9018 since the second commit requires it for correctness.

Commit 1
=======

This commit is the first in a series that will clean up how we handle declaring operators and intrinsics in the JIT to make it more modular and readable. This introduces readable declarations that can be used to register operators and switches gen_jit_dispatch to generate this schema. A follow up PR will remove the dispatch keys like "add-3" and resolve ops directly based on the registered schema, further simplifying the generation process.

* Switches schema over to parsed declarations, in the future this will allow something like:

```
  registry.register_intrinsic("foo(Tensor a, Tensor b) -> Tensor", [](Stack& stack) {
    ...
  })
```

This will allow the scalable registration of intrinsics for lists, tuples, and other ops, along with metadata for these ops (e.g. derivatives and size propagation routines).

The declarations resemble those used by PythonArgParser but have been significantly cleaned up to minimize the number of types that can appear in the declaration. We should strive to get the other parts of PyTorch switched over to this restricted declaration set when possible, but it is too much to do in a single PR. My hope is that eventually we will use a very similar language to describe declarations in C10, and this can serve as a guide for that.

Parsing is done using the script lexer, so it is very robust to whitespace and extensible for future types.

This removes the other way we encoded schema, and makes it easier to see what schema are registered.

Current generated declarations: https://gist.github.com/zdevito/a96a17766fb3a098d69a91ee00abaaf6

* Switches how we handle attempting to use an integer in the place of a fixed-sized int list, such as in conv (e.g. 'int[3] stride=1'). Now that we can statically distinguish between int and Tensor, we handle the expansion as an implicit conversion in the compiler. This allows us to simplify the interpreter since it no longer needs to handle the conversion itself.

* Schema declarations have been changed so that they match the type system in the IR exactly. In particular, attribute_info which was used by liftConstantAttributes has been dropped and constant attributes are lifted purely based on the type of the input. Type conversions in compiler have been simplified due to this change.

* Error highlighting in ErrorReport now only reports at most 20 lines of code, to make reading where an error occurred easier.

Commit 2
=======

This commit unifies aten_dispatch and aten_schema into a single Operator object that both contains schema and implementation information. In the future we can use this object to also contain functionality like shape prop and autodiff needed by all operators. Operators are registered globally, and dispatch logic uses the schema information to figure out which variant to use. Descriptor keys, a frequent source of inscrutable debug errors, have been removed.

* Introduce Operator, to replace TensorOp. Unlike TensorOp, we use Operator for all op implementations, including primitives that may occur in the graphs. The only exceptions are ops that are only known to the interpreter like jumps, and GraphExecutors where we need to record additional debug info.

* Adds a global registry for Operator implementations. aten_dispatch.cpp turns into register_aten_ops.cpp, which registers all the Operators for aten with the operator registry. register_prim_ops.cpp now contains the implementations for primitive operators that used to be in the interpreter. This means that it is now safe to use `getOperation(node)` to lookup the true interpreter function for the node, which will simplify const-propagation passes.

* Remove addInterpreterOpHandler in favor of global operator registry.

* Instead of descriptors, we match Node arguments directly against FunctionSchema describing expected inputs in `matchSchema`. `matchSchema` knows how parse both attributes and positional inputs from a node and match it to the appropriate registered operator. Debug error messages when we try to run an invalid operator are significantly improved: they now automatically display the schema for the op with the same name that are registered.

* Merge aten_schema into register_aten_ops. Each Operator takes a string schema which is parsed to determine when to dispatch to that op.

* Cleans up gen_jit_dispatch.py now that we do not need to write out descriptors.  In particular, skip_scalar_overloads can be removed since Richard's code sorts declarations to put Tensor, Tensor declarations first.

* remove matchSchemaAndLiftConstantAttributes and use emitBuiltinCall instead to remove code duplication

* refactor stack manipulation functions into a separate header file.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8885

Reviewed By: jamesr66a

Differential Revision: D8751048

Pulled By: zdevito

fbshipit-source-id: 312aabfbf88307c5f6ab947b6caf691468b94557
2018-07-10 10:24:48 -07:00
d867757649 Fix CUDA 8 build for Windows (#9300)
Summary:
Replacement of #9023.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9300

Differential Revision: D8781492

Pulled By: soumith

fbshipit-source-id: 6c0994da46d3112c24769f92366836c397891d93
2018-07-10 10:24:46 -07:00
8e6e8098ce Revert D8768025: [pytorch][PR] Fix Eigen issue on OS X with CUDA and nvcc compile
Differential Revision:
D8768025

Original commit changeset: 5b34017aeb67

fbshipit-source-id: 6ec892ff483bb9d966eb7138eadc77443972c8f8
2018-07-10 10:24:43 -07:00
bbeae24145 Fix Eigen issue on OS X with CUDA and nvcc compile (#9270)
Summary:
Breaking this out of #8338

This takes care of the Eigen failure we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Fix is to isolate Eigen from headers included by cu files and processed by nvcc. This was worked on with smessmer.

cc mingzhe09088 smessmer BIT-silence Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9270

Reviewed By: mingzhe09088

Differential Revision: D8768025

Pulled By: orionr

fbshipit-source-id: 5b34017aeb67e35a1b5938d962181ccd4cd37591
2018-07-10 09:25:42 -07:00
3254bcaed8 Call deleter when destroying unconsumed DLPack PyCapsules (#9297)
Summary:
Usually DLPack consumer is expected to call DLManagedTensor's
deleter to signal that it doesn't need the contents.
This patch calls the deleter when freeing unconsumed
DLPack capsules created by PyTorch.

Test script:
```
import torch
import torch.utils.dlpack
import gc
for i in range(10000):
    a = torch.randn(1000,1000, dtype=torch.float32, device='cuda')
    b = torch.utils.dlpack.to_dlpack(a)
    gc.collect()
```
Before patch: consume all GPU ram.
After patch: constant GPU ram consumption.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9297

Differential Revision: D8781571

Pulled By: soumith

fbshipit-source-id: 2ebadec6c857646220d632ca64110af430dbd52f
2018-07-10 07:56:59 -07:00
89c2b50a15 Grad clip for parameters on different devices (#9302)
Summary:
I'm trying to write a multi-gpu network by pipelining some layers onto different GPUs. However, the current gradient clip requires all the parameters to be on the same device.

The overhead of CUDA launch is reduced since the scalar calculation is performed on CPU, but it introduces extra data transfers.

No performance regression is observed by running the following snippet:
```python
import time

import torch

module = torch.nn.Sequential(
    torch.nn.LSTM(1024, 1024),
    torch.nn.LSTM(256, 256),
    torch.nn.Linear(100, 10000),
).cuda()

torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
start = time.time()
for _ in range(1000):
    torch.nn.utils.clip_grad_norm_(module.parameters(), 1)
torch.cuda.synchronize()
time_elapse = time.time() - start
print('{} ms per clip'.format(time_elapse))
```
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9302

Differential Revision: D8781551

Pulled By: soumith

fbshipit-source-id: 9d76d01fe0531927f770a16b9523872a7e08e927
2018-07-10 07:56:55 -07:00
1597fc594d 3d conv should use int64_t (#9274)
Summary:
Fixes #9264 .

There can be so many elements in the output of `vol2col` that it overflows the `int` range! This PR changes 3d conv to use `int64_t` mostly.

Also fixes some unused var warning (cc goldsborough )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9274

Differential Revision: D8770682

Pulled By: SsnL

fbshipit-source-id: f6e37f1aa56fe1009dd4c9bcbc042244e47252db
2018-07-10 06:39:45 -07:00
d0d1820814 Add weak pointer and finalizer support directly to THStorage. (#9148)
Summary:
The underlying use-case is the file descriptor to storage cache in
torch.multiprocessing.reductions.  Previously, this was implemented by wrapping
an existing allocator with a "weak ref" allocator which also knew to null out
the weak reference when the storage died.  This is terribly oblique, and
prevents us from refactoring the allocators to get rid of per-storage allocator
state.

So instead of going through this fiasco, we instead directly implement weak
pointers and finalizers in THStorage.  Weak pointers to THStorage retain the
THStorage struct, but not the data_ptr.  When all strong references die,
data_ptr dies and the finalizers get invoked.

There is one major hazard in this patch, which is what happens if you
repeatedly call _weak_ref on a storage.  For cleanliness, we no longer
shove our grubby fingers into the finalizer struct to see if there is already
a Python object for the weak reference and return it; we just create a new one
(no one is checking these Python objects for identity).  This means if you
keep calling it, we'll keep piling on finalizers.  That's bad! But I am
not going to fix it until it is actually a problem for someone, because
then we need to add another caching layer.
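
The THStorage change itself is C-level, but the mechanism maps naturally onto Python's weakref machinery. A conceptual sketch of the fd-to-storage cache pattern described above (all names hypothetical; this is not the actual torch.multiprocessing.reductions code):

```python
import weakref

class Storage:
    """Toy stand-in for THStorage (the real one is a C struct)."""
    def __init__(self, n):
        self.data = bytearray(n)

fd_cache = {}  # fd -> weak reference to the storage sharing that fd

def register(fd, storage):
    fd_cache[fd] = weakref.ref(storage)
    # The finalizer fires when the last strong reference dies, evicting the
    # cache entry. Note: registering twice piles up finalizers -- the same
    # hazard the paragraph above describes for repeated _weak_ref calls.
    weakref.finalize(storage, fd_cache.pop, fd, None)

s = Storage(1024)
register(42, s)
assert fd_cache[42]() is s
del s                       # last strong ref gone -> finalizer runs
assert 42 not in fd_cache
```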

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9148

Differential Revision: D8729106

Pulled By: ezyang

fbshipit-source-id: 69710ca3b7c7e05069090e1b263f8b6b9f1cf72f
2018-07-10 06:25:33 -07:00
e06abab264 Fix Upsample ONNX Symbolic (#9288)
Summary:
Adjust the change to changes in ATen
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9288

Reviewed By: ailzhang

Differential Revision: D8779078

Pulled By: houseroad

fbshipit-source-id: 7f387eeb35ae1f5a1494afc6287853a87a6173b4
2018-07-09 23:25:26 -07:00
181d2a5e60 Add support of is_compatible for old version of onnx (#9284)
Summary:
Fix the problem if caffe2 works with old version of onnx
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9284

Reviewed By: yinghai

Differential Revision: D8773894

Pulled By: houseroad

fbshipit-source-id: 99b5a962099f854edc85a2ea815cb88c82a6e175
2018-07-09 21:09:14 -07:00
7ace3a99ec Fix TensorRT tests (#9285)
Summary:
ONNX-TensorRT is still using old opset (<7). Patch it for now.

A future fix would be to expose versioning in the onnx exporter.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9285

Reviewed By: houseroad

Differential Revision: D8775268

Pulled By: yinghai

fbshipit-source-id: c272073f80cce35ebd971e44ec9472e3c8fd4b9e
2018-07-09 20:40:19 -07:00
4498fb962b Add space around operator (#9294)
Summary:
Fixes lint failure on master
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9294

Differential Revision: D8779010

Pulled By: goldsborough

fbshipit-source-id: da1ea2604189fd704c22fa8a5770bd92845cea91
2018-07-09 20:24:21 -07:00
f92edf7ef4 N-dimensional empty tensors: indexing, factories, reductions. (#9209)
Summary:
This PR implements and tests N-dimensional empty tensors for indexing, factories, and reductions if compiled with -DUSE_TH_SIZE_ZERO_DIM.

Still remaining to add:
1) TensorShape functions
2) Simple linear algebra functions (matrix multiply variants)
3) Other functions that operate over a dimension (but don't reduce).
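
To make the already-implemented behavior concrete, here is roughly what indexing and reductions on n-dimensional empty tensors look like on a build with the flag enabled (this is the default in modern PyTorch; example mine, not from the PR):

```python
import torch

x = torch.empty(0, 3)                  # 2-D empty tensor, shape [0, 3]
print(x.shape)                         # torch.Size([0, 3])
print(x.sum(dim=0))                    # reducing the empty dim -> tensor([0., 0., 0.])
idx = torch.tensor([], dtype=torch.long)
print(x[idx].shape)                    # empty index -> torch.Size([0, 3])
```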
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9209

Reviewed By: ezyang

Differential Revision: D8751257

Pulled By: gchanan

fbshipit-source-id: 2113374dc7af6caf31a99bf67b3893f130a29e23
2018-07-09 19:40:01 -07:00
19ecb5f8ad Fix docs for Windows CUDA 8 builds (#9254)
Summary:
Fixes #9200.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9254

Differential Revision: D8778011

Pulled By: soumith

fbshipit-source-id: 0a2c2863ac1bc515397fc446039db64d1d4e236d
2018-07-09 18:55:03 -07:00
99ab082366 Making setup.py install work for Caffe2 (#8509)
Summary:
Tested on my mac on a pretty clean anaconda3
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8509

Reviewed By: orionr

Differential Revision: D8702257

Pulled By: pjh5

fbshipit-source-id: eda03ef9732da9fc56b31d909af5c0e39520d689
2018-07-09 18:10:58 -07:00
342dbcc35a Remove legacy redundant codes (#9252)
Summary:
Fix #9167
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9252

Differential Revision: D8774644

Pulled By: soumith

fbshipit-source-id: 0b004f497026bca3b101c577e78aec22bdc3df51
2018-07-09 16:55:28 -07:00
2b8aea3ada add more logging messages to dimension checks of FCGradient (#9203)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9203

Closes https://github.com/pytorch/pytorch/pull/9203

Added extra logging for FCGradient input dimension checks

Reviewed By: yinghai

Differential Revision: D8738549

fbshipit-source-id: d4f26572d86f3d44f40c9dca62d4f241ba15aead
2018-07-09 16:55:26 -07:00
c67ade26a7 Add onnx support for clamp_min clamp_max (#9224)
Summary:
Add support for clamp as required by https://github.com/onnx/onnx/issues/1168
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9224

Reviewed By: yinghai

Differential Revision: D8758945

Pulled By: houseroad

fbshipit-source-id: fad724d273c59f4527e96481ee6b2d14bfba205d
2018-07-09 16:25:44 -07:00
01a7ca3d64 Fix Pytorch Mac build issues (#9283)
Summary:
Breaking this out of #8338

This fixed Mac build issues after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9283

Reviewed By: orionr

Differential Revision: D8773459

Pulled By: mingzhe09088

fbshipit-source-id: 71942e8e6891a625e6b1a7dc0160e87444c64209
2018-07-09 15:40:46 -07:00
29b1c2cfce Install typing for Mac (#9271)
Summary:
Breaking this out of #8338

When BUILD_CAFFE2 and BUILD_ATEN are removed, we need to install typing on Mac.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9271

Reviewed By: orionr

Differential Revision: D8768701

Pulled By: mingzhe09088

fbshipit-source-id: 052b96e90e64b01e6b5dd48b91c0fb12fb96b54a
2018-07-09 14:58:50 -07:00
a70a90b28f Fix pytorch linux build issues (#9273)
Summary:
Breaking out of #8338

This fixes the build issues with pytorch on linux machines after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9273

Reviewed By: orionr

Differential Revision: D8768869

Pulled By: mingzhe09088

fbshipit-source-id: 2730426ed1bed398eb5dc804c7348aeeb27c93d3
2018-07-09 14:41:36 -07:00
d0ad696f9d Warn about THPObjectPtr needing GIL. (#9265)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9265

Differential Revision: D8767687

Pulled By: ezyang

fbshipit-source-id: 900b37f2749112cafc5b48e7b444a256df18186a
2018-07-09 13:55:22 -07:00
b19b38c427 Fix Mac CUDA issues (#9269)
Summary:
Breaking this out of #8338

This takes care of failures we saw on Mac CUDA builds when BUILD_CAFFE2 and BUILD_ATEN were removed. Specifically, smessmer fixed `std::hash` being handled in a weird way by nvcc and I fixed an nvcc template issue by moving `SparseNormalizeOp::RunOnDevice` implementation into the cc file.

cc mingzhe09088 smessmer
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9269

Reviewed By: mingzhe09088

Differential Revision: D8767984

Pulled By: orionr

fbshipit-source-id: 550686bfcef6d331f16d593859c99169216c5c2e
2018-07-09 12:40:40 -07:00
744cd90074 Fix Android build issue (#9275)
Summary:
Breaking this out of #8338

This fixed an Android build issue after BUILD_CAFFE2 and BUILD_ATEN are removed.

cc orionr
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9275

Reviewed By: orionr

Differential Revision: D8769913

Pulled By: mingzhe09088

fbshipit-source-id: afce52a12697757a0b2103c7c343e19ab158a9f7
2018-07-09 12:40:37 -07:00
cb98c5020a Normalize IDEEP spatial bn op test (#9276)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9276

Use `checkDevice` instead rolling our own.

Reviewed By: orionr

Differential Revision: D8769401

fbshipit-source-id: bd47ec2b2501552c2da1cee2eb9ad96a215602b4
2018-07-09 11:55:41 -07:00
936f47f271 Make roi_align_rotated_op_test not rely on 1.12.0 numpy.rot90 (#9267)
Summary:
Breaking this out of https://github.com/pytorch/pytorch/pull/8338

Use a local version of `np.rot90` with an `axes` argument, since we don't have NumPy 1.12.0 in all of the test environments. Caffe2 conda2-ubuntu16.04, for example, fails. Generally, it seems better to not require a NumPy bump just for this test.
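
A sketch of what such a local fallback might look like, mirroring NumPy 1.12's `axes` semantics (hypothetical helper, not necessarily the exact code that landed; flipping is done with slicing so it also works on pre-1.12 NumPy):

```python
import numpy as np

def rot90(m, k=1, axes=(0, 1)):
    """Rotate m by 90*k degrees in the plane given by axes."""
    def flip(a, axis):
        idx = [slice(None)] * a.ndim
        idx[axis] = slice(None, None, -1)
        return a[tuple(idx)]

    k %= 4
    if k == 0:
        return m[:]
    if k == 2:
        return flip(flip(m, axes[0]), axes[1])
    perm = list(range(m.ndim))
    perm[axes[0]], perm[axes[1]] = perm[axes[1]], perm[axes[0]]
    if k == 1:
        return np.transpose(flip(m, axes[1]), perm)
    return flip(np.transpose(m, perm), axes[1])  # k == 3

a = np.arange(4).reshape(2, 2)
assert np.array_equal(rot90(a), np.rot90(a))
```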

cc mingzhe09088
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9267

Reviewed By: mingzhe09088

Differential Revision: D8767819

Pulled By: orionr

fbshipit-source-id: c51a6295d58366eba06e4e55e3f1ffaa8af96975
2018-07-09 11:55:39 -07:00
768a0e3298 Some more changes to support USE_CUDNN=OFF (#9268)
Summary:
Breaking this out of #8338

More changes required to support USE_CUDNN=OFF. We should be able to land some of our fixes before the big BUILD_CAFFE2 and BUILD_ATEN removal lands.

cc mingzhe09088 Yangqing
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9268

Reviewed By: mingzhe09088

Differential Revision: D8767981

Pulled By: orionr

fbshipit-source-id: 0607ca2773253b685209c274a3adf70180d8ce58
2018-07-09 11:55:38 -07:00
1483bb7246 Remove unused functions (#9223)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9223

TSIA

Reviewed By: ezyang

Differential Revision: D8755761

fbshipit-source-id: 284fa03397df5626bd56de150b90ba61ae3b8c6e
2018-07-09 10:09:47 -07:00
e8536c08a1 Update extension docs, fix Fold/Unfold docs (#9239)
Summary:
Commits:
1. In extension doc, get rid of all references of `Variable` s (Closes #6947 )
    + also add minor improvements
    + also added a section with links to cpp extension :) goldsborough
    + removed mentions of `autograd.Function.requires_grad` as it's not used anywhere and hardcoded to `return_Py_True`.
2. Fix several sphinx warnings
3. Change `*` in equations in `module/conv.py` to `\times`
4. Fix docs for `Fold` and `Unfold`.
    + Added better shape check for `Fold` (it previously may give bogus result when there are not enough blocks). Added test for the checks.
5. Fix doc saying `trtrs` not available for CUDA (#9247 )
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9239

Reviewed By: soumith

Differential Revision: D8762492

Pulled By: SsnL

fbshipit-source-id: 13cd91128981a94493d5efdf250c40465f84346a
2018-07-08 19:09:39 -07:00
f48e15624e Unique cuda support (#8899)
Summary:
Add cuda support for unique.

There is a simple test below on a tensor containing 1M int elements.
The GPU version is noticeably faster.

```python
Performance
cpu: 0.05040597915649414 s
x: tensor([1, 3, 1,  ..., 4, 9, 4])
x output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9])
x inverse: tensor([0, 2, 0,  ..., 3, 8, 3])

gpu: 0.015192985534667969 s
y: tensor([1, 3, 1,  ..., 4, 9, 4], device='cuda:0')
y output: tensor([1, 2, 3, 4, 5, 6, 7, 8, 9], device='cuda:0')
y inverse: tensor([0, 2, 0,  ..., 3, 8, 3], device='cuda:0')
```

```python
Code
import torch
import time
x=torch.randint(1,10,(1000000,),dtype=torch.long)
device = torch.device("cuda")
y=x.to(device)
start = time.time();
output,inverse = x.unique(sorted=True,return_inverse=True)
stop = time.time();
print('cpu:',stop-start,'s')
print('x:',x)
print('x output:',output)
print('x inverse:',inverse)

start = time.time();
output1,inverse1 = y.unique(sorted=True,return_inverse=True)
torch.cuda.synchronize();
stop = time.time();
print('gpu:',stop-start,'s')
print('y:',y)
print('y output:',output1)
print('y inverse:',inverse1)
```
Closes https://github.com/pytorch/pytorch/pull/8899

Reviewed By: SsnL

Differential Revision: D8677655

Pulled By: ezyang

fbshipit-source-id: 09df3f0602f235c5d36c7a6e7e1d89dbf82570bb
2018-07-08 17:09:26 -07:00
819815d9c0 Fix missing compile_commands.json for aten (#9227)
Summary:
When we moved the libaten build into libcaffe2, we changed the location where it generated compile_commands.json such that it was no longer being picked up by the build script. This fixes it so it is still found.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9227

Reviewed By: goldsborough

Differential Revision: D8757984

Pulled By: zdevito

fbshipit-source-id: 73df26bf08d98f18ac841d6c0db7e332fd328ab6
2018-07-08 16:54:34 -07:00
a615baa51f move unbind to ATen
Summary: Pull Request resolved: https://github.com/pytorch/pytorch/pull/8587

Differential Revision: D8764086

Pulled By: soumith

fbshipit-source-id: 7f311cf13c341040e1f2cf4a8f05723e32d38947
2018-07-08 16:46:35 -07:00
66dc97e51c #8714 Improve Error Messages for module re-assignment (#9212)
Summary:
Here's an improved error message.  Let me know if this change makes the errors a little clearer.
Closes https://github.com/pytorch/pytorch/pull/9212

Reviewed By: soumith

Differential Revision: D8752896

Pulled By: jramseyer

fbshipit-source-id: d2bd8462c3ddf14acd3de56a4c1aeb75a9bc4067
2018-07-08 16:46:33 -07:00
d6f21fc663 Ports Streams to ATen (#8997)
Summary:
This PR moves the THCStream logic (from both the THCStream and THCState APIs) to ATen. In particular, it:

+ Creates a new (THC free) at::CUDAStream class and API
+ Extends the at::Context API to expose it
+ Stubs the current THCStream and THCState APIs to use it
+ Updates THC to no longer violate stream encapsulation (stream.hpp is dead)
+ Adds an ATen cpp test of the API
+ Bonus: Removes some debug spew in test_nn.py

The new API has several advantages over the old one:

(1) It comes with an easy to use RAII, the CUDAStream. CUDAStreams have the expected copy and move semantics and are implicitly convertible to cudaStream_t.
(2) It does not depend on THCState, THCThreadLocal, or CUDA (thanks to goldsborough for suggesting the dynamic registration technique)
(3) It provides one consistent API/place for all stream operations, instead of having them split between THCStream and THCState
(4) The internals are completely encapsulated, unlike the historic THCStream
(5) It has getAndRetain semantics, which are safer than the historic gets (which allowed a gap between acquisition and retention)

There are a couple things this PR does not do, however, which are left for future work:

- It leaves the c10d:CUDAStream class as a THCStream wrapper (which now really wraps an at::CUDAStream).
- It leaves historic users of THCStream mostly untouched, except where they violated encapsulation (by using stream.hpp). A couple forward declarations were also changed.

I hope this PR allows easy usage of streams from ATen and is a useful pattern for porting more of the THCState API.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/8997

Differential Revision: D8683375

Pulled By: soumith

fbshipit-source-id: 2e48ad85f1f9c8817684fe63a267938e80eafdcf
2018-07-08 16:25:09 -07:00
75919b4e18 Expose generic device copy algorithm (#9009)
Summary:
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9009

Closes https://github.com/pytorch/pytorch/pull/9009

Nice little helper for the related stacked diff

github_tests_pass

Reviewed By: hyuen

Differential Revision: D8688509

fbshipit-source-id: 22de241d69932210d161df1e29d9c41eb50a8133
2018-07-08 15:40:36 -07:00
4ad6e53557 fix the deprecate argument in bce with logits (#9162)
Summary:
As title.
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9162

Differential Revision: D8753892

Pulled By: SsnL

fbshipit-source-id: 7ce9ac16571a550a3fa7b86d68eb5c077a5956fb
2018-07-07 10:26:35 -07:00
f40ed548d8 Bump onnx submodule (#9215)
Summary:
To include new onnx backend test cases
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9215

Reviewed By: houseroad

Differential Revision: D8754785

Pulled By: bddppq

fbshipit-source-id: 2c113a7155c537c4ec5ddb021661d68acb775879
2018-07-06 15:42:22 -07:00
067b270717 Optimize LeakyReLU and PReLU 'forward' functions on the CPU (#9206)
Summary:
This looks like a totally cosmetic change, but for some reason it reduces the runtime by ~50% running in a single CPU thread.

```
import os
os.environ['OMP_NUM_THREADS']='1'  #Use one CPU thread
import torch, torch.nn as nn, time
def test_net(net,offset):
    net.eval()
    total=0
    with torch.no_grad():
        for _ in range(100):
            x = torch.randn(100,100,100)+offset
            start_time = time.time()
            y = net(x)
            total+=time.time()-start_time
    print(net, total*10, 'ms')

for offset in [-1,0,+1]:
    test_net(nn.LeakyReLU(),offset)
    test_net(nn.PReLU(),offset)
```
Closes https://github.com/pytorch/pytorch/pull/9206

Reviewed By: yf225

Differential Revision: D8749491

Pulled By: btgraham

fbshipit-source-id: 3db8049dd151c0ba9ae1dd5c05bcc58bcab97e9a
2018-07-06 15:42:19 -07:00
227c8f2654 Implement nn.functional.interpolate based on upsample. (#8591)
Summary:
This PR addresses #5823.

* fix docstring: upsample doesn't support LongTensor

* Enable float scale up & down sampling for linear/bilinear/trilinear modes. (following SsnL 's commit)

* Enable float scale up & down sampling for nearest mode. Note that our implementation differs slightly from TF in that there's actually no "align_corners" concept in this mode.

* Add a new interpolate function API to replace upsample. Add deprecate warning for upsample.

* Add an 'area' mode, which is essentially adaptive average pooling, to resize_images.

* Add test cases for interpolate in test_nn.py

* Add a few comments to help understand *linear interpolation code.

* There is only "*cubic" mode missing in resize_images API which is pretty useful in practice. And it's labeled as hackamonth here #1552. I discussed with SsnL that we probably want to implement all new ops in ATen instead of THNN/THCUNN. Depending on the priority, I could either put it in my queue or leave it for a HAMer.

* After the change, the files named *Upsampling*.c work for both up- and down-sampling. I could rename the files if needed. (A minimal usage sketch of the new API follows below.)
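
Roughly how the new call looks in practice (shapes chosen arbitrarily; example mine, not from the PR):

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 64, 64)                      # NCHW image batch
up   = F.interpolate(x, scale_factor=2.5, mode='bilinear', align_corners=False)
down = F.interpolate(x, scale_factor=0.5, mode='nearest')
area = F.interpolate(x, size=(7, 7), mode='area')  # adaptive-average-pooling style
print(up.shape, down.shape, area.shape)            # 160x160, 32x32, 7x7
```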

Differential Revision: D8729635

Pulled By: ailzhang

fbshipit-source-id: a98dc5e1f587fce17606b5764db695366a6bb56b
2018-07-06 15:28:11 -07:00
766fa1fc96 Fix IDEEP CMakefile (#9217)
Summary:
The reason is that we are referencing `__ideep_looked_for` here: 77484d91db/cmake/Modules/FindMKL.cmake (L350)

This was first flushed out in https://github.com/pytorch/pytorch/pull/8105 and probably can help with https://github.com/pytorch/pytorch/issues/9024
Pull Request resolved: https://github.com/pytorch/pytorch/pull/9217

Reviewed By: houseroad

Differential Revision: D8754491

Pulled By: yinghai

fbshipit-source-id: 70aecc2d60684b9ea522403dc98a0a1a2c3db7e6
2018-07-06 15:28:07 -07:00
af107c4d16 Fix shape inference bug (#9199)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9199

The input shapes are not logged correctly in production because `PerfNetObserver::Stop()` only gets called after the inference is done for the net and in the mobile models, it's common practice to reuse the blobs as much as possible to save memory. And the shapes of the blobs keep changing during inference. By the time you query `InputTensorShapes()` in `PerfNetObserver::Stop()`, you only get the final shape of the blobs.

To fix this bug, I moved the 'InputTensorShapes()' query from `PerfNetObserver::Stop()` to `PerfOperatorObserver::Stop()`. The latter gets called at the end of operator->run() whereas `PerfNetObserver::Stop()` gets called at the end of net->run().

Also remove `PerfOperatorObserver::getAnalyticalCost()` since it's now done on the server side and no longer needed on mobile

Reviewed By: Maratyszcza

Differential Revision: D8743346

fbshipit-source-id: 5d2d0132e3f5e084be7d0173863e695e62a6b4a0
2018-07-06 15:15:17 -07:00
f87499a8f3 Modify the original PackSegments operator by adding "max_length" argument (#9048)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9048

max_length argument helps fix the shape of the output to be N * max_length * D, where N is the batch_size, D is the feature_dim.

Reviewed By: bddppq

Differential Revision: D8702782

fbshipit-source-id: e30555608fee1c4a61cc95922f4a71c7f54903af
2018-07-06 14:33:59 -07:00
4e5369349f Add FTRL Optimizer with Group Lasso regularizer (#9074)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9074

Implement an optimizer based on the FTRL optimizer which supports the Group
Lasso regularizer.

The relevant paper list for this optimizer:
1. About the FTRL Optimizer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/41159.pdf,
2. About the group lasso regularizer solver: http://www.cse.cuhk.edu.hk/~king/PUB/ICML2010-Yang-473.pdf
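
For reference, the group lasso penalty from these papers has the standard form (notation mine, not spelled out in the commit):

$$\Omega(w) = \lambda \sum_{g \in \mathcal{G}} \sqrt{d_g}\,\lVert w_g \rVert_2$$

where $w_g$ is the sub-vector of weights in group $g$ and $d_g$ its size; using the un-squared $\ell_2$ norm is what drives entire groups exactly to zero.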

Differential Revision: D8623146

fbshipit-source-id: 40e08aa6319d1ad7aa95e8716e3de83b9cfb8452
2018-07-06 13:41:00 -07:00
c0bfe2a6ed Clean up conversion registration
Summary:
[x] get registry working
[x] move all current ops to registry

Reviewed By: yinghai

Differential Revision: D8706115

fbshipit-source-id: 8dfce79039b57dea1c15e8e291cdd74f39766ade
2018-07-06 13:40:56 -07:00
da39c24971 Add GroupL1Norm regularizer (#9115)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9115

As desc

Reviewed By: hlu1

Differential Revision: D8718011

fbshipit-source-id: c9d750662064dd6e6362b6b13d9d0175e93e60e4
2018-07-06 13:26:09 -07:00
f1ce15b50c Move nccl scatter and gather to C++ (#9117)
Summary:
As I try to replicate DP in C++, I need to move some functions into C++ from Python. This PR ports the scatter and gather primitives from Python in torch/cuda/comm.py to C++ in torch/csrc/cuda/comm.cpp. The basic infrastructure was already there, since apaszke had rewritten broadcast in C++ already.

I'm not very familiar with this code, so let me know if I'm doing something wrong. I largely just literally translated the code.

I don't know how "public" `torch.cuda.comm` is, but I feel like the `destination_index` parameter for `gather` should be changed from -1 indicating CPU to `None` indicating CPU, and `-1` indicating the default CUDA device. That would make the code clearer IMO.
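
For context, the Python-level primitives being ported look roughly like this in use (a sketch assuming at least two visible GPUs; example mine):

```python
import torch
import torch.cuda.comm as comm

x = torch.randn(8, 16, device='cuda:0')
chunks = comm.scatter(x, devices=[0, 1])             # split along dim 0, one chunk per GPU
y = comm.gather(list(chunks), dim=0, destination=0)  # reassemble on cuda:0
assert y.shape == x.shape
```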

apaszke colesbury teng-li pietern
Closes https://github.com/pytorch/pytorch/pull/9117

Differential Revision: D8721729

Pulled By: goldsborough

fbshipit-source-id: 1844a488079d21fa209b32e2c73e48632cbe9e68
2018-07-06 11:10:33 -07:00
d863391871 nn::Module::as (#9149)
Summary:
Added a way to `dynamic_cast` an `nn::Module` and get a pointer to it. `nn::Module::is<T>` just checked if the return value of the `dynamic_cast` was nullptr, so I got rid of `is<T>` since it's equivalent to `as<T> != nullptr`(or just `as<T>` due to boolean conversion).

We're now at

```
if (auto* conv = module.as<nn::Conv2d>()) {
  conv->weight.data().normal_(0.0, 0.02);
} else if (auto* bn = module.as<nn::BatchNorm>()) {
  bn->weight.data().normal_(1.0, 0.02);
  bn->bias.data().fill_(0);
}
```

ezyang apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9149

Differential Revision: D8735954

Pulled By: goldsborough

fbshipit-source-id: e2b8f6f0cea16a621f8bc0807a33cc7651d25154
2018-07-06 11:10:29 -07:00
9aded4351e Allow arbitrary namespaces for Symbols (#9018)
Summary:
Context: I am updating jit::FunctionSchema to use `Symbol name;` rather than `std::string name`. Sometimes the name refers to a builtin  thing like `prim::UnpackTuple`, sometimes to an aten operator like `aten::add`, and sometimes just to a raw string, like `my_method_foo` that really doesn't belong in any namespace and should be printed to the user in that form. For this last case, I want the ability to create a raw Symbol again, like was previously possible, that just represents an interned string. This PR enables that use, keeps the other functionality still possible, and simplifies interned_string's implementation a bit.

This changes how Symbol is implemented. Now the namespace of a symbol
is optional and the namespaces themselves are Symbols.
This allows Symbol to be used with arbitrary namespaces, and allows
you to use Symbol as an simple interned string using via fromQualString
and toQualString without :: in the string. This also simplifies the
implementation. Like with string conversion, builtin primitives go
through a fast path for namespace lookup while registered symbols require
holding a lock and reading an array entry to lookup the namespace.

Note: alexnet expect file update is from a previous commit. It doesn't run in CI because pytorch vision is not installed.
Closes https://github.com/pytorch/pytorch/pull/9018

Reviewed By: SsnL

Differential Revision: D8690449

Pulled By: zdevito

fbshipit-source-id: b65ee57704641d7294fe115c5470cf55d406458f
2018-07-06 10:11:15 -07:00
84884dc2d3 Allow passing '0' to ASAN/UBSAN flags (#9202)
Summary:
Similar to https://github.com/pytorch/pytorch/pull/9187, this PR makes setting the `PYTORCH_TEST_WITH_ASAN` and `PYTORCH_TEST_WITH_UBSAN` flags easier internally, by allowing the flags to be set to `0`.
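
An illustrative helper for the kind of parsing this enables (hypothetical sketch only, not the exact logic in the test harness): `0` and unset now mean "disabled", instead of any set value enabling the flag.

```python
import os

def env_flag(name):
    # Hypothetical: treat '0' (and unset/off-like values) as disabled.
    return os.getenv(name, '0').upper() not in ('0', '', 'OFF', 'FALSE', 'NO')

TEST_WITH_ASAN = env_flag('PYTORCH_TEST_WITH_ASAN')
TEST_WITH_UBSAN = env_flag('PYTORCH_TEST_WITH_UBSAN')
```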
Closes https://github.com/pytorch/pytorch/pull/9202

Differential Revision: D8745533

Pulled By: yf225

fbshipit-source-id: 6293f52f2e8b1c3ef150becfdc2dd7ded56d5d80
2018-07-06 08:40:37 -07:00
168a29f497 Create native wrappers around dimension reduction functions. (#9197)
Summary:
This is necessary for n-dimensional empty tensors, which have special native handling.
Closes https://github.com/pytorch/pytorch/pull/9197

Differential Revision: D8744083

Pulled By: gchanan

fbshipit-source-id: 3cc692a1d62cbeb169681b7c40e3df50e12953b7
2018-07-06 08:11:23 -07:00
1f1fb813a6 Use a static random_device in StorageSharing (#9080)
Summary:
I've been cleaning up my email notifications, and noticed that this PR used a stack-allocated `random_device`. This is generally a bad idea due to this sentence from the C++ reference (emphasis mine):

> `std::random_device` may be implemented in terms of an implementation-defined pseudo-random number engine if a non-deterministic source (e.g. a hardware device) is not available to the implementation. **In this case each `std::random_device` object may generate the same number sequence.**

If this is how this object is implemented, then this `rd()` call will give the same result at every call.

cc yf225
Closes https://github.com/pytorch/pytorch/pull/9080

Differential Revision: D8748342

Pulled By: soumith

fbshipit-source-id: 22987befee61ff7faacda5ecc10138c2ac5d26ff
2018-07-06 07:39:53 -07:00
eadc5071e8 Use torch.save in _StorageBase.__reduce__ (#9184)
Summary:
Previously this used the ``.tolist`` method, which converted the
storage object into a list of Python objects, and then sent those to
pickle.  For storage objects of non-trivial size, this was very slow.

Now we reuse the logic of the ``torch.save`` function to efficiently
turn the Storage object into bytes, and send those instead.  This
reduces the semantic information (it's harder to interpret the bytes)
but should be orders of magnitude more efficient when serializing data
with the pickle protocol or with copy
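
A rough sketch of the difference (storage size arbitrary; example mine, not from the PR):

```python
import io
import pickle
import torch

s = torch.FloatStorage(1000)

# Old path: every element became a Python float first.
as_list = s.tolist()

# New path: reuse torch.save to turn the storage into raw bytes.
buf = io.BytesIO()
torch.save(s, buf)

# pickle/copy now route through the efficient byte-based __reduce__.
payload = pickle.dumps(s)
restored = pickle.loads(payload)
assert restored.size() == s.size()
```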

For future work it would be nice to develop a mechanism to get a buffer
of bytes out of a Storage object, and use that alongside the current
``from_buffer`` method.

See #9168 for context
Closes https://github.com/pytorch/pytorch/pull/9184

Differential Revision: D8747794

Pulled By: soumith

fbshipit-source-id: ac598e660c043788ed1ffab3d0303812886edf79
2018-07-06 07:24:53 -07:00
7b25cbbef9 Test nn.Module on non-contiguous inputs (#9114)
Summary:
1. Let `ModuleTest` raise when they fail on non-contiguous inputs. Fix legacy modules.
2. Fix BN (both THNN and cuDNN) not working on non-contiguous inputs.
3. Fix CUDA EmbeddingBag not working on non-contiguous inputs. To prevent calling `.contiguous()` on in both `forward` and `backward`,
  a. prefix all current `embedding_bag*` functions with `_`, indicating that they require input to be contiguous (there is a check in each function).
  b. create `embedding_bag`, which makes input arguments `.contiguous()`, and calls `_embedding_bag`
4. Make many ATen `embedding*` functions to work on non-contiguous inputs so we don't need to call `input = input.contiguous()` in Python `nn.functional.embedding`.
5. Fix dense-sparse addition when the sparse input is not coalesced and indices or values tensor is not contiguous. This came up in the test cases of Embedding modules with `sparse=True`. Added tests.
6. Update `TensorUtils.cpp` to use `AT_*` macros.
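
A minimal illustration of the BN case fixed above (example mine):

```python
import torch
import torch.nn as nn

x = torch.randn(2, 4, 8, 8).transpose(2, 3)   # same shape, but non-contiguous
assert not x.is_contiguous()

bn = nn.BatchNorm2d(4)
y = bn(x)          # previously required a manual .contiguous() (or misbehaved)
assert y.shape == x.shape
```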

Request:
review from cpuhrsch on the `Embedding*` changes.
review from ezyang on ATen sparse & BN changes.
Closes https://github.com/pytorch/pytorch/pull/9114

Differential Revision: D8717299

Pulled By: SsnL

fbshipit-source-id: 0acc6f1c9522b5b605361e75112c16bbe1e98527
2018-07-05 21:09:34 -07:00
a769fae91d Fix TestAutograd.test_pinverse not actually testing (#9192)
Summary:
cc vishwakftw

Also added a check if none of the input tensors in `gradcheck` have `requires_grad=True`.
Closes https://github.com/pytorch/pytorch/pull/9192

Differential Revision: D8739401

Pulled By: SsnL

fbshipit-source-id: 81bb3aa0b5c04eb209b137a4bd978e040e76cbcd
2018-07-05 18:55:00 -07:00
ff501c30af Turn on UBSAN in the OSS build (#8813)
Summary:
Copy of https://github.com/pytorch/pytorch/pull/8802
Closes https://github.com/pytorch/pytorch/pull/8813

Differential Revision: D8707364

Pulled By: yf225

fbshipit-source-id: bc201980b50e9fb44c42a17f898b50d3558fc417
2018-07-05 15:55:49 -07:00
21c420c32c Remove unused RowwiseArgMaxOp (#9119)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9119

Remove unused RowwiseArgMaxOp

Reviewed By: houseroad

Differential Revision: D8719826

fbshipit-source-id: 57d78c8b93bc94a4634d806c7c2041f8c18678a5
2018-07-05 15:25:28 -07:00
f45dfbccef Add support for ArgMax and ArgMin in C2 onnx backend and frontend (#9050)
Summary:
Pass the end to end test cases in https://github.com/onnx/onnx/pull/1049
Closes https://github.com/pytorch/pytorch/pull/9050

Reviewed By: hlu1

Differential Revision: D8703757

Pulled By: houseroad

fbshipit-source-id: 63308202e349dfc02d532e87f49495ba1aab085b
2018-07-05 14:26:08 -07:00
213540cd85 Add meshgrid to PyTorch (#8581)
Summary:
Part of this issue https://github.com/pytorch/pytorch/issues/7580
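
A quick sketch of the added API (example mine; note it uses matrix, i.e. 'ij', indexing, unlike `np.meshgrid`'s default):

```python
import torch

xs = torch.tensor([1, 2, 3])
ys = torch.tensor([4, 5])
gx, gy = torch.meshgrid(xs, ys)
print(gx.shape, gy.shape)   # torch.Size([3, 2]) twice
```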
Closes https://github.com/pytorch/pytorch/pull/8581

Differential Revision: D8661660

Pulled By: soumith

fbshipit-source-id: 4a72fb5152ed6eb4d57f14de691bf09a2a2e5b0c
2018-07-05 11:25:27 -07:00
1c9073b43a Allow passing '0' to NO_MULTIPROCESSING_SPAWN (#9187)
Summary:
This PR makes setting the `NO_MULTIPROCESSING_SPAWN` easier internally, by allowing the flag to be set to `0`.
Closes https://github.com/pytorch/pytorch/pull/9187

Differential Revision: D8736206

Pulled By: yf225

fbshipit-source-id: b8a34cb9a747b13bc9428777a3ed766ce441cfe1
2018-07-05 11:10:46 -07:00
14cbd9adb8 Implement torch.pinverse : Pseudo-inverse (#9052)
Summary:
1. Used SVD to compute.
2. Tests in test_autograd, test_cuda and test_torch
3. Doc strings in _torch_docs.py and _tensor_docs.py
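
A small sketch of the new function and its defining property (example mine):

```python
import torch

a = torch.randn(5, 3)
p = torch.pinverse(a)                           # Moore-Penrose pseudo-inverse via SVD
assert torch.allclose(a @ p @ a, a, atol=1e-5)  # defining property: A A+ A = A
print(p.shape)                                  # torch.Size([3, 5])
```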

Closes #6187
Closes https://github.com/pytorch/pytorch/pull/9052

Reviewed By: soumith

Differential Revision: D8714628

Pulled By: SsnL

fbshipit-source-id: 7e006c9d138b9f49e703bd0ffdabe6253be78dd9
2018-07-05 09:11:24 -07:00
f6027bb15d Install hpp headers for CPP Extensions (#9182)
Summary:
With the Cpp-ization of a few files in `TH`/`THC`, CPP extensions got broken whenever the user uses features from `THC` in their files, when pytorch is installed via `python setup.py install`.

This addresses issues such as
```
/home/me/.conda/envs/pytorch/lib/python3.6/site-packages/torch/lib/include/THC/THCDeviceTensorUtils.cuh:5:25: fatal error: THCTensor.hpp: No such file or directory
```
Closes https://github.com/pytorch/pytorch/pull/9182

Reviewed By: soumith

Differential Revision: D8734581

Pulled By: fmassa

fbshipit-source-id: 2a1138f208592eaccb01fcdb805a6b369d7a497a
2018-07-05 07:55:25 -07:00
08daed40f7 Fix bug in flip() (#9156)
Summary:
Closes #9147
Added a test to prevent regression in test_torch
Added entries in docs

cc ezyang weiyangfb
Closes https://github.com/pytorch/pytorch/pull/9156

Differential Revision: D8732095

Pulled By: soumith

fbshipit-source-id: 7a6892853cfc0ccb0142b4fd25015818849adf61
2018-07-04 07:24:01 -07:00
4b2b690792 Install THC/THCGeneral.hpp (#9159)
Summary:
This file was added in #9107 but wasn't installed. The libraries in
./torch/lib use the headers from Caffe2/ATen from their temporary
install path at torch/lib/tmp_install, and c10d was not able to find
THC/THCGeneral.hpp before this fix.
Closes https://github.com/pytorch/pytorch/pull/9159

Reviewed By: Yangqing

Differential Revision: D8731107

Pulled By: pietern

fbshipit-source-id: d6009f6f6e8e6e0f37dea24cc4c3570736943ab1
2018-07-03 21:40:44 -07:00
49f88ac956 Add grid lines for activation images, fixes #9130 (#9134)
Summary:
1. Add dashed light blue line for asymptotes.
2. RReLU was missing the activation image.
3. make clean in docs will remove the activation images too.

Sample image:
![image](https://user-images.githubusercontent.com/23639302/42224142-5d66bd0a-7ea7-11e8-8b0a-26918df12f7c.png)
Closes https://github.com/pytorch/pytorch/pull/9134

Differential Revision: D8726880

Pulled By: ezyang

fbshipit-source-id: 35f00ee08a34864ec15ffd6228097a9efbc8dd62
2018-07-03 19:10:00 -07:00
e3dbdb2a17 Fix the comments: code and comments dimensions mis-match (#9070)
Summary:
This will resolve the code and comments mis-match issue.
Closes https://github.com/pytorch/pytorch/pull/9070

Differential Revision: D8712261

Pulled By: ezyang

fbshipit-source-id: a8a7d8af890a41ec246e11c2a62b0bde297be9c1
2018-07-03 14:39:57 -07:00
b479494ed4 loss plugin: Fix indexing into a scalar (#9143)
Summary:
The loss plugin was using the old-style loss[0] access, which in PyTorch 0.4 and
later is an attempt to index into a scalar, generating a warning.
Replaced that with loss.item().

This fixes
https://github.com/pytorch/pytorch/issues/9142
Closes https://github.com/pytorch/pytorch/pull/9143

Differential Revision: D8726403

Pulled By: ezyang

fbshipit-source-id: 6c496b140a74d22c8423f511db901b18615fd6fa
2018-07-03 14:25:44 -07:00
b432837a9d Add some missing error checks in sparse. (#9140)
Summary:
- There were missing error messages for AT_CHECK in SparseTensorImpl::set_indices_and_values
- We have to check that the backends of all our inputs line up,
  since native does not do it for us.
- Some math operations were missing shape tests.

Fixes #9110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9140

Differential Revision: D8724349

Pulled By: ezyang

fbshipit-source-id: 3c75104187aca97cbe92bb0ec24f6ded07b2c3d6
2018-07-03 13:11:12 -07:00
f17b9e4cde Fix boolean indexing. (#8920)
Summary:
Boolean indexing was special-cased to handle a single boolean value, but didn't generally work given multiple booleans.
This PR unifies the behavior with slicing.  Note that only 'True' and torch.tensor(True) behave like NumPy due to the lack of n-dimensional empty tensors.
The corresponding tests for false values have been added, but are guarded behind a flag until we add n-dimensional empty tensors.
Closes https://github.com/pytorch/pytorch/pull/8920

Reviewed By: ezyang

Differential Revision: D8661876

Pulled By: gchanan

fbshipit-source-id: 0dc8a45a303aa41f729d04ab8908cfaf2e3ce3d7
2018-07-03 10:24:12 -07:00
4f89777d29 Removing extraneous main function to fix buck test detection (#9121)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9121

This main function causes 'buck test caffe2_test_cpu' to run 0 tests

Reviewed By: orionr

Differential Revision: D8719343

fbshipit-source-id: dc1cf76b0355637eaae193be2159f5746873b9f9
2018-07-03 09:25:12 -07:00
e09d993d8b Move easy THStorage/THCStorage functions out of generic (#9136)
Summary:
Some functions are exactly implemented in THStorage_; in that case,
we called those functions directly.

Stacked on #9135
Closes https://github.com/pytorch/pytorch/pull/9136

Reviewed By: Yangqing

Differential Revision: D8723998

Pulled By: ezyang

fbshipit-source-id: 653d23a5e1db4b9bdda50641fa97730894cc8ed5
2018-07-03 09:11:51 -07:00
9b0cece9b0 Enable the general usage of _download_url_to_file (#9090)
Summary:
A requirement for the fix on https://github.com/pytorch/examples/issues/378.
Closes https://github.com/pytorch/pytorch/pull/9090

Reviewed By: goldsborough

Differential Revision: D8712254

Pulled By: ezyang

fbshipit-source-id: b28765f24d891890e9d88757ee4ec704e38e6af7
2018-07-02 19:55:39 -07:00
97b9712aed Create Sequential::extend (#9116)
Summary:
There is no way to concatenate two `Sequential`s in Python, but it's also easier to do in an immutable fashion by just writing `Sequential(first.modules() + second.modules())`. Concatenating vectors isn't as easy in C++, so I think it's fair to save users some for loops by giving them `Sequential::extend()`.

apaszke ebetica ezyang

CC jamespinkerton
Closes https://github.com/pytorch/pytorch/pull/9116

Reviewed By: ezyang

Differential Revision: D8719630

Pulled By: goldsborough

fbshipit-source-id: 840d7ac70755350e6202b493c531e30ecbb6546f
2018-07-02 19:42:03 -07:00
16570ef0d5 Update onnx submodule to include the protobuf fix for windows
Summary: Closes https://github.com/pytorch/pytorch/pull/9113

Reviewed By: houseroad

Differential Revision: D8717259

Pulled By: bddppq

fbshipit-source-id: c99a4390b764707affea7db765abef789230f497
2018-07-02 19:42:01 -07:00
21c786071b update nn loss tests to use new reduction arg (#9118)
Summary:
The tests were using the old args, which caused them to emit a lot of deprecation warnings.

closes #9103.

Reviewed By: ezyang

Differential Revision: D8720581

Pulled By: li-roy

fbshipit-source-id: 3b79527f6fe862fb48b99a6394e8d7b89fc7a8c8
2018-07-02 19:41:57 -07:00
4d57a1750c Unify THStorage and THCStorage structs. (#9107)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9107

Some details about how this was done:

- For now, the allocators for CPU and CUDA are different (unifying
  the allocators is a bigger change to make, I'll contribute this in
  a later patch).  To smooth this over, the allocator field now
  stores a void* instead of THAllocator* or THCDeviceAllocator*; to
  make this clear the field is renamed to allocatorVoidPtr.

- Some THStorage functions which were generated per-scalar are now
  generalized, and thus moved out of the generic/ library.  This way
  they can be called directly from a non-code-generated at::Storage

- THCState is moved into a C++ header.  This is actually not really
  related to this particular diff, but I'll need it soon to replace
  THAllocator/THCDeviceAllocator with at::Allocator (C++, so I can't
  mention it in a C header file.)

- THPPointer needs to be adjusted, since there is no more type refinement
  between THStorage/THCStorage for it to template match over.  This
  is a little tricky, because I can't refer to THCStorage_free unless
  we actually compile with CUDA.  So there's two copies of the function
  now: one for the CPU build, one for the CUDA build.  If we ever split
  CUDA/non-CUDA Python builds, you will have to indirect this through some
  dynamic dispatch.

I want to soon replace the THCDeviceAllocator pointers in
THCState with at::Allocator, but I can't reference a C++ namespaced type
from C code, so THCState needs to move.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/9087

Reviewed By: orionr

Differential Revision: D8712072

Pulled By: ezyang

fbshipit-source-id: c6e1ea236cd1df017b42a7fffb2dbff20d50a284
2018-07-02 17:09:52 -07:00
5d474e1812 Make all module members public (#9111)
Summary:
Having circulated the C++ API a bit, I found that it would make it easier for folks to access module parameters directly than through the `parameters()` map. So here I make all variables/submodules and also the configuration options for every module public.

For RNNs, I also updated the names of parameters to match PyTorch, e.g. `hhw` -> `w_hh`. This should make it easier to transition from Python.

apaszke ebetica
Closes https://github.com/pytorch/pytorch/pull/9111

Differential Revision: D8717112

Pulled By: goldsborough

fbshipit-source-id: 3d36d5e161f7a86f44db7136c9c2fa53067abe1c
2018-07-02 16:09:57 -07:00
cb1bfe91af Deprecated several functions at torch.nn.functional (#8748)
Summary:
1. fixes #6245
2. deprecated tanh, sigmoid
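
For instance, the migration looks like this (sketch mine):

```python
import torch
import torch.nn.functional as F

x = torch.randn(3)
y = F.sigmoid(x)       # now emits a deprecation warning
y = torch.sigmoid(x)   # preferred replacement
z = torch.tanh(x)      # likewise for tanh
```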
Closes https://github.com/pytorch/pytorch/pull/8748

Differential Revision: D8697975

Pulled By: weiyangfb

fbshipit-source-id: f30714aa0611a1fe870040692f3dbcc8238aece9
2018-07-02 15:54:46 -07:00
50392cc554 Store OperatorDef by copy (#9108)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9108

OperatorDef ownership  was given to the net in the past, we no longer
want to do that

Reviewed By: pjh5

Differential Revision: D8705347

fbshipit-source-id: 34976de202a7a7a71b935dd13c1bc8e9c73552e0
2018-07-02 15:42:18 -07:00
b79e8f79d8 Make SumElementsGradient use copy (#9039)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9039

att

Reviewed By: ezyang

Differential Revision: D8696455

fbshipit-source-id: 945e49a4c294fa39f847576d44ca0e6a32ecaf18
2018-07-02 13:25:12 -07:00
e977485449 detach spectral norm calculated weight in eval mode (#9020)
Summary:
As we left weight to be the last calculated weight in eval mode, we need to detach it from the computation in order to facilitate using backward.
The typical use case is in GANs when the discriminator has spectral norm, is in eval mode and we want to backprop through the discriminator to get weight gradients for the generator.
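
A minimal sketch of that GAN use case (example mine, not from the PR):

```python
import torch
import torch.nn as nn

D = nn.utils.spectral_norm(nn.Linear(16, 1))   # e.g. a discriminator head
D.eval()                                       # weight is the last computed one

z = torch.randn(4, 16, requires_grad=True)     # stand-in for generator output
loss = D(z).mean()
loss.backward()                                # works once weight is detached
assert z.grad is not None
```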
Closes https://github.com/pytorch/pytorch/pull/9020

Reviewed By: ezyang

Differential Revision: D8694054

Pulled By: SsnL

fbshipit-source-id: 09ee5843687cac3ed4c40759ac577a14c5371730
2018-07-02 10:39:47 -07:00
553c41f082 Adds serialization path (#9035)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9035

This diff builds on the structure in the stacked diff to add serialization/deserialization. It supports the old format and a new suggested format.

Reviewed By: ilia-cher

Differential Revision: D8415115

fbshipit-source-id: acaacce2b015f4c6ac0ae22625455290a3f30262
2018-07-02 09:09:39 -07:00
623ae0c07c Fix loading 0.4 BN checkpoints (#9004)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8481
Closes https://github.com/pytorch/pytorch/pull/9004

Reviewed By: soumith

Differential Revision: D8684017

Pulled By: SsnL

fbshipit-source-id: 57820ad5f6b60795358c9447409a364a93ffa1d9
2018-07-01 22:24:21 -07:00
179807a8c7 Fix MAGMA svd and eig (#9082)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/9079

There is room for speed-up for both functions (see https://github.com/pytorch/pytorch/issues/9083), but let's get this in to unblock #9052 .
Closes https://github.com/pytorch/pytorch/pull/9082

Reviewed By: ezyang

Differential Revision: D8711687

Pulled By: SsnL

fbshipit-source-id: f043a9bf55cb6aec5126c3331d35761f7aa3f8e3
2018-07-01 22:24:17 -07:00
474fdd7e2d minor pybind for jit (#8890)
Summary:
add two small bindings to recently added attributes.

Also want to leave a reference gist here: https://gist.github.com/soumith/8102ef39530bac09070912b1a5401d0f

It showcases:

- traced a module
- symbolically differentiated the forward graph, to get a forward, backward graph
- executed the subsequent forward + backward graphs correctly
- compared the jit vs non-jit results
Closes https://github.com/pytorch/pytorch/pull/8890

Reviewed By: ezyang

Differential Revision: D8677663

Pulled By: soumith

fbshipit-source-id: a29919c05baad997cd7fb7df718f933a83035118
2018-07-01 21:39:29 -07:00
8364470e5c fix empty batch for softmax (#9075)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9075

as title

Reviewed By: QueryConnectionException

Differential Revision: D8710616

fbshipit-source-id: ca505e1a733cc24db9e2ab83a5395c64fa8360c4
2018-07-01 16:40:14 -07:00
04f2708265 Fix build script for Windows (#9060)
Summary:
1. Escape quotes
2. Use file-existence logic to determine build success/failure
Closes https://github.com/pytorch/pytorch/pull/9060

Differential Revision: D8707290

Pulled By: soumith

fbshipit-source-id: a34265f46725eaaf9489bc38546200aeae75e8a9
2018-07-01 07:10:06 -07:00
c61f0217a5 combine size_average and reduce args in loss functions (#8018)
Summary:
closes #7929
Closes https://github.com/pytorch/pytorch/pull/8018
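
Roughly, the migration for callers (sketch mine; value names per the merged API):

```python
import torch.nn as nn

# Before: two overlapping flags.
#   nn.MSELoss(size_average=False, reduce=True)   -> summed loss
#   nn.MSELoss(reduce=False)                      -> per-element loss
# After: one reduction argument.
loss_sum  = nn.MSELoss(reduction='sum')
loss_none = nn.MSELoss(reduction='none')
```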

Differential Revision: D8682540

Pulled By: li-roy

fbshipit-source-id: 649170dd1a7f373151c1d4e949838bd1c5651936
2018-07-01 05:39:00 -07:00
03e7953a98 Use FixedDivisor in Reduce and Broadcast CUDA kernels (#9072)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9072

Use FixedDivisor in Reduce and Broadcast CUDA kernels

Reviewed By: houseroad

Differential Revision: D8710243

fbshipit-source-id: 6f1da12234898594a1be8c979d942aa515832aeb
2018-07-01 00:25:34 -07:00
90fd4df695 Add flag for disabling tests with multiprocessing spawn start method (#9061)
Summary:
This will resolve some of the timeout issues in CPU and GPU tests internally.
Closes https://github.com/pytorch/pytorch/pull/9061

Reviewed By: ezyang

Differential Revision: D8707471

Pulled By: yf225

fbshipit-source-id: 9dc82a2c9da0c540ae015442f74b9b2b1a67a246
2018-06-30 14:39:11 -07:00
2c6c53f5ce Ensure that domain starts with domain_prefix before extracting substring (#9053)
Summary:
Fixes #9049.
When provided with a domain string that lacks proper prefix, i.e. `org.pytorch.`, an exception is thrown.
Closes https://github.com/pytorch/pytorch/pull/9053

Differential Revision: D8708264

Pulled By: ezyang

fbshipit-source-id: e2593d8d36a17d3bb26fc0b239a61b84f1c38ecb
2018-06-30 10:39:40 -07:00
0515664c42 Make _C depend on csrc-no-python (#9057)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9057

Make the `_C` target depend on the `csrc-no-python` target. Also removes the `csrc` target and the with-python version of autogradpp (which is not used). Let me know if we should pick better names here.

I also ran into a nasty linker issue with only one symbol being undefined. It turns out it had been given inline linkage in the `.cpp` file, which I believe is an error.

Reviewed By: orionr

Differential Revision: D8705750

fbshipit-source-id: 8de083e371dbf5e9f12c15572d88e1c595dfa087
2018-06-29 20:39:24 -07:00
b07ea04e23 empty batch for spatialBN (#8933)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8933

The spatialBN implementation cannot deal with an empty batch; this diff enables the zero-batch setting:

During training, when batch_size = 0:
in forward, the output's saved_mean and saved_var are zeros;
in backward, the gradients for SCALE_GRAD and BIAS_GRAD are zeros.

Reviewed By: pjh5

Differential Revision: D8644699

fbshipit-source-id: 599ea687329d68699c987e05f56f409f4e729d1c
2018-06-29 18:40:41 -07:00
d7487bfe9e Speed-up multidim sum (#8992)
Summary:
1. Instead of using the non-`_out` variant, we allocate a buffer and use the `_out` variant to write the intermediate results into it.
2. Reduce dimensions in order of decreasing size (sketched after this commit).

Benchmark:
Sum a randn tensor of shape `[200, 1, 30, 40, 20, 1, 50]` along dimensions `[4, 6, 3, 0, 2, 5]`. Averaged across 1000 times:
```
before patch:
CPU: 0.0441 s
CUDA: 0.0273 s

after patch:
CPU: 0.0234 s
CUDA: 0.0047 s
```
Closes https://github.com/pytorch/pytorch/pull/8992

Differential Revision: D8681069

Pulled By: SsnL

fbshipit-source-id: 2c5d5af5c5a284f2e945181f2b24ee8c78becd50
2018-06-29 18:40:39 -07:00
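A minimal Python sketch of point 2; the actual patch works at the ATen level and reuses a preallocated buffer via the `_out` variant, which this sketch omits:

```python
import torch

def multidim_sum(t, dims):
    # reduce the largest dimensions first so later reductions touch less data;
    # keepdim=True keeps the remaining dim indices valid across iterations
    for d in sorted(dims, key=t.size, reverse=True):
        t = t.sum(d, keepdim=True)
    for d in sorted(dims, reverse=True):  # then drop the reduced size-1 dims
        t = t.squeeze(d)
    return t

t = torch.randn(20, 1, 3, 4, 2, 1, 5)
assert multidim_sum(t, [4, 6, 3, 0, 2, 5]).shape == (1,)
```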
9ce15173fb Move _cudnn_init_dropout_state to TensorOptions and enable cuDNN dropout in C++ API RNNs (#9012)
Summary:
The goal of this PR was to add support for dropout descriptors in the C++ API's RNN class.
The end result is a 4x-5x speedup for our RNN integration tests since they can now use cuDNN instead of autograd when dropout is set.

To achieve this, I had to move `_cudnn_init_dropout_state` to the `TensorOptions` API.

I also fixed a bug around `RNN::cuda()` not flattening parameters for cuDNN.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/9012

Reviewed By: pjh5

Differential Revision: D8689786

Pulled By: goldsborough

fbshipit-source-id: 44fb191f5a38e41c4ded5417306b5bbc012cd56c
2018-06-29 17:25:23 -07:00
863754c722 Update the ONNX op coverage in C2
Summary: Closes https://github.com/pytorch/pytorch/pull/9051

Reviewed By: pjh5

Differential Revision: D8704583

Pulled By: houseroad

fbshipit-source-id: 186e8b62378ab4f7cdef5fa77dc08c6b9ddc9cc0
2018-06-29 17:25:19 -07:00
d793473e60 add note to avoid memory surge on GPU (#9019)
Summary:
Addresses #7415 . Adding a note first, will do the API change if there's a need in the future.
Closes https://github.com/pytorch/pytorch/pull/9019

Differential Revision: D8694056

Pulled By: ailzhang

fbshipit-source-id: 0b6fa43fa62ac55deff3b3b099d1bc9fee74a5f9
2018-06-29 16:55:17 -07:00
67b21117b7 Add BatchTensor class (#8922)
Summary:
Add BatchTensor class
- construct from data, mask, dims or construct from list of tensors
- can return a list of tensors from a BatchTensor class

next step: do IR level transformation and operators
Closes https://github.com/pytorch/pytorch/pull/8922

Differential Revision: D8668986

Pulled By: ChunliF

fbshipit-source-id: 8b24d2a9f46a3b42dbb397e99e9e059dfb2b326e
2018-06-29 15:57:27 -07:00
3a71cf2e54 Disable verbose printing for time sequence prediction test
Summary: Closes https://github.com/pytorch/pytorch/pull/9040

Reviewed By: soumith, wanchaol

Differential Revision: D8697870

Pulled By: jamesr66a

fbshipit-source-id: 212fe14aaf9c60c4c9c6d383b202395b1d0ec680
2018-06-29 12:40:18 -07:00
7a1081b310 Re-enable passing operator-level tests (#9044)
Summary:
Just tried these and they work now
Closes https://github.com/pytorch/pytorch/pull/9044

Reviewed By: soumith

Differential Revision: D8698819

Pulled By: jamesr66a

fbshipit-source-id: 1d5574de1819aa31fc36ad245186c7aa68587178
2018-06-29 12:25:28 -07:00
b3fe200704 Fix TestJit.test_alexnet expect file
Summary: Closes https://github.com/pytorch/pytorch/pull/9041

Reviewed By: soumith

Differential Revision: D8698147

Pulled By: jamesr66a

fbshipit-source-id: 63eb1bc96562b6f972aeba8748454efb9c889d5c
2018-06-29 12:25:25 -07:00
f6cfd83a80 Find unused port for test dynamically (#9037)
Summary:
Closes https://github.com/pytorch/pytorch/pull/9037

Fixes flaky test failures due to port in use.

Reviewed By: soumith

Differential Revision: D8696779

fbshipit-source-id: a05412d1eb1dcb9a4b35023dead371aa33d62c39
2018-06-29 12:25:23 -07:00
b75490414c Bump up the C2 onnx frontend opset to 8 (#9006)
Summary:
Now ONNX master has bump up to opset 8.
Closes https://github.com/pytorch/pytorch/pull/9006

Reviewed By: yinghai

Differential Revision: D8685417

Pulled By: houseroad

fbshipit-source-id: f0c0a3682417b8803a856e232c2740cf3e68e554
2018-06-29 11:56:11 -07:00
4efbd2e22c Improve DataLoader worker fail error message (#9007)
Summary:
Tell people to run with num_workers=0 when a DataLoader worker fails
Closes https://github.com/pytorch/pytorch/pull/9007

Differential Revision: D8686005

Pulled By: SsnL

fbshipit-source-id: bf872267f609c7b86e943061caab953149507bfe
2018-06-29 11:09:55 -07:00
a2bf55f9eb Fix select backward when wrap dim (#9033)
Summary:
The previous backward was broken when `index=-1` because slicing `[-1:0]` gives an empty tensor/list/array (see the illustration after this commit).

Added a test.

cc goldsborough
Closes https://github.com/pytorch/pytorch/pull/9033

Differential Revision: D8694300

Pulled By: SsnL

fbshipit-source-id: 8377b043896f8d0b1da173cc0077ace0bea5e862
2018-06-29 10:40:13 -07:00
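A small interpreter-level illustration of the bug, using a plain Python list (tensors slice the same way):

```python
x = list(range(5))

x[-1:0]          # [] -- empty: the slice runs from index 4 "up to" index 0
i = -1 % len(x)  # wrap the index first ...
x[i:i + 1]       # [4] -- ... and the slice picks the intended element
```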
2507e273dc Fix CUDA 8 for Windows (#9023)
Summary:
Fix missing functions for MSVC 2015
Inspired by https://github.com/tensorflow/tensorflow/pull/13525
Closes https://github.com/pytorch/pytorch/pull/9023

Reviewed By: soumith

Differential Revision: D8694046

Pulled By: ezyang

fbshipit-source-id: 92cb7b9efd76d97a264c12a1521be550176f58d5
2018-06-29 09:40:48 -07:00
c2a89b69b9 Support to ONNXIFI op (#8749)
Summary:
This PR adds basic support to ONNXIFI op.
Closes https://github.com/pytorch/pytorch/pull/8749

Reviewed By: Maratyszcza

Differential Revision: D8665739

Pulled By: yinghai

fbshipit-source-id: 961916f9e1a4a26390b73c4b648d177883143a22
2018-06-29 09:10:26 -07:00
37e526e1a8 Better print of nn Containers (#8939)
Summary:
Fix https://github.com/pytorch/pytorch/issues/8900

Waiting on https://github.com/pytorch/pytorch/pull/8463

1. Remove extra line
2. ...
Closes https://github.com/pytorch/pytorch/pull/8939

Reviewed By: soumith

Differential Revision: D8687730

Pulled By: ezyang

fbshipit-source-id: 81c57a03683875704d537cb4585b11838f70df56
2018-06-29 08:24:09 -07:00
512c49e831 Correct link flag order for GNU ld in utils.cpp_extension.load (#9021)
Summary:
Any flags linking libraries only take effect on inputs preceding them,
so we have to call `$cxx $in $ldflags -o $out` instead of the other way
around.

This was probably not detected so far since the torch libraries are
already loaded when loading JIT-compiled extensions, so this only has an
effect on third-party libraries.

This also matches our behavior on windows.
Closes https://github.com/pytorch/pytorch/pull/9021

Reviewed By: soumith

Differential Revision: D8694049

Pulled By: ezyang

fbshipit-source-id: e35745fc3b89bf39c14f07ce90d6bd18e6a3d7cc
2018-06-29 08:24:07 -07:00
6a1e801071 add second variant to Tensor.add, Tensor.add_ docstring (fixes: #8690) (#9027)
Summary:
fixes: #8690
Closes https://github.com/pytorch/pytorch/pull/9027

Reviewed By: soumith

Differential Revision: D8694042

Pulled By: ezyang

fbshipit-source-id: bc3b1112b41f959231854366cdcf9292b3699779
2018-06-29 08:24:06 -07:00
b795620442 Fix x.pow(0) gradient when x contains 0 (#8945)
Summary:
This closes https://github.com/pytorch/pytorch/issues/8940 .
Closes https://github.com/pytorch/pytorch/pull/8945

Differential Revision: D8668853

Pulled By: ezyang

fbshipit-source-id: 80a629352ee2f506c38a05647b769281579a5af7
2018-06-29 06:53:42 -07:00
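A minimal illustration, assuming current behavior: the naive gradient rule n * x**(n - 1) evaluates to 0 * x**-1, which is NaN at x = 0, while the true derivative of the constant x**0 is 0 everywhere:

```python
import torch

x = torch.zeros(3, requires_grad=True)
x.pow(0).sum().backward()   # x ** 0 == 1, a constant
print(x.grad)               # tensor([0., 0., 0.]) after the fix, not NaN
```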
00b5d397ae Fix resolution callback for @script_method (#8912)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8715. The resolution callback was peeking too few frames up the stack when instantiated.
Closes https://github.com/pytorch/pytorch/pull/8912

Reviewed By: ezyang

Differential Revision: D8684972

Pulled By: jamesr66a

fbshipit-source-id: 11dbb919ae7273f92cbe25fe21f7946b9fa28aeb
2018-06-28 22:56:17 -07:00
4643269eb5 Document get_device, fixes #8857 (#8859)
Differential Revision: D8677690

Pulled By: ezyang

fbshipit-source-id: 0167672d1d2659d9fc7d68530760639ba35ed7d8
2018-06-28 22:11:08 -07:00
bf65df5310 Get rid of possible ODR violation with const char*. (#8962)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8962

Differential Revision: D8668580

Pulled By: ezyang

fbshipit-source-id: 466a5940f175f1f339dc826a1ab32bf3e42e64fd
2018-06-28 17:53:55 -07:00
5b7951057d Distributed Data Parallel Module Implementation (#8584)
Summary:
This is an initial implementation of the Distributed Data Parallel module for the c10d GLOO and NCCL backends.

Performance testing confirmed that both single-GPU-per-process and multi-GPU-per-process setups are able to overlap communication with backward computation.

The idea is that DDP buckets parameters and all-reduces the buckets in reverse order (sketched below). Since all c10d ops are async, no dedicated thread is needed; we simply queue the all-reduce kernels once a bucket is ready, following the deterministic reduction order.

Tested with 8 nodes 64 GPUs, ResNet 50, hit the required accuracy within 90 epochs
Closes https://github.com/pytorch/pytorch/pull/8584

Reviewed By: goldsborough

Differential Revision: D8678696

Pulled By: teng-li

fbshipit-source-id: 440341b804befc6762e92acece2759ba47157cea
2018-06-28 17:25:40 -07:00
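A hypothetical, much-simplified sketch of the bucketing idea using the public `torch.distributed` API; the real module hooks into autograd so each bucket is flushed as its gradients become ready, overlapping communication with the rest of backward (names and the bucket size here are illustrative):

```python
import torch
import torch.distributed as dist

def bucketed_allreduce(params, bucket_cap_bytes=25 * 2**20):
    pending, bucket, nbytes = [], [], 0

    def flush():
        nonlocal bucket, nbytes
        if bucket:
            flat = torch.cat([g.reshape(-1) for g in bucket])
            work = dist.all_reduce(flat, async_op=True)  # queued, not awaited
            pending.append((work, flat, bucket))
            bucket, nbytes = [], 0

    # reverse order: gradients become ready roughly back-to-front in backward
    for p in reversed([p for p in params if p.grad is not None]):
        bucket.append(p.grad)
        nbytes += p.grad.numel() * p.grad.element_size()
        if nbytes >= bucket_cap_bytes:
            flush()
    flush()

    for work, flat, grads in pending:  # wait, average, and scatter back
        work.wait()
        flat.div_(dist.get_world_size())
        offset = 0
        for g in grads:
            g.copy_(flat[offset:offset + g.numel()].view_as(g))
            offset += g.numel()
```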
30549a1293 Deal with more threads than necessary (#8961)
Summary:
Fixes https://github.com/pytorch/pytorch/issues/8949
Closes https://github.com/pytorch/pytorch/pull/8961

Reviewed By: colesbury

Differential Revision: D8669829

Pulled By: cpuhrsch

fbshipit-source-id: 368f76a2c6602a62fb7609d404af9753c87dc605
2018-06-28 16:44:23 -07:00
2e23bc1a20 Switch to emitting ScriptModule for scripted and traced functions (#8876)
Summary:
Solves https://github.com/pytorch/pytorch/issues/8716 and closes https://github.com/pytorch/pytorch/issues/8867

This makes it so that all of {script, traced} {module, function} create ScriptModules and implements proper inlining between them. This also greatly simplifies things and makes clear that tracing is a way to convert regular Python into a ScriptModule
Closes https://github.com/pytorch/pytorch/pull/8876

Differential Revision: D8675996

Pulled By: jamesr66a

fbshipit-source-id: 3b12ad4b758324f558074c27c1f1a9fb616b170a
2018-06-28 16:44:21 -07:00
0bd9e96b08 Enable script for time-sequence prediction (#8862)
Summary:
Enable script for the time-sequence prediction example. This required a bunch of hacks to make script mode work, and a couple of issues discovered along the way are noted in #8452.

Shall we merge this PR and iteratively fix those issues thereafter?
Closes https://github.com/pytorch/pytorch/pull/8862

Differential Revision: D8677683

Pulled By: wanchaol

fbshipit-source-id: 02319cd56c87de523be898f0e6c541dd15e57cac
2018-06-28 16:10:10 -07:00
f0772c0ab2 Replace max_pool with max_pool_with_indices (#8946)
Summary:
Re-push from https://github.com/pytorch/pytorch/pull/8892
Closes https://github.com/pytorch/pytorch/pull/8946

Differential Revision: D8666862

Pulled By: goldsborough

fbshipit-source-id: 44cd3d63d347316818a7b0f5f89fce8ff7486736
2018-06-28 16:10:08 -07:00
66465f1e17 Create nn::Module::is (#8970)
Summary:
When initializing weights for my C++ model, I had to write

```cpp
void initialize_weights(nn::Module& module) {
  if (module.name().find("Conv2d") != std::string::npos) {
    module.parameters()["weight"].data().normal_(0.0, 0.02);
  } else if (module.name().find("BatchNorm") != std::string::npos) {
    auto parameters = module.parameters();
    parameters["weight"].data().normal_(1.0, 0.02);
    parameters["bias"].data().fill_(0);
  }
}
```

The string-based module determination is not very nice, and not very C++-y. So I created `nn::Module::is<T>` which does a `dynamic_cast` inside. It also handles the `ModuleHolder` vs. `Module` distinction.

It now becomes

```cpp
if (module.is<nn::Conv2d>()) {
  module.parameters()["weight"].data().normal_(0.0, 0.02);
} else if (module.is<nn::BatchNorm>()) {
  auto parameters = module.parameters();
  parameters["weight"].data().normal_(1.0, 0.02);
  parameters["bias"].data().fill_(0);
}
```

ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8970

Differential Revision: D8677476

Pulled By: goldsborough

fbshipit-source-id: 053294e19b6a58cce868167596c89639f7de91c2
2018-06-28 16:10:04 -07:00
15a75208ee Use std::random_device for generating storage handle (#8971)
Summary:
Currently the `test_RNG_after_pickle` test in this PR would fail because pickling a tensor changes the RNG state. This PR fixes it.
Closes https://github.com/pytorch/pytorch/pull/8971

Reviewed By: ezyang

Differential Revision: D8677474

Pulled By: yf225

fbshipit-source-id: 1713d9611699ad288b66d92dbb29ce9feb34b8cf
2018-06-28 15:10:27 -07:00
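The invariant the new test checks, as a minimal snippet:

```python
import pickle

import torch

state = torch.get_rng_state()
pickle.dumps(torch.ones(3))  # handle generation must not consume torch's RNG
assert torch.equal(state, torch.get_rng_state())
```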
838fdd6f99 Add Cube and Cbrt Ops (#8991)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8991

Add Cube and Cbrt Ops

Reviewed By: houseroad

Differential Revision: D8678848

fbshipit-source-id: 051dd475e45ad9f1d11a8b32ae3acd1f7459b930
2018-06-28 14:55:30 -07:00
61ca0ba222 Add log1p for sparse tensor (#8969)
Summary:
- fixes log1p per #8853
- added log1p for sparse tensors in ATen
- made log1p of a sparse tensor non-differentiable (raising an error), because the local derivative of log1p at a zero element is 1 / (0 + 1) = 1, which would make the gradient tensor dense
Closes https://github.com/pytorch/pytorch/pull/8969

Reviewed By: ezyang

Differential Revision: D8677491

fbshipit-source-id: 8363a613519de4bc75eda087ccd20a3eb2d18126
2018-06-28 13:10:11 -07:00
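A minimal forward-pass example, assuming the sparse constructor from current releases:

```python
import torch

i = torch.tensor([[0, 3]])
v = torch.tensor([1.0, 2.0])
s = torch.sparse_coo_tensor(i, v, (5,))

# log1p(0) == 0, so sparsity is preserved in the forward pass; per this
# commit, backward raises instead of silently densifying the gradient
print(s.log1p().to_dense())
```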
8d384600b8 Add ShapeTypeInference for Conditional operator (#8924)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8924

Closes https://github.com/pytorch/pytorch/pull/8915

As desc

Reviewed By: ezyang

Differential Revision: D8649582

fbshipit-source-id: d08a456b9861dd7edd19ed18e16d4778b4240c90
2018-06-28 12:10:24 -07:00
7310229426 Fix TestCollectEnv flakiness (#8983)
Summary:
The problem was a bad regex; the version hash match used to match 6
wildcards. This PR changes it to match \w+, which is sufficient for the
test because the version hash is always followed by either whitespace or
a right-paren.

Fixes #8981
Closes https://github.com/pytorch/pytorch/pull/8983

Differential Revision: D8677771

Pulled By: zou3519

fbshipit-source-id: dfdde98669bcd682335145cba98c82530a815afa
2018-06-28 11:45:37 -07:00
93cc7d1923 Add in_place test for binary ops
Summary: Closes https://github.com/pytorch/pytorch/pull/8973

Reviewed By: houseroad

Differential Revision: D8674216

Pulled By: BIT-silence

fbshipit-source-id: bde1ff7b47dbc8a48d1ff72b345c767af698a09b
2018-06-28 11:45:35 -07:00
ccc14071f4 Fix Module::zero_grad (#8964)
Summary:
`nn::Module::zero_grad` did not respect undefined `grad()` variables. This is fixed (the code now replicates PyTorch).

ebetica ezyang apaszke
Closes https://github.com/pytorch/pytorch/pull/8964

Reviewed By: ezyang

Differential Revision: D8677529

Pulled By: goldsborough

fbshipit-source-id: afdc4ba00dbf5012c37d1f794c731937ee5e422e
2018-06-28 10:26:52 -07:00
63233f98ad Bump up opset version to 7 in Caffe2 ONNX exporter (#8854)
Summary:
Will bump up to opset 8 in another PR to match the current opset version.

Already tested through generating the models in current model zoo.
Closes https://github.com/pytorch/pytorch/pull/8854

Reviewed By: ezyang

Differential Revision: D8666437

Pulled By: houseroad

fbshipit-source-id: feffdf704dd3136aa59c0f1ff1830c14d1bd20aa
2018-06-28 07:39:02 -07:00
148088a681 Convert at::Tensor to torch::Tensor in AnyModule (#8968)
Summary:
Operations on `Variable`s (or `torch::Tensor`) usually return `at::Tensor`. This is usually fine, but the `AnyModule` used in the implementation of `torch::Sequential` is very picky about types, and does not understand implicit conversions like this. This means that `sequential.forward(at_tensor_that_is_actually_a_variable)` will fail unless you wrap `at_tensor_that_is_actually_a_variable` with `torch::Tensor`.

This PR adds a special case to `AnyModule` that will convert an `at::Tensor` to `torch::Tensor` when the tensor is really a variable, and else just pass the `at::Tensor`. This is a nice little usability improvement for the often-used `Sequential` class.

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8968

Reviewed By: ezyang

Differential Revision: D8670407

Pulled By: goldsborough

fbshipit-source-id: 3635ed6ed28238f3900ce4a876d07f1b11713831
2018-06-28 06:40:48 -07:00
77484d91db Add AT_WARN to issue warnings from ATen (#8967)
Summary:
Use AT_WARN from python_anomaly_mode instead of printing to stdout.
Closes https://github.com/pytorch/pytorch/pull/8967

Reviewed By: ezyang

Differential Revision: D8670654

Pulled By: colesbury

fbshipit-source-id: 3f7aee8ea06914d7d4381feec086e95f0b194752
2018-06-27 21:24:39 -07:00
c3b499227d Avoid iomp/gomp clash when building IDEEP ops (#8955)
Summary:
This PR does 3 things
- Reorder the search order of `intel_lp64` and `gf_lp64` as the first one is more essential and should have high priority.
- Avoid repetitive searching of MKL libraries in `ideep` and `mkldnn` submodule if we already found those in `FindMKL`
- Avoid adding more MKL dependencies to IDEEP if MKL is also found.

TODO: provide an option for user to chose iomp or gomp.
Closes https://github.com/pytorch/pytorch/pull/8955

Reviewed By: bddppq

Differential Revision: D8666960

Pulled By: yinghai

fbshipit-source-id: 669d3142204a8b47c19a900444246fc44a139012
2018-06-27 21:24:36 -07:00
ccd3e2c03d Skip operator tests in rocm CI jobs (#8720)
Summary:
disable operator tests for now until we have enough rocm workers in CI
Closes https://github.com/pytorch/pytorch/pull/8720

Reviewed By: ezyang

Differential Revision: D8654871

Pulled By: bddppq

fbshipit-source-id: ff2504d6a7182f85f7cc15618f2df8e512447fa8
2018-06-27 20:39:19 -07:00
059ccb62c1 bump up onnx version (#8975)
Summary:
To include the change in https://github.com/onnx/onnx/pull/1151
Closes https://github.com/pytorch/pytorch/pull/8975

Reviewed By: bddppq

Differential Revision: D8673552

Pulled By: yinghai

fbshipit-source-id: f55c270ef869bd2e19fdabbdf906a6ae12129791
2018-06-27 20:24:30 -07:00
346de2535d Workaround lack of 0-dim support in ideep (#8959)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8959

MKL-DNN doesn't support 0-dim tensors. As a workaround, we produce a CPUTensor instead of an Ideep tensor in the fallback ops. And for those tensors, we don't need the Ideep copy op anymore.

Reviewed By: viswanathgs

Differential Revision: D8665168

fbshipit-source-id: 59678de2c5aed8c691ab5caaadede6d6c000dd7b
2018-06-27 20:24:28 -07:00
03d0a70a4d Set random seed at the start of C++ tests (#8903)
Summary:
Sets the random seed at the start of C++ tests so that everything is super deterministic.

I made sure we only generate random values from torch instead of `std::`, so that this seed always applies. I.e. I do:

```
torch::randint(2, {2}, at::kInt64)
```

instead of

```
std::rand() % 2
```

Also got rid of the tests that test the random seeding, since they would interfere here. And those tests are not very useful, since we just use ATen's seeding mechanism, which should work.

Fixes  #7288 #7286 #7289

ebetica ezyang
Closes https://github.com/pytorch/pytorch/pull/8903

Differential Revision: D8667269

Pulled By: goldsborough

fbshipit-source-id: a833e86e156d5e68dae8c53a4b1c433cb0608b6c
2018-06-27 20:09:46 -07:00
a41d433d9d Check key should be string in nn.Module.add_module, parameter and buffer (#8960)
Summary:
Because I probably messed up the rebase in https://github.com/pytorch/pytorch/pull/8905
Closes https://github.com/pytorch/pytorch/pull/8960

Reviewed By: soumith

Differential Revision: D8668202

Pulled By: ezyang

fbshipit-source-id: 41e19803c7ac7aac898c8e70c6a9769314476ca9
2018-06-27 19:40:00 -07:00
07b6c28715 Fix comment in file
Summary: Closes https://github.com/pytorch/pytorch/pull/8966

Differential Revision: D8670090

Pulled By: zou3519

fbshipit-source-id: fe92f31264cec89b0e0139f44720dd72b4f31c6e
2018-06-27 19:11:14 -07:00
f52c2ca1c6 net_async tracing use enable_profile arg from NetDef (#8927)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8927

Closes https://github.com/pytorch/pytorch/pull/8855

- Add parameter `enable_tracing` to the Arg field of NetDef. `net_async_tracing` will only enable Tracer for Net instances that have this field set (unless the command line argument also includes the net name).
- Append a unique id to the json profiling result file because there could be multiple instances of the same net running.
- Dump the json profiling file regularly instead of just when the Tracer object is destroyed.

Reviewed By: ilia-cher

Differential Revision: D8372378

fbshipit-source-id: 8adc9d59f48b67456beed2e3a88235c298fdfd01
2018-06-27 16:24:57 -07:00
ba8e133844 Refactor batch sampler (#8958)
Summary:
Fixes #8652, fixes #8957
Closes https://github.com/pytorch/pytorch/pull/8958

Reviewed By: ezyang

Differential Revision: D8668253

Pulled By: soumith

fbshipit-source-id: 663d461621511166f29cfcc902e6c2a71befa647
2018-06-27 16:06:47 -07:00
6aa8b67ed0 Attempt to fix operator<< in Caffe2
Summary: Closes https://github.com/pytorch/pytorch/pull/8947

Reviewed By: dzhulgakov

Differential Revision: D8664902

Pulled By: bddppq

fbshipit-source-id: 1cf7123062b8604e4477eee6142b087675344992
2018-06-27 14:54:45 -07:00
fef9a66d08 Use torch:: instead of at:: (#8911)
Summary:
This PR is the final step in making `torch::` the only namespace users of the C++ API ever see. Basically, I did:

``` cpp

namespace torch {
using namespace at;
}
```

And then changed `at::` to `torch::` almost everywhere. This worked surprisingly well out of the box. So users can now write `torch::relu`, `torch::log_softmax`, and `torch::conv2d` instead of having to know when to use `at::` and when `torch::`. This is happy!

Another thing I did was to have `using Dtype = at::ScalarType`, which will be the eventual name anyway.

ebetica ezyang apaszke zdevito
Closes https://github.com/pytorch/pytorch/pull/8911

Reviewed By: ezyang

Differential Revision: D8668230

Pulled By: goldsborough

fbshipit-source-id: a72ccb70fca763c396c4b0997d3c4767c8cf4fd3
2018-06-27 14:42:01 -07:00
4c5192788b Cleanup of the shipit commit (#8956)
Summary:
Some files shouldn't have been added. Minor changes.
Closes https://github.com/pytorch/pytorch/pull/8956

Reviewed By: pjh5

Differential Revision: D8667962

Pulled By: orionr

fbshipit-source-id: 3331c6e93763ea4ea5b0c17dba1f0fc92172fd1b
2018-06-27 14:41:59 -07:00
e6208b3340 by default, do not throw image decoding error (#8951)
Summary:
Closes https://github.com/pytorch/pytorch/pull/8951

Change the default value of the max decode error rate to 1.0, which means we don't throw such runtime errors by default.

Reviewed By: avulanov

Differential Revision: D8665640

fbshipit-source-id: 9d373979dd8a97253ad528b167f8d73a28fee82a
2018-06-27 14:26:49 -07:00
da4cb226d8 Fix a bug introduced by the deletion of copy constructor of tensor
Summary: Closes https://github.com/pytorch/pytorch/pull/8942

Reviewed By: jerryzh168

Differential Revision: D8666530

Pulled By: orionr

fbshipit-source-id: ddb311141ec7dbf163665ebfc6b475b219a5a999
2018-06-27 13:10:58 -07:00
a898a8f1f0 Adding pyyaml to mac and windows builds
Summary: Closes https://github.com/pytorch/pytorch/pull/8851

Reviewed By: mingzhe09088

Differential Revision: D8666075

Pulled By: pjh5

fbshipit-source-id: a3fdc9f9801f814b1e4010bd20ba51afbb048a1d
2018-06-27 13:10:57 -07:00
624303340e Remove third_party from CODEOWNERS file (#8950)
Summary:
No longer required now that we've switched over to ShipIt on master.
Closes https://github.com/pytorch/pytorch/pull/8950

Reviewed By: Yangqing

Differential Revision: D8666175

Pulled By: orionr

fbshipit-source-id: 6d8b8b38f6558d87cabd0aa19b72a390057c137b
2018-06-27 11:54:42 -07:00
6446ffa536 More detailed help message for 'without ATen_cuda library' message. (#8898)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8898

Differential Revision: D8661562

Pulled By: ezyang

fbshipit-source-id: 9cb976f9642c6f40902b10b34eada2d6ff6fd81c
2018-06-27 11:44:01 -07:00
d9c64851e9 Fix nccl/CMakeLists.txt (#8948)
Summary:
Changes that were merged in #8834 and #8829 (cc yf225) were lost in 9ec0a2aef4 (diff-6997846ce6daf0c271e2db9ef0508551). This PR resubmits them.
Closes https://github.com/pytorch/pytorch/pull/8948

Differential Revision: D8665760

Pulled By: SsnL

fbshipit-source-id: 15514021fa79e6b908ea665dd6cb464b3ea00ab0
2018-06-27 11:44:00 -07:00
c4744cfafa bilinear upsample operator on CPU
Summary: Add support for bilinear upsample operator on CPU.

Reviewed By: BIT-silence

Differential Revision: D7853215

fbshipit-source-id: 9043c95f9eb4e1f6df324e8f7a4e8fdb0c758f66
2018-06-27 10:12:06 -07:00
c82715ced5 Add some extra punctuation to README. (#8941)
Summary:
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
Closes https://github.com/pytorch/pytorch/pull/8941

Differential Revision: D8661797

Pulled By: ezyang

fbshipit-source-id: 876163b11a8463d7560308b2b8e68231f2a657cb
2018-06-27 08:56:13 -07:00
9ec0a2aef4 fbshipit-source-id: ba600fcd2b5cefc7621357bdeb05e24cea02e5af 2018-06-27 04:50:56 -07:00
290d20b094 Replace max_pool with max_pool_with_indices (#8892)
* Create max_poolXd_with_indices

* Match ATen names in ONNX symbolic
2018-06-26 17:09:30 -07:00
edb88b5f3a Update from Facebook (#8887)
* add opencl + fpga context

adds an opencl context inside caffe2/fb which can be used for fpga access

* [Caffe2] Force tensor inference checks to be triggered during testing

We've started to rely on TensorInference functions more for different analysis.  This diff ensures that the TensorInference function's result matches what is expected from the definition of the operator.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.

* [Caffe2] Fix cost models for DotProduct and Div.  Update Tensor Inference for dot product

As title. DotProduct states that the output is a 1-D tensor (https://caffe2.ai/docs/operators-catalogue.html#dotproduct), though the code suggests it is either 0- or 1-D depending on the inputs. The TensorInference function is defined to match the implementation.

* [SG-MoE] Add an option to make the experts NOT as components

* [nomnigraph] Rename and fixup convertToNeuralNetOperator API

This will make things a bit cleaner

* no longer symlink THNN.h and THCUNN.h

* forced decoder network (onnx export)

Closes https://github.com/pytorch/translate/pull/95

Add networks in ensemble_export.py to create a forced decoding network from PyTorch NMT checkpoints. This network takes an arbitrary numberized (source, target) pair and returns the model score for the translation, including penalties.

Vocabulary reduction networks are also supported, but note that target indices which are not in the possible_translation_tokens generated for the source input will be trea

* Revert schema change to fix production models

Revert schema change to fix production models

* MockLogDeviceReader - rebase on FIX

# Goal

1), Build a make_mock_log_device_reader using make_mock_reader

2), Replace the real log_device_reader here: https://fburl.com/raihwf1p

# Log by D8151734

Real log_device_reader:
```
I0529 20:29:05.373108 954994 tensor.h:839] Tensor print_net/log of type std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >. Dims: (): read_net/ParseOpenTrainingRow:0
I0529 20:29:05.373244 954994 tensor.h:839] Tensor read_net/ParseOpenTrainin

* [C2/D2][1/n]: Nonnegative-Constrained Optimization -- log barrier

implement log barrier as a regularization method

* Add teacher weight screening.

Add teacher weight screening according to teacher labels. If the teacher label is zero, we do not use the distill loss in the objective function.

* Add NormalizerContext

See task for more detail. This implementation is a copy of what exists for RegularizerContext except for how the parameters are defined in the model_definition thrift file.

I'll try an alternative implementation which overrides the default arguments of functions instead like for argscopes in tensorflow.

https://github.com/pytorch/pytorch/compare/master...MaximeBoucher:update-from-facebook-0939578c068c?expand=1

* Adding cosine similarity option in dot processor

Add pairwise cosine similarity option in dot product.
Add an option to concate dot product and cosine similarity.
Add test cases.

* [nomnigraph][redo] Concat elim for sparseNN

Same as D7962948, which was reverted because Operator Schema was not
defined

* [pytorch] Revert pytorch/pytorch#7918 'Release GIL when copying to shared memory', breaks ASAN

Revert this pytorch diff that breaks ASAN when running Filament in dev mode; in opt mode it gives "bad file descriptor" errors. Looks like a race when copying tensors to shared memory in multiple mp.Queue's (which spawn separate threads).

https://github.com/pytorch/pytorch/pull/7918/files

* [nomnigraph][mobile] Enable nomnigraph by default, use -Oz on nomnigraph related code to reduce code size

enables nomnigraph and reduces codesize

* [Warmup] Allow both offline incremental training and online training

Change plan name on saving side and reading side to support both training type

This diff depends on D8128530 and D8168651.

* Revert D7802642: [Warmup] Allow both offline incremental training and online training

This reverts commit afc213cf9b36cecf75333a788391c4d09f4afccc

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Add legacy grad logic to fix div op on old graphs.

Add legacy grad logic to fix div op on old graphs.

* Correctly propagate operator failures

Propagate errors from operators that throw exceptions and return false

* Revert D8374829: [caffe2][nomnigraph][redo] Concat elim for sparseNN

This reverts commit 6dda028c463e54bb5c32188bbbe9202107e188a5

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [Caffe2] Added extra_info to core.DeviceOption(), enforced extra_info to be inherited in scope.DeviceScope

extra_info is a newly defined field in DeviceOption proto. This diff added extra_info to the core.DeviceOption().  And, In scope.DeviceScope(), this diff enforce the new scope to inherit the extra_info from old scope.

* [opt] hgdirsync wasn't enabled, merge diverged code

Here's the damage (P59732616): basically xplat was left behind but had the change from assert to CAFFE_ENFORCE.

* OMP parallelism over RoIs for RoIAlign op

Simpler to parallelize over RoIs. Shouldn't affect other uses as it relies on
the number of OMP threads set during startup.

PR: https://github.com/pytorch/pytorch/pull/8562

* Use int64_t for shape in FillOps

to avoid overflow of int32

* Implement Rotated RoIAlign op

Based on Rotated RPNs as explained in https://arxiv.org/abs/1703.01086.
The idea is simple - orientation/angle is added as an RPN
anchor parameter and then the angle is further regressed similar to bbox
coords. There are some additional changes related to NMS and IoU, but besides
that it's a direct extension to Faster-RCNN. Further details in https://fb.quip.com/sZHlA1iMfWPZ.

RoIs are represented in [center_x, center_y, width, height, angle] format.
`angle` repre

* Rotated RoIAlign op CUDA forward implementation

CUDA forward impl for D8415490

* RoIAlignRotated op CUDA backward pass implementation

TSIA

* All remaining fixes to eliminate process_github.sh

Most of this diff has already been reviewed separately, except for the parts relating to _thnn/utils.py and _utils._internal.py

remove skipIf(True, 'Fbcode') line from process_github.sh

replace sed of cpp file with #ifdef to control cudnnDestroy use

undo sync-time deletion of .gitattributes, remove process_github.sh

switch to using _utils._internal rather than try-import-except

This diff also fixes the open-source bug where rebuilds have

* Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

Original commit changeset: 7707d2efe60e. The original diff was backed out because the online trainer package was backed out. This code only works with the new online trainer package.

* [easy] improve error log in adagrad op

as title

* re-allow use of thnn_h_path

This fixes cffi usage in OSS

* [4/4] [tum] parallelizing layerNorm for GPU full sync

as title

* add compile=False to pytorch tests, remove hack with pyc

* Add shape and type inference for RowWiseArgMax operator

See title

* Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training"

This reverts commit 78167eeef0af16b60f72c82f9dcdda9b41b4dcbd

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [fix-flaky-test] mock_hive_reader_test flaky, because GlobalCounter collects local counts intervally

# Problem

`MockHiveReader` uses `GlobalCounter` to limit `max_examples`.

GlobalCounter on server node collect local counts from worker nodes every 1 sec.

This 1 sec delay makes it impossible to limit exactly to `max_examples`; the count will definitely exceed it.

# Plan

Given,
```
Expected num_examples = max_examples + num_examples/sec (Read Speed) x 1 sec (GlobalCounter Sync Int

* [Caffe2] Fix FCGradient cost inference.  Prevent overflow in cost inference

FCGradient missed a factor of 2 in the `num_outputs == 3` case. Overflow was occurring in the flop calculation for FC. Changed types to `uint64_t` to prevent future problems.

* Fix binary ops with empty inputs

Fix binary ops with empty inputs

* Support the filling of input blob with provided data

as title for Biz Integrity case

* Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

Original commit changeset: 30c55dd38816. The original diff was reverted because it introduced a bad integration test, which is now fixed.

* [c2][easy] improve pack ops error loggings

as desc.

* Add ShapeTypeInference for LpNorm operator

As desc

* Shard test_nn to reduce runtime for each test target

Closes https://github.com/pytorch/pytorch/pull/8793

The current test_nn would time out and be disabled in GreenWarden, and we need to have an option to split it up in order to pass the stress test. Right now GreenWarden roughly allows running 100 test cases in test_nn before timing out, and here we have an option to divide test_nn into 30 shards (with ~40 tests in each shard) to allow for some test suite growth in the future.

* Change default caffe2_streams_per_gpu to 1

* Remove IN_SANDCASTLE from common.py and test_nn.py

We prefer to disable the failing tests through Sandcastle UI instead.

* Add a new class for an updated prof_dag.proto

This diff contains:
- An updated prof_dag.proto that contains blob profiles.
- A class to deserialize this information (serialization is in a follow up diff)
- Update to separate profiling information from NeuralNet (and use it as part of the class above).
- Unit tests

* Lambdarank for SparseNN

This diff adds a lambda_rank_layer for SparseNN.
 changes include
1) Adds support for multi sessions in c2 op
2) Adds support for two different loss functions in c2 op
3) Unit tests for op

* Revert D8586950: Back out "Revert D8515341: Back out "Revert D7802642: [Warmup] Allow both offline incremental training and online training""

This reverts commit 012220ed63eccc35659a57b31d16a3625da6317b

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [easy] A few fixups to multithread predictor benchmark

(1) support perf on T6 server
(2) remove dead code

* fix a bug about the map size

as title

* Fix reduce sum on in-place case.

Fix reduce sum on in-place case.

* [Warmup] Reland reverted diff Allow both offline incremental training and online training

Closes https://github.com/pytorch/pytorch/pull/8827

fix net transform integration test. Allow offline and online trainer to coexist D7802642.

* Add StoreHandlerNotAvailableException

Add an exception for a store that is not available or has been
deleted.

* Use exception handling for fault tolerance, missing KV store

Remove status blobs to communication ops so that exceptions propagate on
failure.

* [C2/D2][2/n]: Nonnegative-Constrained Optimization -- bounded grad proj

for simple bounded constrained optimization, incl non-negative box constraints.

* [GanH]: Adaptive Weighting with More Estimations

With the positivity optimization implemented, we now learn adaptive weights with different parameterizations.

This improves parameter estimation and training stability.

* Revert some changes for landing

* Remove AutoNoGIL in StorageSharing

* Temporarily disable net_tests

* Revert "[Caffe2] Force tensor inference checks to be triggered during testing"

This reverts commit 67ef05c22b2f71b4a489695384932f968384a2a4.

* Revert "Fix reduce sum on in-place case."

This reverts commit 6cb8a8e1b3db7b6d20941b0053e3f3836068eb64.

* Revert "Revert "Fix reduce sum on in-place case.""

This reverts commit 130a257c0893dc09f4bd6e6a45d112261807fd2c.
2018-06-26 14:55:48 -07:00
055f527242 [build] Use conda cmake in two CI builds (#8864)
* use conda cmake in pytorch-linux-xenial-cuda8-cudnn6-py2 and pytorch-linux-xenial-cuda9-cudnn6-py3

* update test_expect

* add exit 1

* check cmake 3.5

* bump expect driver version

* add back space
2018-06-26 17:22:04 -04:00
55757357b2 [C++ API] Better forward methods (#8739)
* Better forward methods in C++ API

capitalize error message in test_torch.test_flatten

Support for operator()

* Add operator() to Functional

* Get rid of SigmoidLinear

* Add BoundFunction to FunctionalImpl

* Remove macro from conv because it makes errors more nasty
2018-06-26 13:23:16 -07:00
f607794dc2 [c10d] No default device for ProcessGroupGloo (#8888)
This should be set by the code that instantiates it, be it the Python
bindings or other C++ code. Defaulting to use localhost is not useful
beyond tests. Instead of keeping multiple default paths around we can
punt on it here and require it to be initialized elsewhere.
2018-06-26 11:37:20 -07:00
74d2d562f3 Fix default values for affine= in the docstrings of InstanceNormXd (#8895) 2018-06-26 14:06:31 -04:00
76e9dbad37 Stop making dynamic allocations of PinnedMemoryAllocator. (#8896)
There is no relevant state in PinnedMemoryAllocator, so we
can have a single allocator with static lifetime.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-26 14:03:44 -04:00
1f36caceb2 [C++ API] Rework optimization package (#8815)
* Rework optim folder

* Removed TORCH_OPTIMIZER_CLASS macro

* Got rid of CRTP/Impl

* Removed TORCH_AUTOGRAD_KWARG

* Differentiate between Optimizer and LossClosureOptimizer

* Make Optimizers parameters based instead of model based

* Allow construction of optimizer from arbitrary vector

* Added test for zero grad

* Added test for external parameter vectors

* Now comparing against baseline values

* Documentation

* Post rebase fixes

* Different strategy for creating and accessing buffers in optimizers

* Fix member ordering
2018-06-26 10:13:14 -07:00
22ba8726da Mention MPICH_MAX_THREAD_SAFETY=multiple. (#8580)
Currently, this is a common step to enable thread support level 3 (MPI_THREAD_MULTIPLE) on MPICH-based systems.
2018-06-26 12:40:48 -04:00
31327dd1e1 Unify isViewable, handle n-dimensional empty tensors. (#8883)
* Unify isViewable, handle n-dimensional empty tensors.

1) Unifies the two isViewable functions in ATen and TH.
2) Handle n-dimensional empty tensors in the implementation
3) Clarify some comments.

This requires an extra copy in the TH case, but that will go away.

* Also unify THCTensor version.

* Remove C-linkage from THTensor_compute_stride.

* Update comment.
2018-06-26 12:38:45 -04:00
6e28d4d364 Add pos_weight argument to nn.BCEWithLogitsLoss (#5660) (#6856)
* Add pos_weight argument to nn.BCEWithLogitsLoss and F.binary_cross_entropy_with_logits (#5660)
- Add an option to control precision/recall in imbalanced datasets
- Add tests (but new_criterion_tests)

* Move pos_weight to the end of args list in the documentation.

`pos_weight` was moved to the end because it is the last argument in both
`nn.BCEWithLogitsLoss` and `binary_cross_entropy_with_logits`
2018-06-26 12:31:07 -04:00
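A short usage example of the new argument: with roughly ten negatives per positive, weighting positives up by 10 rebalances the classes:

```python
import torch
import torch.nn as nn

criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor([10.0]))
logits = torch.randn(8, 1)
targets = torch.empty(8, 1).random_(2)  # 0/1 labels
loss = criterion(logits, targets)
```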
f935ba1b05 [build] Enable clang-specific warnings only when using clang (#8869)
* Wraps clang only warnings in an if

* add back -Wno-missing-field-initializers
2018-06-26 11:09:25 -04:00
8e019826c9 Fix cmake cudnn autodetection (#8891)
If CUDNN_INCLUDE_DIR, CUDNN_LIB_DIR, and/or CUDNN_ROOT_DIR were set,
but USE_CUDNN was not explicitly set, the code in
cmake/Dependencies.cmake would set USE_CUDNN=OFF even though it could
be found. This caused an issue in ATen, where it includes its CuDNN
bindings if the variable CUDNN_FOUND is set. This was the case,
because the find_package call in cmake/public/cuda.cmake searches for
CuDNN and ends up finding it. The net result is that ATen tried to
compile CuDNN bits, but the caffe2::cudnn target is never defined let
alone added as dependency, and the build fails on not being able to
find the header cudnn.h.

This change does two things:

1) Restore CuDNN autodetection by setting USE_CUDNN=ON if it is found.
2) Remove obsolete FindCuDNN.cmake module. This functionality now
lives in cmake/public/cuda.cmake.
2018-06-26 06:54:27 -07:00
af741dc2fd [c10d] Fix link order for building C++ tests (#8889)
List dependency on gloo_cuda before dependency on gloo such that
unresolved symbols in gloo_cuda are correctly resolved (since the linker
resolves from left to right).

This fixes building c10d C++ tests on GCC 4.8.
2018-06-25 23:59:32 -07:00
8ef5d37ac5 directly add_subdirectory(nanopb) from torch CMakeLists (#8870)
currently torch/CMakeLists doesn't know how to find nanopb without
some higher-level script (setup.py or build_all.sh) telling it where
to look, which is an obstacle towards fully CMake-ifying libtorch.so.
This change removes that dependency.
2018-06-25 21:23:25 -07:00
47492ed451 [C++ API] Bag of fixes (#8843)
* Bag of fixes

* Rename tensor_range.h to tensor_list_view.h

* Post rebase fixes

* Rename torch::tensor namespace to torch::tensors due to name conflict

* Avoid recursion in Module::to
2018-06-25 21:11:49 -07:00
3d580f2f7d [build] Raise in cmake when seeing NVCC{9/9.1} + GCC6 combo (#8863)
* Add error message for NVCC{9/9.1} + GCC6 combo

* requires -> require
2018-06-26 00:07:13 -04:00
8e98a1a84d Create avg_pool1d in ATen (#8880)
* Create avg_pool1d in ATen

* Put function name into check1d method
2018-06-25 20:31:32 -07:00
85f4d2b55a throw error when grid_sample is passed unsupported mode (#8884) 2018-06-25 22:37:41 -04:00
f74207c99f Allow autograd to work even when the shape of values cannot be determined (#8641)
This commit implements the solution proposed in https://github.com/pytorch/pytorch/issues/8410
to workaround the need to create zero tensors with the same shape as inputs.
It introduces the concept of a LinearBlock which marks places in the code
where we know if all the inputs to the node are zero, then the outputs
to the node are also zero. Autodiff introduces LinearBlocks around
backwards functions, which have this property. specializeUndef then
propagates Undef nodes using this information.

Notes:
* Since we do not always specialize, we have a pass LowerLinearBlocks
that replaces the block with an if statement that dynamically guards
the Undef case.
* We introduce AutogradAdd which is addition that still works when
its inputs might be undefined. In cases where we specialize this will
get removed in favor of a normal add, but there are cases where
gradient graphs do not specialize (e.g. when they are not differentiable,
but a derivative is required) so it is important for this op to be executable.
2018-06-25 18:40:04 -07:00
7a614799f7 Make at::Tensor::to() const (#8839)
* Make at::Tensor::to() const

* Add cheaper checks to Tensor::to
2018-06-25 17:55:10 -07:00
5cb8586dde [auto] Update onnx to 458c521 - Fix typo (onnx/onnx#1143)
458c521844
2018-06-25 23:37:19 +00:00
288d37998a [Caffe2] Fix gradient_check on in-place ops (#8828)
* Fix gradient_check on in-place ops

* Fix hsm_test

* Fix SplitByLengthOp test

* Fix input_device_options for gradient_checker

* Fix hypothesis_test_util.py
2018-06-25 15:25:56 -07:00
838fb87874 Fix as_strided_backward (#8721)
* make as_strided safer

* patching as_strided; and stop using it in backward

* Test a simple case in as_strided_backward

* a long note

* remove boundary checks of as_strided; implement slow path

* wip

* fix as_strided backward when input is overlapping

check for input overlapping too
[doc] clarify gradcheck behavior when input is overlapping
longer note

* fix a deprecation warning in test_autograd

* nits
2018-06-25 18:17:35 -04:00
b5a123c06c [jit] Add python bindings for Gradient and differentiate (#8830)
* improve assertion error message in jit::differentiate

* add python binding for Graph::copy

* add pybind for jit::differentiate and jit::Gradient
2018-06-25 18:09:29 -04:00
49a3e49627 Fixes #8508. Upcasted loc to 1-d if a scalar loc is provided to MultivariateNormal (#8543)
* Fixes #8508 Broadcasted loc to 1-d if a scalar loc is provided to MultivariateNormal.

* move to non-inplace
2018-06-25 18:06:51 -04:00
41181169ae [auto] Update onnx to 6bedd27 - add broadcasting support for min/max/sum/mean (onnx/onnx#1124)
6bedd27b03
2018-06-25 22:03:11 +00:00
89afb93e1d Delete dead TH size inference code. (#8866) 2018-06-25 17:45:43 -04:00
cca247635c First version of dispatcher (#8713) 2018-06-25 13:11:53 -07:00
2b926aafb0 [build] disable test_expect for pinning cmake to 3.5* in dockerfiles repo (#8850)
* pin pytorch-linux-xenial* to use cmake 3.5*

* disable test_expect
2018-06-25 14:21:42 -04:00
04440d2c57 Fix nonzero and tensor printing of n-dimensional empty tensors. (#8849) 2018-06-25 12:09:47 -04:00
1e7fcb5d1b fix NCCL NVCC_GENCODE w/ multiple archs (#8834) 2018-06-25 08:07:53 -07:00
e251fb5036 Add file and line to CUDA_CHECK and CUDNN_CHECK (#8836)
* Add file and line to CUDA_CHECK and CUDNN_CHECK

* use stringstream

* clang-format

* switch to AT_ERROR
2018-06-25 10:46:52 -04:00
e31ab99932 [Ready for Review] Better fix for NCCL + sccache (#8829)
* Better fix for NCCL + sccache

* Try to set NUM_JOBS to 1

* Try to fix third_party/nccl/CMakeLists.txt as well

* Pass NUM_JOBS to nccl/CMakeLists.txt
2018-06-25 02:17:07 -04:00
50410c9572 fixes #8840 (#8841) 2018-06-25 02:01:05 -04:00
a5df8ec841 Created DefaultTensorOptions in ATen (#8647)
* Created DefaultTensorOptions

* Fix TensorOptions() call which was interpreted as function decl

* Fix empty OptionsGuard

* Make options_ and mutex_ in DefaultTensorOptions class static because of dynamic linker issues

* Make DefaultOptions thread local
2018-06-24 21:15:09 -07:00
521f5111ad [C++ API] Use torch::Tensor instead of at::Tensor/Variable mix (#8680)
* Use torch::Tensor instead of at::Tensor/Variable mix

* TensorRange -> TensorListView
2018-06-24 19:03:39 -07:00
22a70fbe2e Minor fixes for finding CUDNN (#8743)
* Minor fixes for finding CUDNN

* Minor fixes for comment

* Fix lints

* Fix naming conflicts

* Fix import name
2018-06-24 21:42:19 -04:00
fc22bf3e82 Spectral norm improvements (#8590)
* Spectral norm improvements
- Don't do iterations on weight in eval mode
  To facilitate this, register weight as buffer in order to be able
  to use module with spectral norm in eval mode after immediately
  after loading state dict (#8208)
- Use weight instead of weight_orig as weight when removing
  spectral norm
- Add dim parameter in case the normalization should occur w.r.t.
  a dimension other than 0 (#7865)

* add and update spectral norm tests

* More spectral norm tests

Thank you, Simon, for the suggestions.
2018-06-24 17:15:13 -04:00
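A short usage sketch of the reworked interface, following the description above (`weight_orig` is the underlying parameter; `weight` is recomputed from it):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm, remove_spectral_norm

m = spectral_norm(nn.Linear(20, 40), dim=0)  # normalize w.r.t. the output dim
assert hasattr(m, 'weight_orig')
m.eval()                 # no power iterations are run in eval mode
remove_spectral_norm(m)  # leaves the (normalized) weight in place
```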
3598356420 Port THCS to ATen. (#8689)
* Port THCS to ATen.

General structure of the sparse implementation:
- SparseCUDATensor.{cpp, cu} and SparseCUDATensorMath.cu contain
  the same functions as their CPU analogues
- SparseCUDAApplyUtils.cuh contains what used to be in
  THCSTensor.cu
- SparseCUDABlas.cu contains what used to be THCSparse.cu

Unrelated improvements:
- Forward declared CUDA types in Context.h are now moved
  exclusively to CUDAHooks
- New getCurrentCUDASparseHandle in Context
- Support for printing CUSPARSE_STATUS_ZERO_PIVOT error message
  directly

Some unusual pieces:
- get_device got the LegacyBridge makeover, as it needs special
  logic on sparse tensors (defer to the inner tensors).
- I noticed that I need to turn off device_guard codegen
  for many functions in sparse, noticed because get_device
  became a native function, and resulted in an infinite recursion.  This was
  done by adding device_guard: False to the native definitions.  An alternative
  strategy might be to make the heuristic for deciding when to put in a device
  guard more clever.

Scaffolding removal:
- LegacyBridge now special-cases only on sparse versus dense;
  no more CUDA test (hooray!)
- Native bindings get CUDA/SparseCUDA dispatch entries.

CPU sparse refactoring:
- New SparseUtils.h header, with all of the utility functions that
  used to live in SparseTensor.cpp
- new_with_tensor_sparse now correctly handles both CPU and CUDA
- transpose functions in sparse/ turned out to be dead, so I killed them

Bugs I noticed while working on this:
- I used accessor<...>() on a CUDA tensor, because I thought it does
  the CUDA-CPU sync.  It does not.


Last mile changes:
- I killed all of the THS/THCS directories, build scripts, bindings everything.
  It is now no more!
- A bunch of trampolines in LegacyBridge are no more; anything
  that was "sparse only" is now done natively.
- `sparse_coo_tensor` is implemented a little funny, but we think
  it's a good idea.
- HIP is handled by explicitly ifdef'ing out all kernels; we'll add support
  for this at some later point in time.
- TH_INDEX_BASE is now unconditionally set to 0.
- Some uses of x.type() now replaced with x.options(), the new way of doing it.
- More notes about checked_cast_tensor, and eliminate Storage/Tensor fields in
  the code gen env when they are dead.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-24 15:14:09 -04:00
731273b8d6 Improve convT output_padding docs (#8825)
* improve output_padding doc for convT modules

* Update functional.py

* Update conv.py

* lint
2018-06-23 14:33:18 -04:00
e4ff0b8aa1 remove unnecessary headers from SpectralOps, add cuda.h include to deviceutils (#8819) 2018-06-23 14:31:13 -04:00
ebae3f502c Fix CUDA_NVCC_EXECUTABLE from being set to empty (#8822) 2018-06-23 11:11:32 -04:00
7fbd57091d Doc: specify batch_first is True by default in RNN (#8807) 2018-06-22 19:33:25 -04:00
74fa304b31 [Caffe2] Export clang compilation database in setuptools build (#8811) 2018-06-22 16:19:43 -07:00
12904edae9 Test that broadcast doesn't copy when dst and src devices are the same (#8803)
* test that broadcast doesn't copy when dst and src devices are the same

* only test if input is cuda
2018-06-22 17:36:19 -04:00
46bff5d9ff Set MKL VML error mode to ignore (#8800) 2018-06-22 16:54:47 -04:00
73b92472d2 [README.md] Use GitLab URL for CMake (#8799)
* update to GitLab url

* use GitLab url for upstream CMake
2018-06-22 16:51:35 -04:00
1d4cf095b8 Add CUDA to logspace and linspace declarations in Declarations.cwrap (#8798)
* Add CUDA to logspace and linspace

These functions are already implemented but were not exposed. Fixes https://github.com/pytorch/pytorch/issues/8786.

* Add small tests
2018-06-22 16:14:27 -04:00
675b579bf9 cmake wrapper (#8797) 2018-06-22 15:29:25 -04:00
d3ec956d91 Revert "ROCm 1.8.2 does not define CUBLAS_STATUS_ARCH_MISMATCH (#8732)" (#8791)
Upstream fixed 1.8.2, and it will be fine in the final release.

This reverts commit 9dffaf593e8c58a6d02583079162f4a88cb1bc66.
2018-06-22 15:05:56 -04:00
f138111d52 remove unused flag (#8779) 2018-06-22 10:54:48 -07:00
ddda7cfea5 allow output_size to contain None in adaptive pooling methods (#8596)
* allow output_size to contain None in adaptive pooling methods

* fix lint

* address comments
2018-06-22 13:29:15 -04:00
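A small example of the new behavior: a `None` entry keeps that dimension's input size unchanged:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 16, 32, 48)
pool = nn.AdaptiveAvgPool2d((None, 7))  # keep height, pool width down to 7
print(pool(x).shape)  # torch.Size([1, 16, 32, 7])
```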
b1b77c9eb5 Use virtual dtor for Annotation (#8780) 2018-06-22 10:20:37 -07:00
e6c7b38f94 Cache cufft plans (#8344)
* cache cufft plans

* use an LRU cache

* suffix CuFFTParams members with _

* import print_function for py2

* lint

* fix potential race; add dummy impl for CPU only builds

* cpp formatting; remove nccl makefile change

* Use CUDA hooks instead

* comments and doc

* update the error message

* move LRU cache to a separate file and the native::detail namespace

* update comment

* specify NOTE location in CuFFTPlanCache.h

* update disabled_features.yaml to make amd ci work

* another fix for AMD CI in disabled_features.yaml

* Wrap cufft_plan_cache_* methods in __HIP_PLATFORM_HCC__

* improve the notes

* lint

* revert onnx change

* put back inlining for CUFFT_CHECK
2018-06-22 13:02:34 -04:00
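In current releases the LRU cache added here is exposed through `torch.backends.cuda`; a minimal sketch, assuming a CUDA build and a recent release:

```python
import torch

cache = torch.backends.cuda.cufft_plan_cache
cache.max_size = 16  # bound the per-device LRU cache
_ = torch.fft.rfft(torch.randn(64, device='cuda'))
print(cache.size)    # number of cuFFT plans currently cached
cache.clear()
```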
fed44cb1b3 Remove aten project for main build (#8532) 2018-06-22 08:40:44 -07:00
ce13ca235e added default lambd=0.5 for hardshrink (#8770)
* added default lambd=0.5 and tests

* lint
2018-06-22 09:52:55 -04:00
5a7b4840d9 Move nanopb-generated ONNX to unique file name (#8773)
* Move nanopb-generated ONNX to unique file name

* fix other places
2018-06-22 09:51:56 -04:00
9c426797a8 Expose is_compatible function (#8783) 2018-06-21 23:37:54 -07:00
83f846ff7a [auto] Update onnx to 410530e - Make test suite backward compatible (onnx/onnx#1137)
410530e8c6
2018-06-22 06:35:03 +00:00
bd95f8f948 Resolve name conflict of ContextManager (#7244)
* Resolve conflicting name, ContextManager

Concept name `Context Manager` is taken by Python. See https://docs.python.org/3.6/reference/datamodel.html#with-statement-context-managers

It says,
A context manager is an object that defines the runtime context to be established when executing a with statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code.

The `ContextManager` here is more like a registry. 
And there is a C++ registry in caffe2 codebase `caffe2/caffe2/core/registry.h`.
There is also a Caffe2DBRegistry, declared by calling `CAFFE_DECLARE_REGISTRY(Caffe2DBRegistry, DB, const string&, Mode);` in `caffe2/caffe2/core/db.h`.

I think we can follow the concept name `Registry`, calling it `ContextRegistry`.

* Make Classes and Functions internal to this module start with "_"

Make Classes and Functions internal to this module start with "_"

* Update context.py

* Update context.py
2018-06-22 00:41:51 -04:00
53c0de57d9 Document ideal vs actual SparseTensorImpl invariants. (#8776) 2018-06-21 23:08:18 -04:00
fd32cc6118 Disable sccache when building NCCL (#8708)
* Disable sccache when building NCCL

* Fix nccl CMakeLists.txt
2018-06-21 17:30:07 -07:00
0750967496 Adjust nested parallelization to deal with OMP (#8723)
* Adjust parallelization to deal with OMP
2018-06-21 20:24:53 -04:00
54a2e817a6 [auto] Update onnx to bc986de - Add is_compatible method in python backend (onnx/onnx#1132)
bc986dee4c
2018-06-22 00:16:24 +00:00
dc5837a1f4 [JIT] Adds fp16 support to the jit (#8679)
* adds fp16 support to the jit

* improves formatting

* improves formatting

* added an explanatory comment

* fixes Python2 flake8

* updates c code

* all except halfs
2018-06-21 18:14:51 -04:00
709c300437 [c10d] Configurable number of algorithm entries per key (#8765) 2018-06-21 14:30:55 -07:00
2bb7e480c1 Define conversions and operations on at::Half (#8660)
The goal is to be able to use at::Half throughout ATen, including in
CUDA kernels and have it operate like built-in types. This avoids the
need for cuda::from_type and cuda::to_type before every
AT_DISPATCH_ALL_TYPES_AND_HALF call.
2018-06-21 17:16:32 -04:00
41c08fe4a1 Add tools/shared/_utils_internal.py to gitignore (#8756) 2018-06-21 13:28:46 -07:00
8489c4cc6e Better support for literals in jit script (#8687)
Addresses #8177

A design doc can be found here: [gist](https://gist.github.com/zou3519/4b7f13f03cc9f3612bd9363e6405fa0a) version or [quip](https://fb.quip.com/azL1AqUckBdo) version

General approach:
- Add NumberType, FloatType, IntType to represent Python numbers, floats and ints.
- Emit these types for python literals
- Change aten_schema such that Scalars are NumberType, int64_t and bool are IntType.
- Emit aten::type_as, prim::NumToTensor, and prim::TensorToNum nodes for tensor-number math. (see examples below)
- Erase NumberType,  prim::NumToTensor, and prim::TensorToNum for ONNX export

### Tensor/number math
```
import torch
@torch.jit.script
def fn(x):
    return x + 1
```
```
graph(%x : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : Dynamic = prim::NumToTensor(%1)
  %3 : Dynamic = aten::type_as(%2, %x)
  %4 : Dynamic = aten::add[alpha={1}](%x, %3)
  return (%4);
}
```

### Number/Number Math
```
import torch
@torch.jit.script
def fn(zero):
    c = 1 + 1
    return zero + c
```
```
graph(%zero : Dynamic) {
  %1 : int = prim::Constant[value={1}]()
  %2 : int = prim::Constant[value={1}]()
  %3 : Dynamic = prim::num_to_tensor(%1)
  %4 : Dynamic = prim::num_to_tensor(%2)
  %5 : Dynamic = aten::add[alpha={1}](%3, %4)
  %c : int = prim::TensorToNum(%5)  # this is the result of the addition
  ...
  return (%13);
}
```

List of squashed commits:

* Introduce Python Number types

Added: IntType, FloatType, NumberType with
IntType <: NumberType
FloatType <: NumberType

Changed aten_schema so arguments have corresponding types

* Emit a NumberType for python literals.

Also emit a NumberType for Scalar default values.

* Add prim::NumToTensor and prim::TensorToNum

* Add DynamicType -> NumberType implicit cast for bc

* Better ensureTensor error message

* Add ensureTensorOrNumber. Allow passing Number to some functions

Like the range() construct and slices

* Patch IntList to work.

IntList is still a DynamicType in the frontend: a tensor gets built from
a List[int].

Also, IntList[1] is a "union between int and IntList" the way it is
implemented. If the frontend sees an int being passed for an IntList[1]
arg, it converts it to a tensor as well.

* Enforce some order on schemas to avoid overload ambiguity

add(Tensor, Tensor) should appear earlier than add(Tensor, Scalar). This
matches the order in which python_arg_parser parses its arguments.

* Disable std_dim and var_dim tests.

With the new schema information, std(input, keepdim) and std(input, dim)
are ambiguous. This will need to be fixed at a later date.

* Add NumberType erasure pass.

This is used for ONNX export and to ensure that NumberType information
doesn't reach the interpreter

* Add support for mixed tensor/number math ops.

* Tests for new functionality.

Includes:
- Tensor/number math
- number/number math
- EraseNumberTypes pass test

* Patch tests

Update expect tests for:
- decompose_addmm
- loop unrolling tests

Because python numbers are now NumberType, they cannot be returned by
functions anymore. Work around this by using "torch.full", or by adding
a tensor([0]) (taken from FIXME_zerol()). Both approaches are used
because torch.full is more readable, but it is broken in some cases.

* Add erase_number_types to torch/CMakeLists.txt

* Move math back to emitSimpleExpr from emitSugaredExpr

* Remove some dead lines

* Re-enable some excluded script/trace tests that are fixed.

* Move some tests to expected failure

* Address some comments (more addressing to come)

* Erase relevant aten::type_as nodes in EraseNumberTypes

I also changed it so that EraseNumberTypes is only called for ONNX
export. It is no longer used to prevent
prim::NumToTensor/prim::TensorToNum from reaching shape_analysis or
interpreter.cpp.

shape_analysis infers the type of the output of these nodes to be the
same as their input.

interpreter.cpp treats both of these nodes as no-ops.

* Add reminder to fix std/var

* Call EraseNumberTypes only when exporting a script module

* Update expects after rebase
2018-06-21 15:43:38 -04:00
3de45f3430 Add ssnl and zou3519 as pytorch doc owner (#8754) 2018-06-21 15:10:02 -04:00
be3d65a7e2 i2h<->h2h in gif (#8750)
* i2h<->h2h

* should have 11 frames
2018-06-21 14:46:47 -04:00
c8cc246226 [JIT] Tests for calling between different frontend modes (#8704) 2018-06-21 10:38:03 -07:00
40262ca9d1 Disable flaky test_lstm_fusion_cpu test (#8747)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 10:32:27 -07:00
e07a49e15a Set DEBUG=1 in trusty-py3.6-gcc5.4 CI build (#8593) 2018-06-21 12:58:43 -04:00
b300934db6 Add CUDA 9.2 + GCC 7 build and test to CI (#8592) 2018-06-21 12:58:28 -04:00
117b77e574 Install vim by default on all Caffe2 docker images. (#8731)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 11:10:32 -04:00
98a7d84a5a Link to C++ extensions in README.md (#8737) 2018-06-21 09:48:04 -04:00
c0dfe23703 Support n-dimensional empty tensors in (most of) THCUNN. (#8722)
* Support n-dimensional empty tensors in (most of) THCUNN.

* Fix incorrect parens.
2018-06-21 09:12:16 -04:00
9b465313cf Support n-dimensional empty tensors in more of TH/THC. (#8726)
* Support n-dimensional empty tensors in more of TH/THC.

* Fix warning.
2018-06-21 09:11:28 -04:00
9dffaf593e ROCm 1.8.2 does not define CUBLAS_STATUS_ARCH_MISMATCH (#8732)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-21 08:31:08 -04:00
ac068fdabe Use env var to pass sharding options to test_nn.py (#8727)
Buck doesn't support passing arguments to Python unit tests, so we have to use environment variables to pass the sharding options instead. Also, buck test doesn't go through the __name__ == '__main__' code path, so we need to move the env var checking logic to top level.

* Use env var to pass sharding options to test_nn.py

* Move env var checking to top-level

* fix lint
2018-06-21 08:30:28 -04:00
bbd71a7c81 [auto] Update onnx to 9b9f595 - Make axis optional (onnx/onnx#1128)
9b9f595107
2018-06-21 05:49:53 +00:00
Ben
4f604a436b Export tensor descriptor (#8313)
* Export TensorDescriptor

* Export descriptors

* install cudnn_h

* Add tests and with_cuda

* tab to space

* forgot cpp

* fix flake

* ld flags

* flake

* address comments

* clang-format

* fixtest

* fix test

* extra headers

* extra headers

* camelcasing
2018-06-20 22:32:50 -07:00
35e66efbfc Don't set HIP flags on non-HIP build. (#8728)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-20 23:53:31 -04:00
6181979a7c [auto] Update onnx to 7558954 - Use cmath instead of math.h (onnx/onnx#1129)
7558954ffd
2018-06-21 02:56:50 +00:00
d79711d689 [auto] Update onnx to 068f1a4 - Optimization pass to fuse batch normalization operator with convolution operator (onnx/onnx#1106)
068f1a4079
2018-06-20 23:48:51 +00:00
f037d392c1 Support n-dimensional empty tensors in (most of) THNN. (#8702)
* Support n-dimensional empty tensors in (most of) THNN.

Most of the argument checking in THNN is directly around dimensionality, which doesn't work in general for n-dimensional empty tensors, because
you will end up dividing by 0 or similar.  Instead, we change these to check for emptiness and give error messages for those cases as well.
In some cases, the error messages are also improved.

* Fix bug.
2018-06-20 18:30:19 -04:00
1e570fa5a8 Add c10d/Def.hpp placeholder (#8711)
This is a placeholder for the header that is generated by CMake.
It is needed if you include the c10d headers directly from this directory.
2018-06-20 15:03:58 -07:00
802929608c [JIT] Improve test coverage for ErrorReport instances (#8668)
* [JIT] Coverage for ErrorReport

* Fixes

* lint

* More coverage
2018-06-20 14:51:53 -07:00
d00c79f2b5 Improve cudnn RNN backward error message in eval mode (#8706)
* Improve cudnn RNN backward in eval error msg

* fix accidental change
2018-06-20 17:47:17 -04:00
17784d2029 Make at::tensor faster (#8709) 2018-06-20 14:46:58 -07:00
544690bf4e Update rnn.py (#8705) 2018-06-20 17:46:09 -04:00
48e90e3339 Build system changes (#8627)
* All changes needed to get rid of process_github.sh

* allow thnn_h_path
2018-06-20 17:45:26 -04:00
0acddd6cee Add torch.cuda.cudnn_is_available (#8703) 2018-06-20 14:18:03 -07:00
85468155ce Implement OpSchema and a default DispatchKey (#8662) 2018-06-20 14:14:24 -07:00
f9da3aa1aa [auto] Update onnx to b1571d8 - ONNXIFI loader library (onnx/onnx#556)
b1571d829f
2018-06-20 20:59:15 +00:00
5642937ac1 more formatting (#8701)
* fix formatting in :math: in fold docstring

* escape more underscores
2018-06-20 15:32:33 -04:00
3e25b4af6d Fix #8692 (#8699) 2018-06-20 15:17:54 -04:00
73ce21a313 Create captured inputs recursively for loop to resolve loop-carried dependencies across nested blocks (#8345)
* enable captured inputs for if Stmt to fix the carried deps bug in nested
blocks

* postpone captured inputs deletion and add new test case

* recursively generate captured values for nested loops

* check asSimple when recursively create captured input
2018-06-20 12:09:24 -07:00
d6c873a393 Shard test_nn to reduce runtime for each test target (#8678)
* Shard test_nn to reduce runtime for each test target

* Use load_tests for selecting tests to enable

* fix lint

* Use arg parser from common.py
2018-06-20 15:01:28 -04:00
9335885b1b Create at::tensor (#8475) 2018-06-20 11:44:21 -07:00
b4cd9f2fc9 Clarify mp note about sharing a tensor's grad field. (#8688)
* Clarify mp note about sharing a tensor's grad field.

* Address comments

* Address comments
2018-06-20 14:22:38 -04:00
08c1770d79 Add owner rule for cpp_extension.py (#8700) 2018-06-20 14:11:28 -04:00
b492d103ee fix formatting in :math: in fold docstring (#8696) 2018-06-20 13:36:57 -04:00
b6af5d40bf Some 0-sized dimension support, port catArray away from resizeLegacy. (#8666)
* Some 0-sized dimension support, port catArray away from resizeLegacy.

The goal of this PR is to port catArray away from resizeLegacy (so we can delete the legacy resize calls), but since catArray has some weird behavior because
we don't have arbitrary 0-sized dimension support, I made some effort to fix these both in one pass.

The major changes here are:
1) catArray uses the new resize API, no longer the old resizeLegacy API.
2) As 1) is the last usage of resizeLegacy, it is deleted.
3) If compiled with USE_TH_SIZE_ZERO_DIM, catArray will work and properly check shapes for n-dimensional empty tensors.
4) However, we retain the old behavior of "ignoring" size [0] tensors in catArray.  We previously allowed this because we didn't have n-dimensional empty tensors.
5) To get the above to work, we also add support for n-dimensional empty tensors for narrow and slice (ifdef USE_TH_SIZE_ZERO_DIM).
6) We change the stride formula for empty tensors to match NumPy; basically, we never multiply by 0 as the size, always at least 1, so the
   strides are monotonically increasing in the empty tensor case.
7) We print the size of empty tensors if size != [0]; this matches NumPy behavior (even in cases where the size could be inferred from the brackets).
8) For test purposes, we add torch._C._use_zero_size_dim() to add tests for the above.

* Fix flake8.

* Address review comments.
2018-06-20 13:26:08 -04:00
cc6b046f48 Implement flatten function (#8578)
* Implement flatten function

* address comments

* allow start_dim=end_dim

* undo submodule change
2018-06-20 12:53:06 -04:00
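A minimal sketch of the new function (signature as in current torch: torch.flatten(input, start_dim=0, end_dim=-1)):
```
import torch

x = torch.randn(2, 3, 4, 5)
# Collapse dims 1 through 2 into a single dimension:
y = torch.flatten(x, start_dim=1, end_dim=2)
print(y.shape)  # torch.Size([2, 12, 5])

# start_dim == end_dim is allowed and leaves the shape unchanged:
print(torch.flatten(x, start_dim=1, end_dim=1).shape)  # torch.Size([2, 3, 4, 5])
```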
065fdbd500 Created Tensor::to functions (#8643)
* Created Tensor::to functions

* Only have to(dtype) and to(device)

* Ignore requires_grad in TensorOptions(Tensor) constructor
2018-06-20 09:28:08 -07:00
d97c9dd019 Add a warning in gradcheck if inputs precision < float64 (#8663)
* Solves #8659

This PR adds a warning to alert users about the possibility of a failure in gradcheck

* Fix lint

* Update gradcheck.py

* Update gradcheck.py

* update error message

* Update warning message to be more descriptive
2018-06-20 12:23:22 -04:00
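The reason for the warning, sketched below: gradcheck compares analytical gradients against a finite-difference estimate under tight tolerances, so float64 inputs are the reliable way to call it.
```
import torch
from torch.autograd import gradcheck

# float64 keeps the numerical Jacobian accurate enough for the default
# tolerances; lower-precision inputs now trigger the new warning.
x = torch.randn(4, dtype=torch.float64, requires_grad=True)
print(gradcheck(torch.sin, (x,)))  # True
```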
61b863cbdc Fix parsing of floating point defaults in python_arg_parser (#8681) 2018-06-20 12:17:44 -04:00
3da27312bb Export ProcessGroupGloo options to Python (#8664)
This surfaces the options struct that can be passed to the
ProcessGroupGloo constructor to Python. By default, if no options struct
is passed at construction time, the Python bindings default to using a
struct with a TCP backed Gloo device that uses the machine's hostname to
resolve the IP address to bind to.
2018-06-20 09:08:06 -07:00
0e0031e204 Fix build error in pybind_state_ideep (#8684)
Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-06-20 08:29:48 -07:00
695fd98192 Compatibility: write nDimension/_nDimension corresponding to dim()/_dim(). (#8676)
Currently, THTensor_(nDimension) goes to _dim(), which makes it difficult to move individual usages over to the new API.
Instead, let's create a THTensor_(_nDimension) going to _dim() and have THTensor_(nDimension) go to dim().  To do this, we will redirect all current
calls and move them over as we did for _dim() and dim().
2018-06-20 11:00:25 -04:00
6402a4278b Improve win-build.sh for local build (#8674) 2018-06-20 09:41:50 -04:00
be3e3f2ec8 don't do unnecessary copies for bernoulli_ (#8682) 2018-06-20 10:53:35 +02:00
7fa81d6dbc Use parallel if get_num_threads 0 (#8677) 2018-06-19 22:12:15 -04:00
8e4fe5dcf4 Fix serialization for Parameters (#8633)
* Fix serialization for Parameters

* address comments

* addres comments
2018-06-19 22:11:13 -04:00
637dcdc279 Remove dangling inclusion path (#8671) 2018-06-19 17:02:20 -07:00
d46312fd15 Create at::from_blob (#8640) 2018-06-19 17:00:28 -07:00
66e8ecf2ea 16bit typeid (#8534)
* 16bit typeid

* CaffeTypeId::createTypeId() instead of TypeMeta::_createTypeId()
2018-06-19 19:23:58 -04:00
4608aa3058 Setup wrappers to get vectorized version of mean (#8618)
* Setup wrappers to get vectorized version of mean

* Responding to review 1

* Responding to review 2

* Use variadic AT_CHECK

* Fix AT_CHECKS in ReduceOps
2018-06-19 18:14:35 -04:00
d3b690ecd5 TensorTypeId (#8389) 2018-06-19 15:05:24 -07:00
7a048cdcd7 Vectorize non-contiguous unary operations (#8488)
* Vectorize non-contiguous unary operations

All builds pass. Manual Windows rerun is here:
https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/9714/
2018-06-19 16:56:49 -04:00
03f7289fcf Add CAFFE2_USE_CUDNN guard on context_gpu.cu (#8657) 2018-06-19 13:49:06 -07:00
2bf8b702a3 Fix broadcast copying device[0] tensor when not using NCCL (#8222)
* Fix broadcast copying device[0] tensor when not using NCCL; Avoids potential extra copy in flatten_dense_tensors

* use toType

* revert dense_flat changes

* address comments
2018-06-19 16:34:29 -04:00
a60540ed2b Make NCCL build select NVCC_GENCODE smarter (#8615)
* Make NCCL build select NVCC_GENCODE smarter

* add info print

* replace ; with \s

* gencode\s -> gencode=

* Don't let nccl use sccache
2018-06-19 16:31:17 -04:00
61c96811be [c10d] NCCL python binding and CI test, with bug fixes (#8357)
* [c10d] NCCL python binding and CI test, with bug fixes

* Addressed comments and further bug fix

* Made NCCL build optional, made C10D libc10d.a only

* Fixed tests so that NCCL pg won't run when not needed

* Addressed comments
2018-06-19 13:02:39 -07:00
5ca4f5b43b [JIT] Remove dead functions (#8658) 2018-06-19 12:46:23 -07:00
a2dd707031 [C++ API] Create fixed width dtypes in torch:: namespace (#8639)
* Create fixed width dtypes in torch:: namespace

* Make kByte -> kUInt8
2018-06-19 12:40:58 -07:00
7ccecbbb4e Create Tensor::options (#8630) 2018-06-19 11:09:01 -07:00
6cc7670bed Port all indirect calls of resizeNdLegacy to resizeNd. (#8603)
* Port all indirect calls of resizeNdLegacy to resizeNd.

* Handle 1-d to 1-d resize.

* Maintain behavior of tensor.set_().

* Fix lack of initializer_list in C :).

* Return full dimensionality from newSizeOf.
2018-06-19 13:28:48 -04:00
65f7797d4d typo corrected (#8632) 2018-06-19 10:23:08 -07:00
c80a703829 Add CODEOWNERS entry for third_party to track changes (#8654) 2018-06-19 08:59:11 -07:00
b8b051cc19 change avg_pool2/3d count_include_pad default to what it is in the docs and in 0.2 (#8645) 2018-06-19 11:55:57 -04:00
9a9eadacc6 explicitly check device for grid_sampler (fixes: #8599) (#8646) 2018-06-19 11:53:46 -04:00
5f64484800 update to avoid potential duplicate error msg (#8638) 2018-06-19 08:50:00 -07:00
32bc28dd18 caffe2 export (#8642) 2018-06-19 00:50:33 -07:00
1ac1a9dbc6 update doc for comparison operators (#8636) 2018-06-18 21:15:22 -07:00
f14887a63f check for exact shape match before loading (#8619)
* check for exact shape match before loading

* Use RuntimeError instead of ValueError to keep it consistent with other errors

* fix lint
2018-06-18 20:16:34 -07:00
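A minimal sketch of the new behavior (mismatched shapes are rejected up front with a RuntimeError instead of failing mid-copy):
```
import torch.nn as nn

src = nn.Linear(4, 2)
dst = nn.Linear(8, 2)
try:
    dst.load_state_dict(src.state_dict())
except RuntimeError as e:
    print(e)  # reports the size mismatch for the "weight" parameter
```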
271406f276 [C++ API] Make pImpl easy to use in modules to enable happy reference semantics (#8347)
* Created TORCH_MODULE macro

Rewrote Linear

Rewrote Dropout and added default constructor to TORCH_MODULE macro

Turned TORCH_MODULE contens into a proper base class

Added some documentation

Got rid of the old Dropout module

Got rid of the old Embedding module

Got rid of the old BatchNorm module

Got rid of the old Conv module

Fixing optimizers

Rebase

Removed old RNN modules and the TORCH_ATTR macro

Removed temporary P:: namespace

Added cloning behavior to all modules

Got rid of some get() calls

self review nits

Remove noexcept from ModuleHolder methods that can throw

Remove spaces

Add missing override to reset() methods

Added examples to documentation in pimpl.h

* Post rebase fixes
2018-06-18 19:45:53 -07:00
d3651585b8 Simplify pthreadpool implementation on top of Caffe2 thread pool (#7666)
Remove one layer of pointer dereference when calling the thread pool.
2018-06-18 19:06:50 -07:00
2289815fc3 Make CI green again (#8631) 2018-06-18 17:11:04 -07:00
6307c117b3 Fix const type qualifier warning (#8613) 2018-06-18 16:34:02 -07:00
c44c95fd0b New operator 'expand' (#8263)
* operator 'expand'

* updated operator with a simple testcase

* Revert "updated operator with a simple testcase"

This reverts commit 1ce9f8ac567b525677254b0dce5735d7fea133d7.

* updated operator with a simple testcase

* expand operator with a passed testcase

* typo

* GPU full support added

* GPU support testing...

* GPU full supported

* formatted

* nits repaired

* gpu parameters fixed

* Expander removed

* nits fixed, document added

* formatted

* new testcases added & nits repaired
2018-06-18 16:33:47 -07:00
05c473b85c Temporarily remove TBB (#8255) 2018-06-18 19:31:57 -04:00
4f37a6481d Fix DeviceGuard usage in THD (#8622) 2018-06-18 18:33:54 -04:00
10961a5b6d Add OpenMPI for MPI tests. (#8625)
* Add mpich for MPI tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Changed to OpenMPI

* Comments change
2018-06-18 15:30:01 -07:00
a7bf539002 [JIT] add missing check for excluding tensor method tests (#8617)
* Improve check for addmm in autodiff

* Fix missing check for excluding tensor method tests
2018-06-18 15:13:57 -07:00
525aa74165 Improve check for addmm in autodiff (#8575) 2018-06-18 15:12:56 -07:00
e4f254224e apt update before installing nccl2 (#8624)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-18 15:10:02 -07:00
11ea8175d4 Remove all resizeLegacy calls, except for catArray. (#8616)
catArray is more complicated because it requires real 0-size dimension support.  The other changes are safe in that the functions are never called (and are now deleted), or
they are used on a result of THTensor_(newSizeOf), which has a valid size.
2018-06-18 18:08:04 -04:00
0a5fe55c9f [auto] Update onnx to 53edd9e - Exclude Random Generator from Test Coverage Stat (onnx/onnx#1119)
53edd9e80e
2018-06-18 20:08:51 +00:00
90532d5f57 Don't use MKL VML for log2 if below MKL build 20180406 (#8614) 2018-06-18 16:07:01 -04:00
ae25737455 Add kwarg support to test_autograd and stop using deprecated schema for accumulation ops (#8574) 2018-06-18 12:41:22 -07:00
2039c7a38f Fix test_rnn_args_check (#8606)
test_rnn_args_check generates mismatched input_shape and hidden_shape
args. To do this, it changes a dimension of input_shape or hidden_shape
to have an incorrect size.

Before, the test was changing the size of a dimension to -1. However,
this is flawed because an input shape such as (6, -1, 2) is invalid to
begin with. This PR fixes it so that the test changes sizes of dimensions to
`bad_size = 7`. As long as none of the other sizes (input_size,
hidden_size, num_layers, batch_size) divides this, we don't have to worry
about that dimension accidentally broadcasting into a working shape.
2018-06-18 14:08:57 -04:00
e62c3a470c [Caffe2] Make cmake find current Python first (#8569)
* Make cmake find current Python first

* Switch from string syntax to list syntax in cmake/Dependencies
2018-06-18 09:39:37 -07:00
88db4c816e Disable flaky Chaining tests (#8601)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-18 11:24:37 -04:00
c1d04c73d2 Implement non-legacy TH/THC resize, with pseudo 0-sized dimension support. (#8559)
Unlike resizeLegacy / resizeNdLegacy, these don't call deprecated methods (e.g. _dim) and don't map between logical sizes (i.e. nDimension == 0 -> size [0]).
What you ask for is what you get.

The full 0-sized dimension support is hidden behind an ifdef, because it's not fully supported yet.
2018-06-18 10:37:31 -04:00
d813ffc613 Dont show Python frames in backtrace (#8579) 2018-06-18 10:13:08 -04:00
0ae8b6c027 add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc (#8600)
* add fold example and add nn.Fold/nn.Unfold and F.fold/F.unfold to doc

and a few drive-by doc fixes

* typo
2018-06-18 09:36:42 -04:00
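A minimal example of the documented pair (F.unfold extracts sliding local blocks; F.fold scatters them back, summing overlaps):
```
import torch
import torch.nn.functional as F

x = torch.randn(1, 3, 10, 12)
# unfold: (N, C, H, W) -> (N, C * kh * kw, L), L = number of block locations
patches = F.unfold(x, kernel_size=(4, 5))
print(patches.shape)  # torch.Size([1, 60, 56]): 3*4*5 channels, 7*8 locations

y = F.fold(patches, output_size=(10, 12), kernel_size=(4, 5))
print(y.shape)  # torch.Size([1, 3, 10, 12])
```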
372d1d6735 Create ATen tensors via TensorOptions (#7869)
* Created TensorOptions

Storing the type in TensorOptions to solve the Variable problem

Created convenience creation functions for TensorOptions and added tests

Converted zeros to TensorOptions

Converted rand to TensorOptions

Fix codegen for TensorOptions and multiple arguments

Put TensorOptions convenience functions into torch namespace too

All factory functions except *_like support TensorOptions

Integrated with recent JIT changes

Support *_like functions

Fix in place modification

Some cleanups and fixes

Support sparse_coo_tensor

Fix bug in Type.cpp

Fix .empty calls in C++ API

Fix bug in Type.cpp

Trying to fix device placement

Make AutoGPU CPU compatible

Remove some auto_gpu.h uses

Fixing some headers

Fix some remaining CUDA/AutoGPU issues

Fix some AutoGPU uses

Fixes to dispatch_tensor_conversion

Reset version of new variables to zero

Implemented parsing device strings

Random fixes to tests

Self review cleanups

flake8

Undo changes to variable.{h,cpp} because they fail on gcc7.2

Add [cuda] tag to tensor_options_cuda.cpp

Move AutoGPU::set_index_from into .cpp file because Windows is stupid and sucks

Fix linker error in AutoGPU.cpp

Fix bad merge conflict in native_functions.yaml

Fixed caffe2/contrib/aten

Fix new window functions added to TensorFactories.cpp

* Removed torch::TensorOptions

Added code to generate wrapper functions for factory methods

Add implicit constructor from Backend to TensorOptions

Remove Var() from C++ API and use torch:: functions

Use torch:: functions more subtly in C++ API

Make AutoGPU::set_device more exception safe

Check status directly in DynamicCUDAHooksInterface

Rename AutoGPU to DeviceGuard

Removed set_requires_grad from python_variables.h and warn appropriately in Variable::set_requires_grad

remove python_default_init: self.type()

Add back original factory functions, but with deprecation warnings

Disable DeviceGuard for a couple functions in ATen

Remove print statement

Fix DeviceGuard construction from undefined tensor

Fixing CUDA device compiler issues

Moved as many methods as possible into header files

Dont generate python functions for deprecated factories

Remove merge conflict artefact

Fix tensor_options_cuda.cpp

Fix set_requires_grad not being checked

Fix tensor_new.h

TEMPORARILY put some methods in .cpp files to see if it solves issues on windows and mac

Fix bug in DeviceGuard.h

Missing includes

TEMPORARILY moving a few more methods into .cpp to see if it fixes windows

Fixing linker errors

* Fix up SummaryOps to use new factories

Undo device agnostic behavior of DeviceGuard

Use -1 instead of optional for default device index

Also move DeviceGuard methods into header

Fixes around device index after optional -> int32_t switch

Fix use of DeviceGuard in new_with_tensor_copy

Fix tensor_options.cpp

* Fix Type::copy(

* Remove test_non_float_params from ONNX tests

* Set requires_grad=False in ONNX tests that use ints

* Put layout/dtype/device on Tensor

* Post merge fixes

* Change behavior of DeviceGuard to match AutoGPU

* Fix C++ API integration tests

* Fix flip functions
2018-06-16 00:40:35 -07:00
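The Python-facing effect, as a sketch: the dtype/device/layout keyword arguments on factory functions are what map onto TensorOptions in the C++ API.
```
import torch

t = torch.zeros(2, 3, dtype=torch.float64, device='cpu', layout=torch.strided)
print(t.dtype, t.device, t.layout)  # torch.float64 cpu torch.strided
```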
c9b8d8566d Added flip() fn in ATen (CPU + CUDA) (#7873)
* Spelling fix in MultivariateNormal docstring (#7915)

* [c10d] MPI Process Group Implementation (#7783)

This provides a bare-minimum MPI Process Group implementation; the commit is on top of @pietern's Gloo Process Group PR.

* [c10d] MPI Process Group Implementation

ref: https://github.com/pytorch/pytorch/issues/7434

* Better exception, atexit func, and addressed comments

* Clang formatting changes

* Static initialization and addressed comments

* Added constness back

* Test will now launch mpi processes if found

* CMakeList Changed

* Fix Windows doc for import error (#7704)

* Fix Windows doc for import error

* Fix doc again

* Fix wrong format

* Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465)

* Updates to caffe2 operator documentation (#7917)

* Significant updates to the operator docs in prep for merge

* [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
307995b143

* Test if ASAN is actually working as part of ASAN tests. (#6050)

* Test if ASAN is actually working as part of ASAN tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Drop explicit use of libstdc++, we should not care.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Build with DEBUG=1

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Increase main thread stack size when using ASAN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Split up detail.h (#7836)

* Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952)

* Fix fbcode compatibility (#7939)

* add test for correctness of transpose fusion (#7950)

* [JIT][script] Fix emitted gather and slice for dynamic indices (#7861)

* [JIT][script] Fix emitted gather for dynamic indices

* Also fix slice

* Address comments

* cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942)

* Add unsafe flag to skip checking in prepare (#7832)

* Add unsafe flag to skip checking in prepare

* pop

* Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7937)

These are used to convert Half -> half and half -> Half respectively.
from_type will be used for runtime type checking in THC.

* Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)

* try again

* use DEFINED

* use a loop

* Minor fixes

*  remove sort requirement from pad-sequence (#7928)

* pad-sequence no longer requires sorting entries

pad-sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.

* remove sort requirement from pad-sequence

Picks up from #5974.

Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
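
A small sketch with unsorted inputs, which now pad fine:
```
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.ones(2), torch.ones(5), torch.ones(3)]  # not sorted by length
padded = pad_sequence(seqs, batch_first=True)
print(padded.shape)  # torch.Size([3, 5])
```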

* Fix checkBackend error message (#7926)

* Fix checkBackend error message

Fixes #7849

* Switch order of printing args

* Split CI tests in half and run them in parallel (#7867)

* Split and run tests in parallel

* Refactor tests

* Handling of scalars in torch.Size (#5676)

* Handling of scalars in torch.Size

torch.Size() constructor uses python_arg_parser

IntList in python_arg_parser can take iter/range

Have IntList take python iterables and ranges.

Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew

Address comments

Address comments

* Rebased

* Address nit
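
A small sketch of what this enables (any iterable of objects exposing __index__ can now populate a size):
```
import torch

s = torch.Size(range(1, 4))
print(s)  # torch.Size([1, 2, 3])
```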

* [JIT] Fission and fusion passes for addmm (#7938)

* Addmm decomposition pass

* Addmm peephole pass

* Fix handling of output shape in fusion pass

* Add DCE to the peephole passes

* add comments

* maybe bugfix?

* Fix GPU tests

* fix py2/3 test issue

* Set smaller grain size for some cases (#7941)

* Fix returning scalar input in Python autograd function (#7934)

* fix _wrap_outputs not working with scalar inputs

* add a test

* Prevent git autocrlf for bash scripts (#7949)

* Delete unused file (#7919)

* Fix typo in autodiff formula for addmm (#7932)

* 1) use meshgrid for the flip() CPU implementation, needing only one copy of the input tensor; 2) changed the CUDA kernel so no materialized indices tensor is needed; 3) reuse the error checking code
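
A minimal usage sketch of the new function:
```
import torch

x = torch.arange(6).reshape(2, 3)
# Reverse the order of elements along dimension 1:
print(torch.flip(x, dims=[1]))
# tensor([[2, 1, 0],
#         [5, 4, 3]])
```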

* [caffe2] YellowFin parameter update GPU code fix. (#6993)

* [Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state_gpu in debug build (#7155)

* Allowing MatMul to create a gradient even with 3 inputs. useful if you are differentiating a graph twice (#6536)

* added const for local variables

* Fix the cpp libtorch CUDA build (#7975)

* Use mingfeima's mkldnn (#7977)

* Fix the import part of the windows doc (#7979)

* Change perf test folder after git checkout (#7980)

* Move the broadcast check in MKL Add/Sum to runtime (#7978)

* Use Glog's implementation of STL logging when possible. (#7206)

Inject custom workaround into namespace std so that it can be found by ADL.

* [Hotfix] Bring back warnings and -Werror to ATen (#7866)

* Bring back warnings and -Werror to ATen

* Unbreak...

* Fix tbb errors

* Enable ONNX backend Mean tests (#7985)

* Add third way to determine IS_CONDA (#7971)

* Fix EmbeddingBag max_norm option (#7959)

* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change

* Raise error when torch.load a storage on a non-existing device (#7921)

* Raise error when torch.load a storage on a non-existing device

Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:

```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
    223         if self.idx is -1:
    224             return
--> 225         self.prev_idx = torch._C._cuda_getDevice()
    226         if self.prev_idx != self.idx:
    227             torch._C._cuda_setDevice(self.idx)

AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```

This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device and suggests the user to use
torch.load's map_location feature.

* Address comments

* missing dep
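
A sketch of the suggested workaround (the checkpoint path here is hypothetical):
```
import torch

# Remap CUDA storages onto the CPU when loading on a CPU-only machine:
state = torch.load('checkpoint.pt', map_location='cpu')
```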

* Make THStorage / THCStorage have void* data ptr. (#7964)

* Make THStorage / THCStorage have void* data ptr.

This is the initial step in unifying the ATen and TH tensor representations, next is to only generate a single THStorage / THCStorage type.

The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores a at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().

* Add include.

* Attempt to fix clang build issues.

* Clarify comment and remove extra character.

* Rename unsafeData -> unsafe_data.

* Remove unnecessary 'to' function to get compile time rather than link time errors.

* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834)

* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.

* Add support of all default cmake build types for release to cuda.

* Remove python bindings for `torch.slice` (#7924)

* skip python bindings for slice

* remove tests

* convert slice test to indexing

* Build ONNX for PyTorch version of libcaffe2 (#7967)

* support loading gzip (#6490)

* support loading gzip

* address comments

* address comments

* fix lint

* fix test for python2
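
A sketch of the use case (torch.load accepts file-like objects, so a gzip stream works; the path is hypothetical):
```
import gzip
import torch

with gzip.open('checkpoint.pt.gz', 'rb') as f:
    state = torch.load(f)
```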

* Add memory leak check in CUDA tests (#7270)

* Add memory leak check in CUDA tests

* Tracking multi-GPU too

* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test

* add a comment

* skip if cuda

* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU

* Fix MaxUnpool3d forward memory leak

* Fix MultiLabelMarginCriterion forward memory leak

* Fix MultiMarginLoss backward memory leak

* default doCUDAMemoryCheck to False

* make the wrapper skip-able

* use TEST_MULTIGPU

* add align_corners=True/False tests for Upsample; fix TEST_CUDNN

* finalize interface

* VolumetricMaxUnpooling_updateOutput

* fix test_nccl

* rename THC caching allocator methods to be clearer

* make the wrapped function a method

* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp

* fix renamed var

* Revert "Set smaller grain size for some cases" (#7988)

* Entry for c10d in CODEOWNERS (#8001)

* Fix a couple of typos (#7998)

* Fix typo

* Fix typo

* Fix typo

* Fix typo

*  Add on-stack observer cache for Observable (#7931)

observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen a 20%
speedup for start and stop observer calls.

* Reduce grain size for Unary operations (#8003)

* [auto] Update onnx to 8ec0e5f - Add index check for Transpose's type inference function (onnx/onnx#1053)
8ec0e5fe9b

* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)

* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.

This requires renaming the _cast functions which used the unqualified names.

* Separate onnx mapping of scalar type from cast name.

* Fix flake8.

* Properly cast onnx.

* Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013)

* Mention the pytorch-ci-hud on the README. (#8004)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Re-enable build env check (#7969)

* Re-enable build env check

* Fix linux test error

* Try to fix macOS test error

* Update nn.rst (#8029)

* Example for Transformed Distribution (#8011)

* [auto] Update onnx to 33e9cd4 - Remove the usage of default value to fix invalid proto3 files. (onnx/onnx#1052)
33e9cd4182

* [auto] Update onnx to 1504a33 - Convert schema assert for duplicate type names to exception (onnx/onnx#1057)
1504a33abb

* Support CUDA tensors in ProcessGroupGloo  (#7694)

This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we'd have a split like ATen's, where we have
different artifacts for different backends so you can decide at runtime
what to use.

* [auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
3fb965666e

* propagate nan in some activations (#8033)

* propagate nan in some activations

* fix py2 not having math.nan

* flake8

* Fix profiler crash when no events register (#8034)

* Fix profiler crash when no events register

When trying to profile, attempting to print the event table throws a vague error because the event list is empty:

....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence

This change fixes the error by returning an empty string.

* Update profiler.py

* Allow CI testing with different AVX configs (#8020)

* Allow CI testing with different AVX configs

* Unset ATEN_DISABLE_AVX and ATEN_DISABLE_AVX2 in default config

* Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)

Paint the internal bikeshed a slightly different color to appease Buck tooling.

* Factor python dependency out of interpreter (#7970)

* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI

* [auto] Update onnx to 760c928 - add missing hasNInputShapes check for bidirectionalBroadcastShapeInference (onnx/onnx#1060)
760c9283d0

* Support modules that output scalar in Gather (and data parallel) (#7973)

* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg

* [auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tests cases (onnx/onnx#1064)
9e7855dcd4

* [script] Add support for torch.zeros, torch.ones, etc. (#7799)

* [script] Add support for torch.zeros, torch.ones, etc.

* modifies gen_jit_dispatch to create bindings for functions that do
  not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
  dtype specification
* extends the list of valid compiler constants to include device, layout,
  and dtype.
* allows functions with Generators, but only using the default generator

Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
  no checks that it is actually used only in a dtype specification.
  This is similar to how we handle Python numbers, creating some situations
  where the script is more permissive. Fixing this requires much more
  significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
  since we do not support string literals in general.

* Add profiling annotations to NeuralNet[Operator|Data] (#8005)

* Update from facebook 1ee4edd286a3 (#8040)

* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But as time goes on it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching the tp2 config files, and another touching the fbcode TARGETS file (adding the nvcc flag). These two should be a bit easier to rebase (for the detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this adds the missing device function for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static initialization
time, this is a SIOF. Recently CAFFE_ENFORCE was added into init
function registration, so we started to see this.

A Meyers singleton provides safety here. If the stack trace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

* Let the operators use the same input if the operators are not chained

Otherwise, we would have to change the input data dims

* fix null-pointer-use UBSAN errors in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state

* Skip CUDA memory leak test on BN tests on windows (#8043)

* workaround for Sequential when one cannot retrieve python source (#8048)

* [auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windows (onnx/onnx#1047)
0dbec2a047

* [auto] Update onnx to 4f8ef17 - Remove erroneous documentation around maps and sequences. (onnx/onnx#1069)
4f8ef17ad3

* [auto] Update onnx to e6a500e - Extract constant to initializer (onnx/onnx#1050)
e6a500e54c

* [auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
033f956f41

* Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)

Removes cases where NO_PYTHON was unnecessary in Exception.h/cpp

* [ready] Clean up torch.distributions (#8046)

* Have a single THStorage and THCStorage type. (#8030)

No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type

* Reduce usages of TensorUtils<T>::DataType in THC. (#8056)

TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.

Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly accessing the member),
but the DataType specific functions are more problematic.

So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types.  To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
   the corresponding tensor template parameter.  We will need to get rid of these static_asserts in the future, but this is useful for now.

* Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)

* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2

* adding error checks to upsample

* adding error checks to upsample

* adding error checks to upsample

* changing to np.isclose

* Revert onnx submodule update

* still fixing

* [auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/onnx#886)
eb12f72a86

* [auto] Update onnx to bd98abb - Add a hook for doing post-processing on protobuf generated header files (onnx/onnx#1068)
bd98abbba0

* Skip ConvTraspose ONNX backend tests (#8074)

* Post process onnx proto (#8064)

* Post processing onnx generated protobuf files to hide global symbols

* .

* .

* Add code for TensorBoard visualization of JIT GraphExecutors (#8050)

* [auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/onnx#1063)
cc26486541

* [auto] Update onnx to 356208d - add input tensor dimension checks to shape inference (onnx/onnx#1070)
356208d756

* Move backtrace to its own header (#8096)

* Move backtrace to its own header

* Move cxxabi.h into Backtrace.cpp

* Fix and ignore some warnings (#8081)

* Do an additional sanity check that nvcc and CUDA include dir agree. (#8094)

If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc.  See #8092 for a concrete
case where this can occur.  Explicitly detect this situation and
give a good error message in this case!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* use regex in kwarg parser (#8061)

* Removing remaining NO_PYTHON ifdefs (#8067)

* Remove NO_PYTHON in tracing

* Remove NO_PYTHON in ir.h

* Remove NO_PYTHON in test_jit.cpp

* Replace std::size_t with size_t (#8093)

* Remove out-of-date comment (#8114)

* [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Resolve merge conflicts

* .

* Update GetAsyncNetHIPThreadPool

* Enable BUILD_CAFFE2 in pytorch build

* Unify USE_HIP and USE_ROCM

* always check USE_ROCM

* .

* remove unrelated change

* move all core hip files to separate subdirectory

* .

* .

* recurse glob core directory

* .

* correct include

* .

* Detect CUDNN related environment variables in cmake (#8082)

* Implement adaptive softmax (#5287)

* Implement adaptive softmax

* fix test for python 2

* add return_logprob flag

* add a test for cross-entropy path

* address review comments

* Fix docs

* pytorch 0.4 fixes

* address review comments

* don't use no_grad when computing log-probs

* add predict method

* add test for predict

* change methods order

* get rid of hardcoded int values

* Add an optional bias term to the head of AdaptiveSoftmax
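
A usage sketch (the module landed as nn.AdaptiveLogSoftmaxWithLoss; head_bias corresponds to the optional bias bullet above):
```
import torch
import torch.nn as nn

# cutoffs split the 1000 classes into a frequent head and two rarer tail
# clusters, which is what makes softmax over large vocabularies cheap.
asm = nn.AdaptiveLogSoftmaxWithLoss(in_features=64, n_classes=1000,
                                    cutoffs=[100, 500], head_bias=True)
x = torch.randn(8, 64)
target = torch.randint(0, 1000, (8,))
output, loss = asm(x, target)
print(output.shape)  # torch.Size([8]): log-probability of each sample's target
```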

* Make libshm also test if rt requires pthread. (#8112)

In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread.  This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.

Fixes #8110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
2d5ce4aeb6

* Add missing pragma once. (#8118)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
2a876162ac

* Split SparseTensorImpl off from TensorImpl. (#7990)

* Split SparseTensorImpl off from TensorImpl.

At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update SparseTensorImpl.h

* [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)

* [Caffe2] Support non peer access in muji

* [Caffe2] Add test for 4 gpus and 2 groups

* [Caffe2] Add comments

* Fix bug when reduced_affix is empty

* Fix typo and add comments about cpu and amd gpu

* Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)

* Replace most remaining usages of TensorUtils<T>::DataType. (#8124)

As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of TensorUtils<T>::DataType with a templatized parameter and static_asserts that the new and old are equal.

After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.

* Add utf-8 header to Python file with Unicode. (#8131)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Add back lrn test (#8134)

* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"

This reverts commit 410191c4175eaae141306cdb3c3c1c1e8a495225.

* Fix mismatched default values

* Add non_blocking to Tensor/Module.to (#7312)

* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
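
A usage sketch (non_blocking only helps when the source memory is pinned, letting the copy overlap with other work):
```
import torch

x = torch.randn(4).pin_memory()
y = x.to('cuda', non_blocking=True)  # asynchronous host-to-device copy
```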

* Fix job name checking for AVX tests (#8135)

* Fix a corner case for ReShapeOp (#8142)

In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.

* cpu/ideep context converter (#8139)

* fix type mismatch while call torch._C._cuda_setDevice (#8065)

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch in scatter

* fix type mismatch in scatter

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice

* docs: Add warning to torch.repeat() (#8116)

* docs: Add warning to torch.repeat()

closes #7993

* docs: Add links for numpy functions

* docs: Break the too long line

* Accelerate Bernoulli random number generation on CPU (#7171)

* opt bernoulli rng with vsl and openmp

* detect cpu vendor for bernoulli

* retrigger test platform

* check the vendor more strictly

* use cpuinfo to check vendor

* docs: add canonical_url and fix redirect link (#8155)

* docs: enable redirect link to work for each specific page

* docs: add canonical_url for search engines

closes #7222

* docs: update redirect link to canonical_url

* docstring support for @script and @script_method (#7898)

* docstring support for @script and @script_method

* make it python2 compatible

* improve according to review

* improve build_stmts

* use filter instead of list comprehension

* improve the way wrap is handled for script_method

* stash the original method instead

* allow dynamic attr for ScriptMethod and GraphExecutor

* a bit comment on build_Expr

* remove _build_wrap

* a bit improve on comments

* rename to __original_methods

* should be _original_methods

* [auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
968d28d901

* remove some unnecessary cudaGetDevices (#8089)

* remove unnecessary cudaGetDevices

* make curDevice argument non-optional, add explicit checks to current_device

* Fix cuda.framework error on OSX. (#8136)

When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable).  Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework.  (Is the folder even a framework?  I have
no idea).

This commit attempts to fix this in a two pronged fashion:

1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help.  So we set these
variables.  However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.

2. PyTorch doesn't actually need the CUDA driver API.  So we
only add the dep when building Caffe2.

Fixes #8022

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [C++ API] Improve and use OrderedDict for parameters / modules (#7823)

* Improve OrderedDict for C++ API

* Give OrderedDict a subject and fix review comments

* Fix OrderedDict use in torch/csrc/jit/script/init.cpp

* Fix __rshift__ bug (#8161)

* Fix __rshift__ bug

* Add small tests for __lshift__ and __rshift__ in test_cuda

* Add a more elaborate check for __lshift__ and __rshift__

* refactor the test to address @zou3519 's comments

* Move non-generic Storage code needed by TensorUtils to non-generic C++. (#8164)

For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.

This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).

* Pinning opencv to < 3.4 in conda builds (#7923)

* Pinning opencv to 3.1.0 in conda builds

* Also pinning numpy to 1.11

* Trying only specifying <3.4

* Adding -setup- path, and better code structure (#8122)

* Abstract parallelization to facilitate using threadpools (#8163)

* [Caffe2] Update elementwise ops to support numpy style broadcast (#8070)

* Update elementwise ops to support numpy style broadcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check

* Export getCudnnHandle (#7726)

* [JIT] Support a single TensorList argument anywhere in the argument list + index_put (#8173)

* [JIT] Support a single TensorList argument anywhere in the argument list

* [JIT] index_put

* use the correct datatype format (#8144)

* Add back onnx console scripts dropped during migration from onnx-caffe2 (#8143)

* Get rid of SOVERSION (again). (#8132)

We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X.  This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library).  Dropping SOVERSION
makes it impossible to make this mistake.

In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)

Partially fixes #8022.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix a corner case for ReShapeOp (#8178)

In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.

* Better conv error message based on weight shape (#8051)

* Add retry logic to sccache download for Windows build (#7697)

* Add retry logic to sccache download for Windows build

* fix script bug

* clean up

* fix caffe2 docker build (#7411)

* [ONNX] Fix type_as symbolic (#8183)

* [ONNX] Nuke type_as symbolic

* make it better

* Fix lookup + test

* Yangqing as an ONNX codeowner (#8185)

* Fix protobuf options (#8184)

* protobuf

* fix protobuf_MSVC_STATIC_RUNTIME

* Add a loop unrolling pass to PyTorch JIT (#7672)

* [auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#1078)
4e65fd83ba

* [Caffe2] Merging setup.py with setup_caffe2.py (#8129)

* Merging setup.py files; torch works, caffe2 works up to other KP

* Fix to super call for python 2

* Works on python2 on mac

* Consolidating Caffe2 flags

* Fix scalar check for sparse tensors. (#8197)

* Fix scalar check for sparse tensors.

As discovered in #8152

If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.

i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices()  # was a sparse tensor, now is dense.
```

* Fix typos

* fix lint

* Add more annotations for arguments in ATen schema (#8192)

* use THCThrustAllocator in BCECriterion (#8188)

* Allow parallel_apply to take in list[Tensor] (#8047)

* Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)

* Docs for gradcheck and gradgradcheck; expose gradgradcheck

* address comments

* Implement randperm for CUDA (#7606)

* Implement randperm for CUDA

* Use Thrust to implement randperm

* clean up

* Fix test

* Offload small input scenario to CPU

* Fixed test

* Try to fix Windows error

* Fix Windows error and clean up

* Use fork_rng context manager

* Move test_randperm_cuda to test_cuda

* Add half tensor support

* Fix cuda::type error

* Fix CPU offloading

* Fix issues

* No need to check range for n == 0 case

* Update c10d build to link against Caffe2 (#8201)

This follows #7399.

* add wipe_cache option (#8204)

as title

* Replace (non-data) TensorUtils calls with non-generic THCTensor calls. (#8176)

* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.

TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.

This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
   This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THCTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
   This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
   a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
   b) Have the generic versions call the non-generic versions.
   c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.

* Add comment about THCTensor struct.

* Error if storage is null in setStorageNd or resizeNd.
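
A sketch of point (2) above, with assumed field names; the payoff is that one non-generic function can accept any derived tensor struct without static_cast or void* tricks:

```
#include <cstdint>

struct _THCTensor {
  int64_t* size;
  int64_t* stride;
  int nDimension;
  // storage pointer, storage offset, refcount, ...
};

struct THCudaFloatTensor : _THCTensor {};
struct THCudaLongTensor  : _THCTensor {};

// Single non-generic implementation, callable from every generic wrapper:
int THCTensor_nDimension(const _THCTensor* self) { return self->nDimension; }
```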

* Fix c10d compiler warnings (#8206)

Copy compiler flags from the ones used in setup.py and fix warnings.
This makes the root build that includes c10d headers warning free.

* Bump gloo submodule (#8202)

This includes facebookincubator/gloo#125.

* rm -rf aten/contrib (#8165)

* Remove aten/contrib

* Remove from CMake

* Fix tanh_op on ios build (#8207)

* Fix tanh_op on ios build

* Fix tanh

* [auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
f28e2f1a60

* [cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)

* deprecate caffe2_* specific cuda function in cmake.

* ENV{} -> $ENV{}

* CUDA_ARCH_NAME -> TORCH_CUDA_ARCH_LIST

* .

* .

* .

* skip CUDA memory leak check on Windows altogether (#8213)

* Record shape and type in autograd to validate gradients (#8168)

The check that the gradient is defined is currently disabled because
TestJit.test_ge_optimized will trigger the error.

* [auto] Update onnx to 18d70ff - Graph should only have one (input) kParam node (onnx/onnx#1088)
18d70ff529

* Set up a c10 source folder (#7822)

* Set up a c10 source folder

* Change the benchmark log format and also log flops (#8215)

as title

* Move helper functions to unnamed namespace. (#8224)

Currently, the helper functions in this file are in the global
namespace. I am guessing the intent was to keep them local.

* [auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onnx/onnx#1083)
e96d823e5c

* Change new bernoulli implementation to be fully generic. (#8218)

The current implementation depends on THTensor types being unique, which is not guaranteed going forward.

* Structure THTensor like THCTensor is structured. (#8217)

In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).

* move THCP-related utils to cuda/utils.cpp. (#8221)

These files don't follow the usual pattern: in general, the files torch/csrc/X and torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure; the torch/csrc/X files should probably be moved to torch/csrc/cpu/X).

utils.cpp combines these so that torch/csrc/utils.cpp has cuda-specific code.  This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPPointer<_THCTensor>).

* [READY TO MERGE] Use ccache in macOS build (#8009)

* Use ccache in macOS build

* Moving to sccache

* Don't use sccache in test job

* [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)

* Add nan and inf probs check to multinomial

* fix bug

* Spawn CUDA test in subprocess

* Make sure invalid input won't pass the test case

* Try to fix error

* Test failure cases in Python 3 only

* Try to fix Windows error

* Move CUDA test to test_cuda.py

* fix issues

* fix module name error

* no need to check for CUDA existence in test_cuda

* Use PY3

* [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)

* Don't import TEST_CUDA for test_dataloader on Windows

* test_partial_workers is stuck on Windows

* Don't copy unneeded grads when using a function for several derivatives (Fixes #7722) (#7759)

Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.

Thanks to Sylvain Gugger for reporting!

* Fix win mkldnn (#7718)

* Sync build_pytorch_libs.bat with build_pytorch_libs.sh

* fix quoting

* add warnings

* fix warnings

* Add /EHa

* [Caffe2] Add ADD operator for IDEEP (#8220)

* Add ADD operator for IDEEP

* Add broadcast check

* Comments

* Allow optional build and installation of native test binaries (#8225)

* test finetuning

* install off by default

* Turn BUILD_TEST=ON for jenkins.

* Turn on install_test in jenkins as well

* Update MKL exporter to IDEEP ops (#8228)

IDEEP exporter support

* [ideep] Add IDEEP Squeeze op (#8227)

Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc

* [auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-bench (onnx/onnx#1084)
62e63e9de8

* Use .cc since some downstream libraries are configured for C++ only. (#8234)

* Rename SparseTensor to SparseTensorRef. (#8237)

I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
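
A sketch of the naming conflict being resolved, using a stand-in Tensor type (the real definitions live in ATen):

```
struct Tensor { /* ... */ };

// The existing wrapper keeps its behavior under the more specific name:
struct SparseTensorRef {
  explicit SparseTensorRef(const Tensor& t) : tref(t) {}
  const Tensor& tref;
};

// The documentary alias the rename makes room for: the same type as Tensor,
// but it signals "this argument is expected to be sparse" at call sites.
using SparseTensor = Tensor;
```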

* [caffe2] Build Android tests and binaries in CI (#7593)

Update benchmark submodule to version with fixed Android/GNUSTL build

* Remove core and util warnings (#8239)

* Fix some signed/unsigned mismatches

* Skip unused result warning

* Explict fallthrough for murmur hash

* Enable aligned new support to eliminate warning

* Switch to int instead of unsigned in some cases

* Remove .gitmodules.aten since it is in .gitmodules now (#8232)

* Fix: gradcheck forced float32 (#8230)

* Print requires_grad and grad_fn in string repr of tensor (#8211)

For example:

  >>> torch.ones(3).requires_grad_()
  tensor([ 1.,  1.,  1.], requires_grad=True)

  >>> torch.ones(3).requires_grad_() * 5
  tensor([ 5.,  5.,  5.], grad_fn=<MulBackward0>)

The suffix (dtype, requires_grad, grad_fn) wraps to a new line if
it would cause the line to exceed the linewidth.

  >>> torch.ones(10).double().requires_grad_()
  tensor([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
         dtype=torch.float64, requires_grad=True)

* Fix TEST_CUDA import in test_cuda (#8246)

* Fix lifting cat into its constant version (#8174)

This fixes a bug where schemas including varargs lists did not lift
properly, blocking correct ONNX export.

* Don't override Tensor, Storage macros defined outside torch/csrc in t… (#8243)

* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.

This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.

In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is cast to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was previously a bit of a mess because you really have to understand which macros are redefined and which aren't.

We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.

* Don't change the plugin.
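
A sketch of the wildcard scheme in (b), with the macros defined inline for illustration (real generic files get them from the surrounding TH/THC build machinery); "THW" stands in for the possibly-empty "C":

```
#include <cstdint>

struct THTensor  { int64_t* size; };
struct THCTensor { int64_t* size; };

// In a CPU generic file:
#define THWTensor THTensor
static int64_t first_dim_cpu(const THWTensor* t) { return t->size[0]; }
#undef THWTensor

// The same generic source, included from a CUDA file, maps to THC instead:
#define THWTensor THCTensor
static int64_t first_dim_cuda(const THWTensor* t) { return t->size[0]; }
#undef THWTensor
```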

* [auto] Update onnx to 3a035f4 - Add retry logic to model downloading (onnx/onnx#1077)
3a035f4397

* Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorUtils). (#8251)

* [cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248)

* Fix app size check (#8256)

Fix app size check

* wip on CPU impl

* Stop BCELoss from returning negative results (#8147)

* Stop BCELoss from returning negative results

* check explicitly for 0 before taking log

* add tests

* fix lint

* address comments

* Relax CUDA_HOME detection logic, to build when libraries are found. (#8244)

Log when no cuda runtime is found, but CUDA is found

* Added backward function for kl_div target (#7839)

* added backward fn for target

* added module test for kl_div target, and assuming targets are probabilities

* Change the output format of caffe2 observers (#8261)

as title

* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8247)

* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor.

* Fix template parameter.

* [caffe2] Move submodule onnx-tensorrt forward (#7659)

Commit 82106f833dcb0070446a150e658e60ca9428f89b is essential.

* [ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)

TSIA

* un-genericize THCDeviceTensorUtils. (#8258)

* provide data<T>() in TH(C)Tensor.

* un-genericize THCDeviceTensorUtils.

This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.

* [caffe2] Fix ATen dispatch for ops with TensorList arg (#8226)

* [cmake] Add and export Modules_CUDA_fix (#8271)

* Add and export Modules_CUDA_fix

* actually, need to include before finding cuda

* [auto] Update onnx to 2508156 - Make error message more verbose (onnx/onnx#1097)
2508156135

* [auto] Update onnx to 39e4668 - fix optimizer does not set ir_version bug (onnx/onnx#1098)
39e46687ea

* [cmake] Make cudnn optional (#8265)

* Make cudnn optional

* Remove cudnn file from cpu file

* Move signal window functions to ATen; add Blackman window (#8130)

* Move signal window functions to ATen; add Blackman window

* fix cuda test not checking scipy

* [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)

IDEEP supports fusion for non-group conv

* [c10d] NCCL Process Group implementation (#8182)

* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL

* Set up CI build for CUDA 9.2 + macOS (#8274)

* Add macOS CUDA build to CI

* Fix undefined symbols issue

* Use sccache for CUDA build

* Fix sccache issues

* clean up

* c10 build setup (#8264)

* Move c10/ to caffe2/dispatch/

* Set up caffe2/utils directory

* Remove remaining TensorTypeUtils functions. (#8286)

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Create initial Python bindings for c10d (#8119)

* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted

* Add option USE_NVRTC which defaults to off (#8289)

* [build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)

* Remove /torch/lib/THD/cmake in favor of /cmake

* path fix

* Explicitly marking gloo to use cuda

* Fix gloo path in THD

* Have a single THTensor / THCTensor type. (#8288)

* Remove remaining TensorTypeUtils functions.

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Have a single THTensor / THCTensor type.

As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.

For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.

* undef GENERATE_SPARSE.
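
A sketch of the aliasing described above, with fields elided: one dtype-independent struct, and the old per-dtype names kept purely for documentation and backwards compatibility:

```
struct THTensor { /* sizes, strides, dtype-erased storage, ... */ };

using THFloatTensor = THTensor;  // old names now document intent only
using THLongTensor  = THTensor;
```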

* [auto] Update onnx to 58efe0a - add float16 support back for math and reduction ops (onnx/onnx#1102)
58efe0a9ca

* Some utils for compile-time programming (#7778)

* Add some C++17 features, implemented with C++14

* Add some type traits

* Compile-time type list abstraction

* Some utils for compile-time programming

* Fix compatibility with a larger range of compilers

* Use guts::array instead of std::array because of std::array shortcomings

* code review comments

* Use quotes for includes
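
A minimal compile-time typelist in the spirit of these utilities (names assumed; not the actual guts:: API):

```
#include <cstddef>

template <class... Ts>
struct typelist {};

template <class List>
struct size;

template <class... Ts>
struct size<typelist<Ts...>> {
  static constexpr std::size_t value = sizeof...(Ts);
};

static_assert(size<typelist<int, float, double>>::value == 3,
              "computed entirely at compile time");
```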

* Remove THC's FindMAGMA (#8299)

* Entries for torch.distributed in CODEOWNERS (#8293)

* Add depthwise convolution test for IDEEP (#8301)

* Fix dividing by zero segfault in Reshape (#8302)

when inferring a dimension of a new shape with zero size
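
A sketch of the guard, as an assumed helper rather than the actual Caffe2 code: when the new shape has a dimension to infer and the known dimensions multiply to zero, the division must be skipped:

```
#include <cstdint>
#include <vector>

std::vector<int64_t> infer_shape(int64_t numel, std::vector<int64_t> shape) {
  int64_t known = 1;
  int inferred = -1;
  for (int i = 0; i < static_cast<int>(shape.size()); ++i) {
    if (shape[i] == -1) inferred = i;  // the dimension to infer
    else known *= shape[i];
  }
  if (inferred >= 0) {
    shape[inferred] = (known == 0) ? 0 : numel / known;  // the fixed case
  }
  return shape;
}
```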

* Removes unused THCTensorConv (#8229)

* Replace Variables to Tensors (#8309)

* Clean up old sccache log before build (#8305)

* Remove unused grad ops on mobile to reduce app size (#8297)

Remove unused grad ops on mobile to reduce app size

* Small fixes (#8296)

* [auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC build. Was typo in a previous ch… (onnx/onnx#1104)
5ed684ebe5

* Fix sample code for cuda stream (#8319)

* [auto] Update onnx to 4b4085c - Add missing warning ignoring flags to onnx_proto CMake target (onnx/onnx#1105)
4b4085c2e9

* [THD] fix broken THD build with NCCL (#8323)

* Add docstring for `torch.sparse_coo_tensor` (#8152)

* add sparse_coo_tensor docstring

* update empty tensor example

* whitespace

* whitespace again

* add error when backend is not supported by DDP (#8325)

* Fix collect_env.py for Windows (#8326)

* Fix collect_env.py for Windows

* Fix expect file for Win machine

* Fix the script not stopping early on error for MSVC and Ninja (#8277)

* Simplify the solution

* Remove the usage of set errorlevel

* Skip test_multinomial_invalid_probs_cuda on Windows (#8324)

* Support printing sparse tensors in ATen, fixes #8333. (#8334)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* [C++ API] Cursors (#8190)

* Add cursors to C++ API

* Small self nits

* s/struct/class

* Use more STL like names for cursors

* Implement dim_arange operator (#8266)

* Implement arange_like operator

* add ONNX symbolic

* lint

* change name

* Comment the hack

* 1. fixed flip CPU impl for non-contiguous flip dims; 2. added more tests; 3. using TensorInfo and collapseDims to speed up CUDA impl for cases where flip dim is the 1st or last dim

* nits

* 1. removed for loop in pointwise CUDA kernel; 2. using templated (int64_t) IndexType for indices in pointwise CUDA kernel

* added torch.flip.__doc__

* nits
2018-06-15 21:20:55 -04:00
92f67d9404 fix lint 2018-06-15 18:18:20 -07:00
26bed6d83e assert limit on cudnn grid_sampler (#8576) 2018-06-15 17:33:33 -07:00
7b2ad8893d Eliminates noisy assert spew when running test_cuda.py (#8531)
* Fixes test_multinomial_invalid_probs_cuda debug spew

* Fixes test_multinomial_invalid_probs_cuda debug spew

* Fixes Python linting
2018-06-15 19:52:53 -04:00
682dec2cea add relu to jit and exp to autodiff (#8573) 2018-06-15 19:49:20 -04:00
b10c94b507 Update operator documentation with markdown descriptions and interfaces (#8085)
* Update operator documentation with markdown descriptions and interfaces

* Added rest of updated operator documentation to source files

* Committing local changes for rebase

* fixed bracket typo in sqrt_op.cc file

* Added updated markdown documentation to remaining completed ops
2018-06-15 19:02:24 -04:00
d968614502 Enable open registration of VariableType objects (#8540)
We have 2 use cases where we want to experiment with new base ATen
tensor types:

* BatchTensor for matchbox
* Tensors that live on accelerators

It is possible to subclass TensorImpl to implement these but VariableType
does not work with them because it cannot find the equivalent variable type
in the registry.

This commit changes the way we implement type -> variable(type) lookup so that
torch::register_variable_type_for can be called on any at::Type.

Lookups are still done using arrays so there should be no perf impact from the change.
2018-06-15 14:56:19 -07:00
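
A simplified sketch of the registry shape this describes (layout and names assumed): registration writes into an array slot keyed by a per-type id, so lookup stays O(1) indexing:

```
#include <array>
#include <cstddef>

struct Type { /* base ATen type */ };

constexpr std::size_t kMaxTypes = 128;
static std::array<Type*, kMaxTypes> variable_type_registry{};

// Any base type with an assigned id can register its variable wrapper type.
void register_variable_type_for(std::size_t base_type_id, Type* variable_type) {
  variable_type_registry[base_type_id] = variable_type;
}

Type* get_variable_type(std::size_t base_type_id) {
  return variable_type_registry[base_type_id];
}
```
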
711e5a6ceb Port THS to ATen. (#8409)
* Port THS to ATen.

The basic structure of the patch:

- All kernels in aten/src/THS got rewritten as native
  functions in aten/src/ATen/native/sparse

  I took the liberty to rename some of the kernels,
  opting for longer, more transparent names than
  things like 'spaddcmul'.

- Instead of holding fields for sparse tensor in the TH
  C struct THSTensor, they are now held in a C++ class
  SparseTensorImpl (this explains why I had to do this
  all in one go; I can't have *two* reps for sparse
  tensors!)

  Along the way, we change a key internal representation
  invariant: an "empty" sparse tensor has dimI == 1 and
  dimV == 0 (this is different from dimI == 0 and dimV == 0
  we had before); this ensures that we maintain the invariant
  that dim == dimI + dimV.  "Scalar" sparse tensors are
  made illegal, because there really is no way to properly
  express them in COO format.

- Because we haven't ported THCS or any of the traditional
  dense TH implementations, there is a new set of adapter
  functions in native/LegacyBridge.cpp exclusively devoted
  to deciding whether or not to go to the new native implementation
  or back to the legacy TH binding (prefixed with th_).
  The intent is that when everything gets ported, we can
  delete this file.

- I've kept the stubs for all the THS functions, but they now all
  error if you try to actually call them.  Eventually, we should
  replace these with calls to ATen so that everything keeps
  working.

- I gobbled up SparseMM (SparseMM.cpp is no more). It was tasty.

There are some miscellaneous improvements which were needed for other
changes in this patch:

- There is now AT_FORALL_SCALAR_TYPES_EXCEPT_HALF, which does what
  it says on the tin.

- axpy templated function moved to TH/BlasUtils.h, there's a new macro
  which lets you easily forward to all of the TH functions. We also expose
  THBlas_copy.  I'm not terribly pleased with these functions but
  they seem to serve a needed purpose.

- New method on Tensor to get TensorImpl*, unsafeGetTensorImpl

- accessor() is now this-const, since const-correctness on Tensor is a lie

- New toSparse()/toDense() methods on Type; now you can call these
  directly without having to manually apply at::toSparse/toDense
  on the Backend and then running toBackend yourself.

Changes to the kernels:

- Previously, the whole body of all kernels was compiled for
  every supported scalar type.  In our new implementation,
  the scalar dispatch has been pushed into the smallest extent
  which (1) is not in a type loop and (2) requires statically
  knowing the scalar type.  These sites all use
  AT_DISPATCH_ALL_TYPES.  I tried to use lambdas as much as
  possible, but sometimes it was not possible when a OpenMP
  pragma was used.

- Anywhere we tested if the nDimension of a tensor was zero,
  we replaced it with a test that numel is zero.  Because, as we
  know, nDimension of zero-size tensors in TH is zero, and
  that's wrong wrong wrong (and not done this way in ATen).

Some subtleties:

- Places where previously fastget1d was used, I now use a
  TensorAccessor.  However, you have to be careful about grabbing
  the accessor, because sometimes you will be accessor'ing
  indices/values and they are empty, which means they will
  be *1D* ("oh, aren't indices always 2D?" Nope. Nyet.)
  So, essentially, it is only safe to grab an accessor *after*
  you have checked that nnz != 0.  All of these shenanigans
  will go away when we properly support zero-size dimensions.

  A few places, we test for this case just by wrapping the loop
  in a conditional on nnz.  Some other places this is not so easy,
  so we instead short-circuit the function with a special case for
  when nnz == 0 (usually, these implementations are degenerate).

- There is a very subtle but important difference between
  _sparse_get_impl(self)->indices() and self._indices();
  the latter may return a view!  This is because nnz is
  not guaranteed to match the dimensions of indices/values;
  you can "truncate" a sparse tensor by setting the nnz.
  Actually, I think this is not a good idea and we should
  enforce a stronger invariant, but for this patch I slavishly
  adhere to the old ways, and as such I have to be very
  careful if I want to resize something, I had better use
  the former and not the latter.

- I had to reimplement broadcasting by hand (thus the s_
  and non-s_ functions in the sparse native files).  There
  is a very important distinction between foo_out and foo_,
  so it is important that the LegacyBridge function always
  call to the lower layer, and not try to avoid boilerplate
  by calling to another LegacyBridge function first.
  I did NOT put broadcasting in LegacyBridge (even though,
  ultimately, that's where it must live), because the th_
  functions which are invoked from LegacyBridge handle
  broadcasting themselves, and I don't want to broadcast
  twice.

- Sparse function MUST explicitly specify the Type they
  dispatch from, otherwise Variable wrapping/unwrapping will
  not work correctly.  If you use _get_sparse_impl, that is
  sufficient to levy this requirement.

- The "has native" tests in LegacyBridge.cpp are not 100%,
  because some of the functions are mixed dense-sparse functions,
  and so you can't just say, "Oh, if it's sparse and CPU, call
  the native sparse implementation."  This is handled on a
  case by case basis.  There is some especially complex
  logic for add(), which has dense-dense, sparse-sparse
  and dense-sparse implementations.

- I added some uses of SparseTensorRef in native_functions.yaml,
  but you will notice that these are all on native_* functions,
  and not the actual, top-level functions.  So the SparseTensorRef
  is purely documentary (helping you not call the wrong overload)
  but there is no magic; we do the wrapping ourselves the hard
  way. (This is in contrast to the TH binding code, which is magical.)
  Except for _sparse_mask; _sparse_mask is magical.

- There is a raw_copy_sparse_ method, which is really my way of
  getting around the fact that copy_ has never been implemented
  for sparse tensors (even before this patch), but there IS a
  super secret, internal way of doing these copies that the THS
  code used, and which I needed to get my hands on when I did this
  port.  We should refactor so that either (a) copy_ does support
  sparse-sparse copy natively, or (b) we do this other ways.

- Irritatingly, I must explicitly resize_as_ before copy_ into
  a tensor.  This was not the case with THTensor_(copy) but I don't
  have any direct binding that doesn't have this requirement.

- For some reason, the sparse tensor constructor accepts a scalar
  tensor for the values tensor.  This is kind of weird because
  you always need an nnz-dimension.  However, the old code supported
  this and just expanded it into a 1D size 0 tensor; so we need some
  explicit code to do this.

There are maybe a bit more AT_ASSERTs in some of the kernels
than is wise.  I added them all when I was debugging and was
loath to remove them.

Some last mile fixes after this commit went into PR

- Move expand outside of dispatch so autograd works (it used to be inside and then we lost all of the recorded broadcasts).
- Hack to duplicate the derivatives for our now two definitions TH and native. Mercifully the derivatives are short.
- Apparently, TH has a special case to make foo_ functions method only, and if you don't do this the Python arg parsing is wrong. We carefully work around this in the native bindings
- Apply DCE to a test_jit case, fixes wobbling due to DCE trick in tracing
- Update test_function's output
- Some last mile fixes for dispatch confusion in sparse_coo_tensor functions.
- New simplified regression test based on failures I saw in ONNX
- Increase tolerance on super resolution test
- More robust dynamic_type normalization, fixes ONNX bug.
  The dynamic_type situation is very delicate; probably need
  to stop having both Scalar and real.
- Make new_with_tensor_sparse more CUDA safe
- Note about CUDA-safety in SparseTensorImpl
- Rename dimI/dimV to sparseDims/denseDims.
- Make localScalar on SparseTensorImpl work.
- Make numel uniformly supported on all types, not just dense
  types
- Add tests for is_nonzero() method (which exercises localScalar)
- Disable constant JIT autogenerated tests, which are fragile and broken
  by this change, but being fixed in a parallel track.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-15 17:52:21 -04:00
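
A sketch of the dispatch pattern the port adopts; AT_DISPATCH_ALL_TYPES is the macro named in the message above, but the kernel body here is illustrative rather than one of the ported kernels:

```
#include <ATen/ATen.h>

int64_t count_nonzero_values(const at::Tensor& values) {
  int64_t count = 0;
  // The scalar-type switch wraps only the code that statically needs scalar_t.
  AT_DISPATCH_ALL_TYPES(values.type(), "count_nonzero_values", [&] {
    auto acc = values.accessor<scalar_t, 1>();
    for (int64_t i = 0; i < acc.size(0); i++) {
      if (acc[i] != scalar_t(0)) count++;
    }
  });
  return count;
}
```
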
c537fd7432 fix lint (#8567) 2018-06-15 17:34:39 -04:00
c457fc994d Adding pyyaml to Ubuntu and Centos docker images (#8490) 2018-06-15 13:55:48 -07:00
ec23ee67cf add order switch op to nomnigraph (#8436) 2018-06-15 10:07:41 -07:00
dc186cc9fe Remove NO_* and WITH_* across codebase, except in setup.py (#8555)
* remove legacy options from CMakeLists

* codemod WITH_ to USE_ for WITH_CUDA, WITH_CUDNN, WITH_DISTRIBUTED, WITH_DISTRIBUTED_MW, WITH_GLOO_IBVERBS, WITH_NCCL, WITH_ROCM, WITH_NUMPY

* cover SYSTEM_NCCL, MKLDNN, NNPACK, C10D, NINJA

* removed NO_* variables and hotpatch them only in setup.py

* fix lint
2018-06-15 12:29:48 -04:00
d7690742d5 Fix the formula of some norms (#8545) 2018-06-15 10:41:26 -04:00
b002aee0ff Disable verbose logging for PyTorch ROCm nightly builds. (#8517) 2018-06-15 09:14:03 -04:00
7251d70c5b fixed THD NO_CUDA (#8539) 2018-06-15 09:09:23 -04:00
0965e8e9e7 [auto] Update onnx to 0125af3 - Add node test for Dropout (onnx/onnx#1115)
0125af3204
2018-06-15 11:31:13 +00:00
4e3ada19cf [auto] Update onnx to d9fc1b1 - Add Node test for BatchNormalization (onnx/onnx#1117)
d9fc1b14aa
2018-06-15 08:36:29 +00:00
5a31f73611 [auto] Update onnx to b70ee6a - Make RNN/LSTM/GRU treatment of recurrent weights consistent (onnx/onnx#1103)
b70ee6a99b
2018-06-15 05:22:43 +00:00
677739cd1e Fix createZerosLike for scalars (#8537) 2018-06-14 20:51:14 -07:00
55de546146 [auto] Update onnx to c647994 - fix upper-bound for local-region in lrn test case (onnx/onnx#1095)
c6479945bb
2018-06-15 03:40:07 +00:00
a8bf30d7a5 caffe2 hip python binding (#8491)
* caffe2 hip python binding

* Change back onnx submodule
2018-06-14 19:56:56 -07:00
3a1265c739 [auto] Update onnx to 578a439 - Add Node Test for InstanceNormalization (onnx/onnx#1118)
578a439b63
2018-06-15 01:40:42 +00:00
829bcf3e9b Don't apply PR 12 to Thrust anymore. (#8542)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-14 21:39:21 -04:00
848873e1f6 Must run apt-get install as sudo. (#8454)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-14 21:32:42 -04:00
302408e6c2 Support BatchNormalization opset 7 (#8482) 2018-06-15 08:44:35 +08:00
54c456da68 Improve win-build.sh for Windows local build (#8493) 2018-06-14 17:11:59 -07:00
544605d3a9 [JIT] Remove TK_WHERE (#8536) 2018-06-14 16:46:08 -07:00
34c9d16ca1 [JIT] End-to-end example-based robustness testing for hybrid frontend (#8451)
* End-to-end example-based robustness testing for hybrid frontend

* delet this
2018-06-14 14:58:30 -07:00
6869a5f0fb Throw error on 0-length tensor slicing (#7775)
* throw error on 0-length tensor slicing

* return empty tensor instead of throwing error

* make 0 slice work for tuples also

* add tests

* move check to aten

* Address comments
2018-06-14 17:40:51 -04:00
edc3000963 Move empty size logic from ATen into TH/THC. (#8468)
* Move empty size logic from ATen into TH/THC.

The goal here is to unify the tensor representations; since the "majority" of the representation is in TH, we push the empty size ({0}) and empty stride ({1}) logic into TH.

This PR does the following:
1) Previously THTensor/THCTensor with dim_ == 0, size == nullptr, stride == nullptr are now dim_ == 1, size == {0}, stride == {1}.
2) The logic that previously implemented this at the ATen level (e.g. THLongStorageView STRIDE_EMPTY_TENSOR) is removed.
3) The above is pretty clean except for resize/resizeNd logic -- that is still called with nDimension == 0.  So, we rename these to resizeLegacy, resizeNdLegacy, map nDimension == 1
into the new regime, and will later write an empty-aware resize/resizeNd and move over the calls to resizeLegacy, resizeNdLegacy.
4) Also introduces some ifdefs that are just used for testing:
a) USE_TH_SCALAR: move scalar logic in TH
b) USE_TH_ZERO_SIZE_DIM: support arbitrary 0-sized dimensions, i.e {...,0,...}.
These are just used to write forward-looking correct code while call sites to _dim() (old TH nDimension) and resizeLegacy are updated.

* Get rid of noelem_to_empty.

* Use static_cast rather than C-style cast.

* Allocator size for empty tensors in THS/THCS.

* Add back THLongStorageView type Stride (TH and arg parsing has some magic that needs these to be nullptrs).
2018-06-14 16:56:52 -04:00
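
A sketch of the representation change in (1), with assumed field names; the "empty" state moves from (dim 0, null size/stride) to an explicit one-dimensional zero size:

```
#include <cstdint>

struct TensorShapeState {
  int dim_;
  const int64_t* size;
  const int64_t* stride;
};

static const int64_t kEmptySize[]   = {0};
static const int64_t kEmptyStride[] = {1};

// Before this change an empty tensor was {0, nullptr, nullptr}; now:
static const TensorShapeState kEmpty = {1, kEmptySize, kEmptyStride};
```
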
6287b80d67 [auto] Update onnx to 3ca20e6 - Remove obsolete installation doc. (onnx/onnx#1108)
3ca20e6993
2018-06-14 20:50:50 +00:00
ae55865a3b Migrated hardshrink() to ATen and deprecated nn.Hardshrink() (#8117)
* 1. added hardshrink() to ATen (CPU + GPU); 2. removed nn.Hardshrink(); 3. reusing previous tests for nn.Hardshrink() and included CUDA tests at test_nn; 4. default parameter lambda=0.5 is not working yet

* optimized memory read/write

* 1. pass in lambd as scalar for CPU/CUDA_apply*; 2. removed tests for hardshrink at test_legacy_nn

* fixes test_utils

* 1. replace zeros_like with empty_like; 2. use scalar_cast in cuda

* 1. printing lambd value; 2. default lambd=0.5 is still failing

* getting around a Scalar bug by removing the default value of lambd from native_functions.yaml and declaring it in nn/functional.py

* cleaned up debug printf
2018-06-14 16:42:20 -04:00
2ab4c9dbec DEPRECATED -> AT_DEPRECATED (#8496) 2018-06-14 16:25:49 -04:00
c4194169a8 Temporary solution for having access to Python installation path. (#8487)
* Temporary solution for having access to the root path for python installations until Caffe2/PyTorch figure out the best way to build.

* Update build.sh

Increasing the verbosity of HIP errors.
2018-06-14 16:05:03 -04:00
2f25d1fbc1 Enable tracing and script autograd tests (#8145)
This commit turns autograd function/method tests into tests run inside of a trace, or directly written using
script. These tests have uncovered many bugs and limited functionality
in the trace/script pathway, and these failing parts of the tests
are disabled using new exclusion sets. The size of these sets will shrink
as the bugs are fixed.
2018-06-14 11:48:15 -07:00
aa2c79a125 Add ONLY_FOR_TEST device type into executor (#8461)
Add ONLY_FOR_TEST device type into executor to support some of the tests
2018-06-14 14:06:35 -04:00
467fc3c436 [READY TO MERGE] Improve docs for Multinomial and Categorical distributions (#8472)
* Improve docs for Multinomial and Categorical distributions

* more improvement

* more improvement
2018-06-14 12:47:35 -04:00
aed98067bf Pin correct clang version in macOS CI test (#8457) 2018-06-14 12:47:24 -04:00
fa277e6785 [IDEEP] [fix bug] Fix bug in ideep SkipOutputCopy strategy (#8372)
* fix a bug for SkipIndices

* IDEEP bug, revise the output to CPUTensor in SkipOutputCopy strategy

* [IDEEP] Add IDEEP fallbacks for Style-Transfer ops
2018-06-14 09:42:00 -07:00
a4bd4f6c6f Fix -g not passed to nvcc when DEBUG=1 (#8407)
* Fix -g not passed to nvcc when DEBUG=1

* blacklist -Werror

* filter CMAKE_CXX_FLAGS too

* restore to space-delimited string before ending macro
2018-06-14 12:36:50 -04:00
384936f73e TypeId improvements (#8350)
* Improve TypeId:
- move it to c10 namespace to allow for easy extraction from caffe2 into c10 (i.e. reuseability from aten)
- Use unordered_map/unordered_set instead of map/set for performance
- Make TypeId a type safe class (i.e. no implicit casts from/to int)
- Make TypeId constexpr
- Some readability improvements (e.g. using instead of typedef)
- Don't explicitly implement TypeMeta copy assignment and construction - let the compiler do that for us.
- Add TypeMeta move constructor
- Make TypeMeta members noexcept
- Implement TypeMeta::operator== and operator!= as free functions instead of in-class

* CR comments

* fix

* fix windows

* Rename back to CaffeTypeId

* Remove c10::TypeId/TypeMeta

* remove C10_KNOWN_TYPE

* code review
2018-06-14 09:16:26 -07:00
752bb954b4 Update RunAsyncFailure test (#8486)
Fix RunAsyncFailure test
2018-06-14 12:05:57 -04:00
21609e0fd0 `bincount` feature implementation (#6688)
* Implement CPU bincount feature support

* Incorporate feedback on renaming to SummaryOps file and other nits

* bincount gpu implementation

* refactor cuda code and incorporate nits

* doc fix

* cuda bincount - cast weights to double if integral type

* fix: signed unsigned comparison error

* fix: ssize_t error

* refactor

* make template typenames readable and other nits

* make compatible with v0.5

* incorporate comments

* update test cases to ensure CUDA code coverage
2018-06-14 11:38:04 -04:00
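
A sketch of the computation the feature implements, with the shape assumed from the description (not the ATen code): count occurrences of each non-negative integer, optionally weighted:

```
#include <cstddef>
#include <cstdint>
#include <vector>

std::vector<double> bincount(const std::vector<int64_t>& input,
                             const std::vector<double>& weights) {
  // weights must be empty or the same length as input.
  int64_t max_val = -1;
  for (int64_t v : input) max_val = (v > max_val) ? v : max_val;
  std::vector<double> bins(static_cast<std::size_t>(max_val + 1), 0.0);
  for (std::size_t i = 0; i < input.size(); ++i) {
    bins[static_cast<std::size_t>(input[i])] += weights.empty() ? 1.0 : weights[i];
  }
  return bins;
}
```
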
2a0e98a334 Move libtorch CMakeLists.txt to torch/ (#8444) 2018-06-14 11:36:49 -04:00
e323f02277 Fixing missing PyCObject_Type bug (#8467) 2018-06-14 08:08:25 -07:00
2184e3f933 Use MKL VML if available (#8458) 2018-06-14 10:40:21 -04:00
8d674c0d51 add comparison operators to jit (#8058)
* add comparison operators to jit

* try to fix CI

* address review comments

* fix type of comparison ops result

* address review comments

* fix indentation

* add comments

* require type_as to have non-dynamic tensor arg

* Typo (should check if template argument of type_as, inputs()[1], is tensor)

* Use .at() instead of []

* Use .at() again
2018-06-14 09:30:25 -04:00
9d88ff7d0d Add half cauchy, half normal distributions (#8411) 2018-06-14 10:28:42 +02:00
6a85b133d3 Improve number formatting in tensor print (#7632)
* Improve number formatting in tensor print

* fix bad rebase

* address comments

* fix test

* fix test

* use assertExpected for tests

* address comments

* address comments
2018-06-13 23:57:16 -07:00
bb9ef8fc2e Support new version of Dropout (#8470) 2018-06-14 14:47:47 +08:00
2de4ab88f5 remove _assert_no_grad from loss modules (#8460) 2018-06-13 21:30:51 -04:00
db14f3f33c More efficient kernels that avoid deprecated shuffles in Embedding and LookupTable (#8400)
* More efficient kernel that avoids deprecated shuffles in Embedding.cu and THCUNN/LookupTable.cu

* Using WARP_BALLOT from THCDeviceUtils.cuh, also changing WARP_BALLOT to return unsigned
2018-06-13 21:29:51 -04:00
f7585178cd [auto] Update onnx to b7d5a60 - Add stats on ONNX node tests (onnx/onnx#1110)
b7d5a60f90
2018-06-14 01:18:58 +00:00
64d5b1454e Add is_variable tag to Tensor (#8414)
* Add is_variable tag to Tensor

* Add is_variable tag to Type
2018-06-13 18:14:29 -07:00
6e314f9f68 update tensor clone docs (#8462) 2018-06-13 21:06:21 -04:00
681964cc47 output each operator separately due to logcat truncation (#8456)
as title
2018-06-13 21:05:05 -04:00
ad378dfbaf Adding necessary LOCAL variables in order for the perl script that HIP utils uses to run successfully without error. (#8464) 2018-06-13 20:28:54 -04:00
df3559ca58 Move hip utils files to a separate directory (#8446) 2018-06-13 16:49:59 -07:00
dc209ed963 [c10d] Rendezvous skeleton (#8294)
* [c10d] Rendezvous skeleton

The rendezvous function takes an URL and produces a triplet of a store,
a process rank, and the process group size.

For the file and TCP handlers, the rank and size must be specified, but
other handlers may discover these parameters dynamically.

It returns a generator function, such that if a rendezvous handler
supports rerendezvous, you can write:

for store, rank, size in c10d.rendezvous(...):
  pg = c10d.ProcessGroup(store, rank, size)
  while the process group is valid:
    # Do stuff with process group

* Add Python 2 fallback for urlparse library

* Import X as Y

* Relative import seems to fix it

* Spelling

* Gate import on c10d availability
2018-06-13 15:27:32 -07:00
8a837f0fe3 Repairing the integrated build path to handle the Caffe2 PR. (#8441)
* Modifying the build path to handle Caffe2's merge

* Update LoadHIP.cmake

Fixing typo.

* Update Dependencies.cmake

Keeping hip_include_directories since other Caffe2 libs depend on it.

* Update CMakeLists.txt

Only including for the second time if we're building with ATen.

* Update CMakeLists.txt

Adding comments to make sure future users understand why necessary commands have been added.
2018-06-13 17:16:59 -04:00
4d287f9074 Use int64_t instead of int for in loop that may overflow. (#8435) 2018-06-13 17:02:32 -04:00
2c9c48a323 Add CODEOWNERS entry for c10d test file (#8445) 2018-06-13 16:22:57 -04:00
71a3633e3f change tensor.set_() argument names to match descriptions in doc (#8403)
Renamed the args `storage` and `sourceStorage` to `source` in tensor.set_() to match the descriptions in the docs.
2018-06-13 13:22:50 -07:00
5b86c3af4a Update from facebook (#8384)
* [fix] fixup the bias multiplier data access issue

Hotfix for failures in conv_transpose

* [D2][Easy]: lint regularizer

lint with black

* [GanH]: Split mu in adaptive weight for diagnose

* [Dper] Add the ability to split FC weights into multiple smaller ones

* fix SumReduceLikeOp for empty blob

as desc.

* add ctc_greedy_decoder for caffe2

ctc_greedy_decoder same as tf's

* Update event callback handling

Allow multiple callbacks per event

* Add WeightedSum layer

The motivation is to do weighted sum in HoNet/crossnet; in the next diff, I'll replace model.Add with model.WeightedSum in
honet: https://fburl.com/f4rmolg2
crossnet: https://fburl.com/v7awn8se, https://fburl.com/63filbnm

* Replicate DAG's behavior

Some callers expect RunAsync to block, replicate that behavior in case of
explicit 'dag' net type

* [dper] layernorm layer

as title

* Override dag, async_dag, async_polling

Overriding dag, async_dag and async_polling with async_scheduling

* Name the thread pools

Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.

* [Caffe2] FillerOp should support int64_t dimensions

Change argument type to int64_t for shape argument of FillerOp (used in ConstantFill, XavierFill, etc)

* Remove caffe2/caffe2/contrib/torch/

It's not used anywhere and depends on old lua torch that conflicts with Aten. Given PT1 it's not relevant any more (though it was nice and clever code!)

#accept2ship

* Fix linearWarmup multiplier check

The multiplier needs to be non-negative, not strictly positive.

* Revert D3314316

This is after 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.

* Speedup generate proposals by partial_sort.

Speedup generate proposals by partial_sort.

FACEBOOK:
- Saw speed improvement for training with this op.
- Yanghan benchmarked the op on a small dataset and saw a consistent 100% improvement in speed (6ms -> 3ms) at 420 input resolution. See next diff for details.
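
A sketch of the idea: only the top-K proposals need to be fully ordered, so std::partial_sort replaces a full sort of the candidate scores (helper name assumed):

```
#include <algorithm>
#include <cstddef>
#include <functional>
#include <vector>

void keep_top_k(std::vector<float>& scores, std::size_t k) {
  k = std::min(k, scores.size());
  std::partial_sort(scores.begin(), scores.begin() + k, scores.end(),
                    std::greater<float>());
  scores.resize(k);  // everything past k is unordered and discarded
}
```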

* More parallel processing friendly for CPP version of GenerateProposals.

More parallel processing friendly for CPP version of GenerateProposals.

* [DT] [43/n] Lift stop conditions inside reader code back to flow control

1. Split multi_reader function into local_reader and remote_reader
2. Lifted stop conditions inside Limiter back to flow control
3. Split epoch flow building logic into 3 cases:
  - single machine (1 reader, 1 trainer on trainer0 node, no PS)
  - (1 reader + 1 trainer) on trainer0 node, has PS
  - multiple readers, readers do not share nodes with trainers, might have PS or not

* Resolve conflicts for torch/_thnn/utils.py

* [Caffe2] Handle image decoding errors

Image decoding errors can make the whole training fail. This diff handles them:
1. Catch imdecode exceptions and check whether the decoded image has zero columns or rows; this counts as a decoding error.
2. Replace the image with an empty one in case of error.
3. Count the number of errors and throw a runtime exception if the rate reaches a given threshold.

The empty image data is kept. It might introduce noise in the training data.

* Update MKL exporter to IDEEP ops

TSIA

* [Caffe2] GlobalInit is thread safe, fixing the comment

With the mutex and lock, GlobalInit is thread safe.
Update the comments.

* Back out "Add support for generating ATen files during fbcode build"

Original commit changeset: 28970ddba353

@override-unit-failures
(Note: this ignores all push blocking failures!)

* [DT]: fix predictor save

similar to D6610058, here we add the fix for distributed online training

* Remove net_singlethread_async_gpu.cc

Closes https://github.com/caffe2/caffe2/pull/2528

This removes net_singlethread_async_gpu.cc as part of our effort to clean
CUDAContext and the net executors.

* Inline DFS task execution

Add a DFS inline task execution mode in executor

* Add c10 folder to fbcode

This adds the c10 folder and its test cases to fbcode. Build flags are mostly taken from aten.

* add dependencies for online trainer

Add some dependencies so that the online model can use DataPipeline and PredictionTransform operators

Relevant post: https://fb.intern.facebook.com/groups/1324375037655677/permalink/1740993462660497/

* Resolve conflicts for tools/jit/gen_jit_dispatch.py

* [Fix] sparse regularization in distributed training

* Support advanced pooling options in sum processor

* support advanced pooling options in sum processor
* remove redundant code
* support attention in sum processor

* Improve shard logging in net tracing code

Make it handle arbitrary shard ids instead of just one-digit ids.

* [Caffe2] Call GlobalInit in predictor only in mobile

FACEBOOK:
Calling GlobalInit long after the program starts may not be safe. There are issues if the following happens:

User does not call GlobalInit and initFacebook after program starts
User sets a flag manually: https://fburl.com/mcsumw7d
User calls OSS predictor.
OSS predictor calls GlobalInit
GlobalInit calls initFacebook
initFacebook resets all flags: https://fburl.com/tolszha1
Thus, the flags the user set manually are overwritten

This would happen anytime GlobalInit is called long after the program starts.
I suppose the intention of the user in this case is not to call GlobalInit throughout the program,
but use Caffe2 regardless (is that desired?)
But adding GlobalInit in the OSS predictor would automatically call GlobalInit when using Caffe2.

This issue doesn't exist in mobile, since initFacebook is not called on mobile.

For now, guard the GlobalInit in predictor for mobile only.
May want to ensure the GlobalInit is always called at the start of the program. @[3501714:kutta] has seen weird issues when not calling GlobalInit at the start of the program on server side. He has made some progress on this.

* resolve conflicts for caffe2/core/logging_is_google_glog.h and test/test_torch.py

* Add empty fix for SumLikeReduceOp

Add empty fix for SumLikeReduceOp

* Revert D7962948: [caffe2][nomnigraph] Concat elim for sparseNN

This reverts commit f7f434dc5c34ca6058b9765d2ef615453d2276a9

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* Remove Declarations.yaml

* Include common.h

* Change std::stoi to caffe2::stoi

* Add thread_name.cc to the CMake file

* No need to subtract 1. Fix test segfaults

* Fix NetTest, ObserverTest

Fix tests

(cherry picked from commit 3767e66c3f365596cba3d46d3e7322c933a0ab41)

* CTCGreedyDecoderOp only has CPU implementation, test should only run on CPU

* Add a variable to avoid conversion resizing issue

* Remove blank lines in the end of file

* [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)

* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove the code per soumith's comments

* [ONNX] Add an ATen fallback pathway for ONNX export (#8273)

* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface

* Remove imaginary file (#8415)

* [Caffe2] Enable AMD/MIOPEN ops for Caffe2  (#8306)

* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess; it has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format

* Enable some reduce operators' ONNX backend tests (#8418)

* fix old comment to point to the right file (#8416)

* Stop pinning nccl version. (#8421)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428)

* Enable some of the ONNX backend test on broadcasting (#8423)

* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast

* Expose proto utils and ONNX (#8073)

* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files

* Rebase creates some weird situations, revert them manually

* Remove more weird changes due to rebase

* Need to add thread_name.cc after merge
2018-06-13 13:10:45 -07:00
f1b5124306 Fix #8420, defaulting the initial hidden state to 0 (#8427) 2018-06-13 14:26:28 -04:00
09896d1e77 Allow nccl downgrades (#8429)
* Revert "Stop pinning nccl version. (#8421)"

This reverts commit 3cb45bafc8b9b023049e5f979a2bcb75e3f7009d.

* Allow downgrades from libnccl2 install.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-13 13:56:34 -04:00
edd4e2c5d1 Expose proto utils and ONNX (#8073)
* Expose proto utils and ONNX from PyTorch libcaffe2.so

* Try to use protobuf from _C.so

* Fix ONNX proto header include

* Adjust order of imports for ONNX until nanopb goes away

* Set and use ONNX_NAMESPACE for PyTorch builds

* Show protobuf summary for all builds

* Add ONNX_NAMESPACE for cpp_build

* Statically link libprotobuf.a into libtorch.so

* Set ONNX_NAMESPACE on Windows build

* Move core/dispatch up as well

* Add /MD flag for Windows build of _C

* Potential Windows fix for ONNX and protobuf

* Add direct linkage from _C to ONNX on Windows

* Only include protobuf wrapper for PyTorch

* Pass extra_compile_args to _nvrtc ext build

* Remove installation of .a files
2018-06-13 10:25:32 -07:00
7543d0f794 Enable some of the ONNX backend test on broadcasting (#8423)
* Enable some of the ONNX backend test on broadcasting

* enable gemm broadcast
2018-06-13 10:15:56 -07:00
61f61de270 Expose logsumexp docs and mark log_sum_exp in distributions for internal use (#8428) 2018-06-13 12:27:58 -04:00
3cb45bafc8 Stop pinning nccl version. (#8421)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-13 10:53:56 -04:00
7ca8e2f131 fix old comment to point to the right file (#8416) 2018-06-13 21:33:05 +08:00
a42c12bb11 Enable some reduce operators' ONNX backend tests (#8418) 2018-06-13 21:32:50 +08:00
c37e5b7137 [Caffe2] Enable AMD/MIOPEN ops for Caffe2 (#8306)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Add MIOPEN pooling operator

* Add MIOPEN activation operator

* Add MIOPEN softmax operator

* Add MIOPEN spatial batch norm operator

* Add MIOPEN local response normalization operator

* Add MIOPEN conv operator

* Clean-up LRN ops

* enable fp16 in MIOPEN pool ops

* Enable fp16 for MIOPEN relu op

* Enable fp16 for MIOPEN spatial batch norm op

* code clean-up

* revert float16 support

* Create Caffe2 python binding for AMD/ROCM/HIP

* Add op fallback for HIP operator

* add hip src/test files in cmake

* exclude hip src/test files

* fix python binding for hip backend

* fix MIOPEN pooling op workspace

* hack to compile miopen operators

* fix include path for MIOPEN ops

* Fix include path

* Add HIP math utilities

* Fix path for HIP math utils

* cmake fix

* Cmake fix / hipcc for hip files

* suppress hipcc warning

* cmake fix / replace USE_HIP with USE_ROCM

* revert LoadHIP.cmake change

* fix include for thrust/cub-hip

* include path fix for conversion.h

* Updated with latest upstream changes

* clang format fixes

* Context_hip updates

* Fixed typo in rocblas handle get function

* Updated hipified math utils

* Updated math hip test util

* Updated context hip test

* Updated common_hip

* Updated net async dag for HIP

* Added MIOPEN in operator hip test

* fix

* C2 dependencies clean-up

* fix include path for building custom protobuf

* Decouple miopen pool op and conv_pool_op base

* cmake refactor

* fix operator_hip_test

* move all hip/miopen ops files into caffe2/operators/hip

* sanitize cmake

* permission issue

* remove extra parenthesis

* remove artifact from resolving merge conflict

* cont. sanitize cmake files

* fix syntax error

* sanitize conversion.h

* .

* Revert "."

This reverts commit 56020cb0e996a31ae27bf1f8f491955ed0b121b9.

* clang-format
2018-06-13 04:00:39 -07:00
36bf89bf09 Remove imaginary file (#8415) 2018-06-12 23:17:19 -07:00
04503962ff [ONNX] Add an ATen fallback pathway for ONNX export (#8273)
* ATen fallback for ONNX export

* Move to enum

* Fix model test

* Add comment

* Address comments

BC interface
2018-06-12 22:59:45 -07:00
76f22b7aef [caffe2] upgrade IDEEP and hotfix for conv op accuracy issue (#8364)
* [IDEEP] Upgrade IDEEP version

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* [IDEEP] Fix accuracy issue in conv op

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix build error due to lack of src in CMakeLists

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-06-12 22:06:16 -07:00
81b92f7515 Get ROCm building again on master (#8343)
Billing of changes:

- New Jenkins script for building on rocm. For now it is a bit hacked together, but we can improve it once CI is running
- New ROCM docker image for nightly HIP, and also some legacy packages that we need temporarily
- New enabled config py2-clang3.8-rocmnightly-ubuntu16.04-build based off of the existing Caffe2 image (not built yet)
- A big pile of cmake fixes, mostly to turn bits on/off when ROCM build is involved
- Switch from hiprng to hcrng
- Apply some patches directly in code, eliminating the patches
- Use __hdiv instead of hdiv, it's more portable
- THCNumerics<T>::gt doesn't work in HIP, so simulate it with sub
- Add a few more overloads HIP needs
- Turn off use of hcc to link (we plan to turn this back on to get tests running)
- Search for hiprand, hiprng, hipblas, hipsparse
- Better Python 2 portability
2018-06-12 23:05:21 -04:00
49d6c5f99f Branch parallel if number of threads is 1 (#8401) 2018-06-12 22:28:51 -04:00
7c9e936986 Add way of deprecating ATen functions (#8404) 2018-06-12 19:26:43 -07:00
557511102e Always include Modules_CUDA_fix for Caffe2 builds (#8396) 2018-06-12 22:19:23 -04:00
4485ce66c2 Fix flaky RoiAlignTest, fixes #8084. (#8312)
* Fix flaky RoiAlignTest, fixes #8084.

Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>

* Increase tolerance

* more...
2018-06-12 20:06:24 -04:00
b947ac227d Check if you forgot to specify 'variants: function' on _out (#8402)
The Python binding generation code doesn't understand
method '_out' bindings correctly, and will compute the
indices wrong if you have an '_out' function that's also a
method. This is a quick check to prevent you from making
this mistake.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-12 20:05:45 -04:00
fcd9af8a25 changes to support ATen code generation inside fbcode (#8397)
* Back out "Back out "Add support for generating ATen files during fbcode build""

Original commit changeset: 7b8de22d1613

I'm re-sending this diff exactly as it was approved and
committed. Fixes to support @mode/opt will be sent separately for ease
of review.

* Enable building //caffe2:torch with @mode/opt

In @mode/opt, python runs out of a PAR, which breaks a lot of
assumptions in the code about where templates/ folders live relative
to __file__. Rather than introduce hacks with parutil, I simply turn
template_path into a parameter for all the relevant functions and
thread it through from the top level.
2018-06-12 14:57:29 -07:00
ffffee6aa9 Skip test_multinomial_invalid_probs on Windows (#8360) 2018-06-12 17:00:49 -04:00
712a3fad27 Adding CMAKE_PREFIX_PATH and CMAKE_INSTALL_PREFIX to cmake summary (#8398) 2018-06-12 14:21:11 -06:00
c3e4b3c88b raise a more informative error msg when torch.load input does not support seek (#7754)
Raising more informative error msg for torch.load() when input file does not support seek() or tell()
2018-06-12 12:57:28 -07:00
c6db1bc952 Add gt lt ge le to the supported operators list (#8375)
Add gt lt ge le to the supported operators list
2018-06-12 15:28:34 -04:00
bef12551ee Check CAFFE2_USE_MSVC_STATIC_RUNTIME to set -MD vs -MT in cuda.cmake (#8381) 2018-06-12 11:59:39 -07:00
5f5ea75283 Use SYSTEM For all includes in Dependencies.cmake (#8380) 2018-06-12 11:59:02 -07:00
49eec35e5b More warning skips (#8382)
* Remove check for unused private fields

* Suppress inconsistent-missing-override

* Hopefully last warning skip for Mac

* Add one more warning ignore
2018-06-12 14:44:36 -04:00
a77b391de7 [SpectralNorm] don't register original weight as buffer (#8170)
* don't register original weight as buffer; fixes for buffers that require grad

* add test
2018-06-12 14:42:05 -04:00
922adf8d09 Skip calling ncclCommDestroy in destructor (#8352)
There is a bug in NCCL that causes seg faults when calling ncclCommDestroy() in the destructor during program exit. According to Nvidia, "Whether the NCCL destructor will be called before or after the CUDA runtime destructor is undefined, which can lead to crashes."

As an immediate workaround, skip calling ncclCommDestroy in the NCCL destructor. This is UGLY and we'll follow up with Nvidia to solve this ASAP.
2018-06-12 13:11:09 -04:00
991bdd7f13 [build] remove the use of NO_CUDA (#8300)
* Only remove NO_CUDA from CMakeLists.txt

* @ezyang's catch
2018-06-12 12:14:36 -04:00
5484a197d9 [c10d] Convenience wrappers for collective functions (#8292)
* [c10d] Add convenience wrappers

* Release GIL
2018-06-12 09:05:16 -07:00
cc8fbc9d08 Revert "Name the thread pools (#8137)" (#8379)
This reverts commit 96876d9e7ef6baf9d11541454b5f4d22b092de77.
2018-06-12 11:51:32 -04:00
96876d9e7e Name the thread pools (#8137)
Caffe thread pools currently inherit the thread names from the thread that starts them, which can be misleading. Give them an explicit name instead.
2018-06-11 23:13:46 -07:00
a161639fcd Move copyright lines back to NOTICE file, fixes #6911 (#8310)
Signed-off-by: Edward Z. Yang <ezyang@cs.stanford.edu>
2018-06-11 23:12:41 -07:00
44973a06ba Add affine_channel_op (#8356)
Add affine_channel_op
2018-06-11 20:51:11 -07:00
87dcdf5fe5 [auto] Update onnx to 86999f9 - Fix the LRN's doc (onnx/onnx#1107)
86999f90f0
2018-06-12 02:52:51 +00:00
1f02ebd323 Use clang 8 to build CUDA in macOS CI (#8355)
* Don't use -faligned-new flag for clang < 9.0

* Select Xcode 8.2 toolchain when building CUDA

* Better comment
2018-06-11 22:45:40 -04:00
78e3259bbe Add autograd automatic anomaly detection (#7677)
* add autograd automatic anomaly detection

* python 3 string support

* Fix non python build

* fix typo in doc

* better test and naming fix

* fix no python build and python object handling

* fix missing checks

* clean NO_PYTHON build

* Remove unwanted changes
2018-06-11 21:26:17 -04:00
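
A minimal usage sketch of the anomaly detection described above (the example inputs are illustrative, not taken from this diff):

```python
import torch

# detect_anomaly() makes backward() raise at the op that produced a NaN
# gradient, printing the forward-pass traceback of that op.
with torch.autograd.detect_anomaly():
    x = torch.tensor([-1.0, 1.0], requires_grad=True)
    y = x.sqrt().sum()   # sqrt(-1) is NaN in the forward pass
    y.backward()         # raises here instead of silently propagating NaN
```
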
38362fa9f3 Prepare for moving 0-sized dimensions in TH/THC. (#8337)
This does the following:
1) makes nDimension an int64_t (to match ATen)
2) changes the dimension value to dim_ (so we catch direct usages)
3) provide an _dim() that provides access to the "old" view (so we can migrate functions one at a time)
4) have code call ->_dim() instead of ->nDimension.
2018-06-11 21:18:02 -04:00
0cced57cb8 Build DEBUG mode with -O0, fixes #8335. (#8336)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-11 21:05:12 -04:00
ae1ceef36a Allow TypeMeta hold non-default-constructible types (#8349)
Necessary for Tensor detemplatization (D8121878) - now tensor won't have a default constructor (as we don't know the device).

Thus this diff makes TypeMeta constructible with non-default-constructible types, in which case ctor() is non-null but always throws.

It's dangerous, however, as we won't catch potential type errors at compile time. Luckily, the only place where ctor() is used is in Blob and Tensor, which have templated wrappers there (GetMutable and mutable_data respectively). We can just enforce the necessary type requirements there explicitly as a static_assert.

It also changes the failure behavior to throw() instead of abort(). Aborting the process is not cool for a library :)
2018-06-11 15:53:07 -07:00
ddab886105 [caffe2] Move elementwise grad ops to separate files (#8315)
* Move elementwise grad ops to separate files

Move elementwise grad ops to separate files

* Fix proto build

* Fix build

* Fix sync error
2018-06-11 15:38:36 -07:00
46c0b01234 Revert D3314316 (#8346)
It has been 2 years and we do not seem to have a use case for this one, so
for the sake of clean API design we should potentially remove this. This would
allow us to potentially pass in arguments to optionally construct an object,
although it is indeed a little bit unclear how we can reuse existing objects if
constructor arguments are passed in. In any case, we may want to remove this
dangling feature.
2018-06-11 14:23:10 -07:00
9b1480a28e Fix disabling of USE_CUDNN when not found (#8340) 2018-06-11 11:40:51 -07:00
607b86f603 Implement dim_arange operator (#8266)
* Implement arange_like operator

* add ONNX symbolic

* lint

* change name

* Comment the hack
2018-06-11 10:49:29 -07:00
de4e97e89a [C++ API] Cursors (#8190)
* Add cursors to C++ API

* Small self nits

* s/struct/class

* Use more STL like names for cursors
2018-06-11 09:48:43 -07:00
77660a9cbb Support printing sparse tensors in ATen, fixes #8333. (#8334)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-11 12:15:50 -04:00
77dea37dac Skip test_multinomial_invalid_probs_cuda on Windows (#8324) 2018-06-11 11:14:10 -04:00
f4b79f99d1 Fix the script not stopping early on error for MSVC and Ninja (#8277)
* Simplify the solution

* Remove the usage of set errorlevel
2018-06-11 11:03:30 -04:00
bed172cf54 Fix collect_env.py for Windows (#8326)
* Fix collect_env.py for Windows

* Fix expect file for Win machine
2018-06-11 10:52:21 -04:00
52e4d3c4a2 add error when backend is not supported by DDP (#8325) 2018-06-11 02:18:30 -04:00
94888106a9 Add docstring for torch.sparse_coo_tensor (#8152)
* add sparse_coo_tensor docstring

* update empty tensor example

* whitespace

* whitespace again
2018-06-11 00:03:51 -04:00
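
For reference, a short construction example in the style of the new docstring (the values are illustrative):

```python
import torch

i = torch.tensor([[0, 1, 1],
                  [2, 0, 2]])       # one column per non-zero: (row, col)
v = torch.tensor([3.0, 4.0, 5.0])   # the matching values
s = torch.sparse_coo_tensor(i, v, torch.Size([2, 3]))
print(s.to_dense())  # tensor([[0., 0., 3.], [4., 0., 5.]])
```
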
80b6f9edd6 [THD] fix broken THD build with NCCL (#8323) 2018-06-10 23:48:10 -04:00
01f5ba4f3e [auto] Update onnx to 4b4085c - Add missing warning ignoring flags to onnx_proto CMake target (onnx/onnx#1105)
4b4085c2e9
2018-06-10 20:49:46 +00:00
0169ac5936 Fix sample code for cuda stream (#8319) 2018-06-10 11:41:50 -04:00
bf8689d0e5 [auto] Update onnx to 5ed684e - Remove/replace /MX with /WX for MSVC build. Was typo in a previous ch… (onnx/onnx#1104)
5ed684ebe5
2018-06-10 04:59:13 +00:00
d33cc08a97 Small fixes (#8296) 2018-06-09 23:11:35 -04:00
5fe24968ed Remove unused grad ops on mobile to reduce app size (#8297)
Remove unused grad ops on mobile to reduce app size
2018-06-09 23:10:05 -04:00
07d3f14eed Clean up old sccache log before build (#8305) 2018-06-09 23:07:47 -04:00
b78466a37d Replace Variables to Tensors (#8309) 2018-06-09 23:07:15 -04:00
29849e428c Removes unused THCTensorConv (#8229) 2018-06-09 17:15:26 -04:00
3521cd54af Fix dividing by zero segfault in Reshape (#8302)
when inferring a dimension in a zero-size new shape
2018-06-09 09:48:22 -07:00
2ed03898cd Add depthwise convolution test for IDEEP (#8301) 2018-06-09 08:44:13 -07:00
e6ef18d531 Entries for torch.distributed in CODEOWNERS (#8293) 2018-06-09 00:28:40 -04:00
788f05d215 Remove THC's FindMAGMA (#8299) 2018-06-08 21:03:39 -07:00
a34211bd79 Some utils for compile-time programming (#7778)
* Add some C++17 features, implemented with C++14

* Add some type traits

* Compile-time type list abstraction

* Some utils for compile-time programming

* Fix compatibility with a larger range of compilers

* Use guts::array instead of std::array because of std::array shortcomings

* code review comments

* Use quotes for includes
2018-06-08 17:10:53 -07:00
f35d7cce91 [auto] Update onnx to 58efe0a - add float16 support back for math and reduction ops (onnx/onnx#1102)
58efe0a9ca
2018-06-08 23:10:17 +00:00
045e7435c3 Have a single THTensor / THCTensor type. (#8288)
* Remove remaining TensorTypeUtils functions.

Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.

* Have a single THTensor / THCTensor type.

As was previously done with Storages, have only a single (dtype-independent) THTensor / THCTensor.

For documentation and backwards compatibility purposes, the old names, e.g. TH(Cuda)LongTensor alias the new TH(C)Tensor type.

* undef GENERATE_SPARSE.
2018-06-08 17:57:44 -04:00
37073f8be0 [build] Remove /torch/lib/THD/cmake in favor of /cmake (#7159)
* Remove /torch/lib/THD/cmake in favor of /cmake

* path fix

* Explicitly marking gloo to use cuda

* Fix gloo path in THD
2018-06-08 17:55:12 -04:00
c486b8749d Add option USE_NVRTC which defaults to off (#8289) 2018-06-08 14:27:23 -07:00
695d40efc2 Create initial Python bindings for c10d (#8119)
* Build and install c10d from tools/build_pytorch_libs.sh

* Create initial Python bindings for c10d

* clang-format

* Switch link order to include more symbols

* Add bindings and tests for ProcessGroupGloo

* Add broadcast test

* Separate build flag for c10d

* Explicit PIC property

* Skip c10d tests if not available

* Remove c10d from Windows blacklist

Let it skip by itself because it won't be available anyway.

* Make lint happy

* Comments

* Move c10d module into torch.distributed

* Close tempfile such that it is deleted
2018-06-08 12:59:51 -07:00
75563674c4 Remove remaining TensorTypeUtils functions. (#8286)
Mostly what's remaining is copy utilities -- these are now provided in THCTensorCopy.hpp and templatized on the ScalarType rather than the TensorType.
2018-06-08 15:51:25 -04:00
efba555a38 c10 build setup (#8264)
* Move c10/ to caffe2/dispatch/

* Set up caffe2/utils directory
2018-06-08 12:11:17 -07:00
d56b4f2568 Set up CI build for CUDA 9.2 + macOS (#8274)
* Add macOS CUDA build to CI

* Fix undefined symbols issue

* Use sccache for CUDA build

* Fix sccache issues

* clean up
2018-06-08 14:12:52 -04:00
a994b432ee [c10d] NCCL Process Group implementation (#8182)
* [c10d] Process Group NCCL implementation

* Addressed comments

* Added one missing return and clang format again

* Use cmake/Modules for everything and fix gloo build

* Fixed compiler warnings

* Deleted duplicated FindNCCL
2018-06-08 10:33:27 -07:00
d301d9df7a [ideep] Fuse Conv-Relu after IDEEP graph rewrite, skip group conv (#8233)
IDEEP supports fusion for non-group conv
2018-06-08 10:29:15 -07:00
742912512c Move signal window functions to ATen; add Blackman window (#8130)
* Move signal window functions to ATen; add Blackman window

* fix cuda test not checking scipy
2018-06-08 11:37:46 -04:00
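
A quick sketch of the resulting factory function (assuming torch.blackman_window follows the other window factories):

```python
import torch

# Like torch.hann_window / torch.hamming_window, the Blackman window is a
# 1-D tensor factory; it is periodic by default for use with torch.stft.
w = torch.blackman_window(64)
print(w.shape)  # torch.Size([64])
```
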
20c516ac18 [cmake] Make cudnn optional (#8265)
* Make cudnn optional

* Remove cudnn file from cpu file
2018-06-08 02:04:27 -07:00
147fc6b9cc [auto] Update onnx to 39e4668 - fix optimizer does not set ir_version bug (onnx/onnx#1098)
39e46687ea
2018-06-08 06:12:08 +00:00
2928a33f50 [auto] Update onnx to 2508156 - Make error message more verbose (onnx/onnx#1097)
2508156135
2018-06-08 05:11:15 +00:00
1a03ba51dc [cmake] Add and export Modules_CUDA_fix (#8271)
* Add and export Modules_CUDA_fix

* actually, need to include before finding cuda
2018-06-07 21:50:30 -07:00
49593a609a [caffe2] Fix ATen dispatch for ops with TensorList arg (#8226) 2018-06-07 20:35:22 -07:00
80fade8af4 un-genericize THCDeviceTensorUtils. (#8258)
* provide data<T>() in TH(C)Tensor.

* un-genericize THCDeviceTensorUtils.

This is used outside of generic context, so we need to un-genericize it to have a single THCTensor type.
2018-06-07 23:29:41 -04:00
4f1440e828 [ideep] Add IDEEP fallbacks for Faster-RCNN ops (#8260)
TSIA
2018-06-07 20:21:56 -07:00
048b2f3a91 [caffe2] Move submodule onnx-tensorrt forward (#7659)
Commit 82106f833dcb0070446a150e658e60ca9428f89b is essential.
2018-06-07 20:07:04 -07:00
8d0c3c721a Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor. (#8247)
* Remove TensorUtils<T>::getData, provide data<T>() in TH(C)Tensor.

* Fix template parameter.
2018-06-07 20:54:49 -04:00
0c9b5f0825 Change the output format of caffe2 observers (#8261)
as title
2018-06-07 17:30:43 -07:00
4c2a1a1a64 Added backward function for kl_div target (#7839)
* added backward fn for target

* added module test for kl_div target, and assuming targets are probabilities
2018-06-07 17:17:18 -07:00
ce122cc2d3 Relax CUDA_HOME detection logic, to build when libraries are found. (#8244)
Log when no cuda runtime is found, but CUDA is found
2018-06-07 20:08:13 -04:00
73966f65ae Stop BCELoss from returning negative results (#8147)
* Stop BCELoss from returning negative results

* check explicitly for 0 before taking log

* add tests

* fix lint

* address comments
2018-06-07 20:06:04 -04:00
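
A minimal Python sketch of the safeguard described above (an illustration, not the actual TH/THC kernel; the -100 floor is an assumption):

```python
import torch

def bce_safe(p, y, log_floor=-100.0):
    # Clamp the logs so p == 0 or p == 1 yields a large finite penalty
    # (-inf becomes log_floor) and the loss can never go negative.
    log_p  = torch.log(p).clamp(min=log_floor)
    log_1p = torch.log(1 - p).clamp(min=log_floor)
    return -(y * log_p + (1 - y) * log_1p).mean()
```
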
e2be77eae8 Fix app size check (#8256)
Fix app size check
2018-06-07 15:34:22 -07:00
78b88219fa [cmake] Use CAFFE2_USE_* for public/cuda.cmake (#8248) 2018-06-07 15:00:38 -07:00
b4c6310247 Fully genericize THC/THCUNN (except for TensorUtils and DeviceTensorUtils). (#8251) 2018-06-07 17:47:45 -04:00
95ae09c866 [auto] Update onnx to 3a035f4 - Add retry logic to model downloading (onnx/onnx#1077)
3a035f4397
2018-06-07 20:33:02 +00:00
93a9bb9f35 Don't override Tensor, Storage macros defined outside torch/csrc in t… (#8243)
* Don't override Tensor, Storage macros defined outside torch/csrc in torch/csrc.

This PR does the following:
1) Removes THSTensor macros in torch/csrc, which aren't used.
2) For macros defined outside of torch/csrc (THTensor, THTensor_, THStorage, THStorage_):
a) No longer override them, i.e. previously THTensor could actually be THCTensor if a generic file was included from a file including THCP.h.
b) Instead, introduce new macros THW* (e.g. THWTensor) to represent a (potentially empty) wildcard character.

In addition to making this code easier to read and codemod, this allows us to more freely change TH/THC; for example:
currently in the THC random code, the state is casted to THByteTensor*; this happens to work because the macros don't happen to override THByteTensor.
But if THByteTensor just becomes an alias of THTensor (which is the plan for a single tensor type), then this no longer works.
The whole thing was previously a bit of a mess because you really have to understand which macros are redefined and which aren't.

We could also rename the macros that live in torch/csrc (e.g. the THPTensor macros), but since that is more self contained, I punted for now.

* Don't change the plugin.
2018-06-07 16:10:10 -04:00
a466c12bd4 Fix lifting cat into its constant version (#8174)
This fixes a bug where schema including varargs lists did not lift
properly blocking correct ONNX export.
2018-06-07 12:38:58 -07:00
f2c86532f3 Fix TEST_CUDA import in test_cuda (#8246) 2018-06-07 15:12:05 -04:00
14f5484e0d Print requires_grad and grad_fn in string repr of tensor (#8211)
For example:

  >>> torch.ones(3).requires_grad_()
  tensor([ 1.,  1.,  1.], requires_grad=True)

  >>> torch.ones(3).requires_grad_() * 5
  tensor([ 5.,  5.,  5.], grad_fn=<MulBackward0>)

The suffix (dtype, requires_grad, grad_fn) wraps to a new line if
it would cause the line to exceed the linewidth.

  >>> torch.ones(10).double().requires_grad_()
  tensor([ 1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.,  1.],
         dtype=torch.float64, requires_grad=True)
2018-06-07 14:31:23 -04:00
d2271dcee3 Fix: gradcheck forced float32 (#8230) 2018-06-07 12:31:18 -04:00
3eb9ba4d60 Remove .gitmodules.aten since it is in .gitmodules now (#8232) 2018-06-07 09:12:37 -07:00
d1bdb3b10a Remove core and util warnings (#8239)
* Fix some signed/unsigned mismatches

* Skip unused result warning

* Explict fallthrough for murmur hash

* Enable aligned new support to eliminate warning

* Switch to int instead of unsigned in some cases
2018-06-07 09:10:33 -07:00
ea5d871e49 [caffe2] Build Android tests and binaries in CI (#7593)
Update benchmark submodule to version with fixed Android/GNUSTL build
2018-06-07 09:07:38 -07:00
7ed361a466 Rename SparseTensor to SparseTensorRef. (#8237)
I want to introduce using SparseTensor = Tensor (as a documentary
type alias for Tensor), but the name is already taken.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-07 11:03:49 -04:00
346568d40f Use .cc since some downstream libraries are configured for C++ only. (#8234) 2018-06-07 01:37:52 -07:00
c22c55ebed [auto] Update onnx to 62e63e9 - Fix build errors inside protobuf-bench (onnx/onnx#1084)
62e63e9de8
2018-06-07 05:42:14 +00:00
832c88a766 [ideep] Add IDEEP Squeeze op (#8227)
Similar to MKLSqueezeOp at caffe2/mkl/operators/squeeze_op.cc
2018-06-06 21:58:51 -07:00
4df86b6547 Update MKL exporter to IDEEP ops (#8228)
IDEEP exporter support
2018-06-06 21:43:43 -07:00
b401e6b03a Allow optional build and installation of native test binaries (#8225)
* test finetuning

* install off by default

* Turn BUILD_TEST=ON for jenkins.

* Turn on install_test in jenkins as well
2018-06-06 20:56:31 -07:00
8af88f3525 [Caffe2] Add ADD operator for IDEEP (#8220)
* Add ADD operator for IDEEP

* Add broadcast check

* Comments
2018-06-06 20:20:33 -07:00
Ben
2f18f864fb Fix win mkldnn (#7718)
* Sync build_pytorch_libs.bat with build_pytorch_libs.sh

* fix quoting

* add warnings

* fix warnings

* Add /EHa
2018-06-06 22:59:38 -04:00
d0ca8896d5 Don't copy unneeded grads when using a function for several derivatives (Fixes #7722) (#7759)
Trying to copy all results fails when one of them is a tensor list which
has not been populated. This blew up for CuDNN RNNs when the weights
did not require grad.

Thanks to Sylvain Gugger for reporting!
2018-06-06 22:54:23 -04:00
c84b97b979 [READY TO MERGE] Enable tests that use DataLoader with multiple workers on Windows (#6745)
* Don't import TEST_CUDA for test_dataloader on Windows

* test_partial_workers is stuck on Windows
2018-06-06 22:50:39 -04:00
89ea6acde2 [NEEDS REVIEW] Add nan and inf probability check to multinomial (#7647)
* Add nan and inf probs check to multinomial

* fix bug

* Spawn CUDA test in subprocess

* Make sure invalid input won't pass the test case

* Try to fix error

* Test failure cases in Python 3 only

* Try to fix Windows error

* Move CUDA test to test_cuda.py

* fix issues

* fix module name error

* no need to check for CUDA existence in test_cuda

* Use PY3
2018-06-06 22:49:12 -04:00
784c46ba1d [READY TO MERGE] Use ccache in macOS build (#8009)
* Use ccache in macOS build

* Moving to sccache

* Don't use sccache in test job
2018-06-06 22:38:10 -04:00
1172b152ab move THCP-related utils to cuda/utils.cpp. (#8221)
These files don't follow the usual pattern: in general the files torch/csrc/X and torch/csrc/cuda/X
both include the generic file torch/csrc/generic/X, where torch/csrc/X includes the cpu implementations and torch/csrc/cuda/X includes the cuda implementations.
(Aside: this is probably not the best structure; the torch/csrc/X files should probably be moved to torch/csrc/cpu/X).

utils.cpp combines these so that torch/csrc/utils.cpp has cuda specific code.  This makes it impossible to declare a single THTensor and THCTensor template type (i.e. THPPointer<_THTensor>, THPPointer<_THCTensor>).
2018-06-06 20:58:57 -04:00
5ec3041a42 Structure THTensor like THCTensor is structured. (#8217)
In particular, define a base type, _THTensor, that can be used for all THRealTensor structs.
This is just to have less cognitive load when dealing with generic THTensor/THCTensor types (as in templates).
2018-06-06 20:58:04 -04:00
deb56dfd06 Change new bernoulli implementation to be fully generic. (#8218)
The current implementation depends on THTensor types being unique, which is not guaranteed going forward.
2018-06-06 20:54:38 -04:00
07df98a3b8 [auto] Update onnx to e96d823 - Update Google benchmark to 1.4.1 (onnx/onnx#1083)
e96d823e5c
2018-06-07 00:49:04 +00:00
02734e389d Move helper functions to unnamed namespace. (#8224)
Currently, the helper functions in this file are in the global
namespace; presumably the intent was to keep them local.
2018-06-06 17:16:34 -07:00
7cace7219a Change the benchmark log format and also log flops (#8215)
as title
2018-06-06 17:04:54 -07:00
b03ba9023e Set up a c10 source folder (#7822)
* Set up a c10 source folder
2018-06-06 16:56:17 -07:00
f3869b4e03 [auto] Update onnx to 18d70ff - Graph should only have one (input) kParam node (onnx/onnx#1088)
18d70ff529
2018-06-06 23:40:38 +00:00
12229afd00 Record shape and type in autograd to validate gradients (#8168)
The check that the gradient is defined is currently disabled because
TestJit.test_ge_optimized will trigger the error.
2018-06-06 18:09:53 -04:00
36b8cc5483 skip CUDA memory leak check on Windows altogether (#8213) 2018-06-06 17:29:53 -04:00
56b1dcccf6 [cmake] deprecate caffe2_* specific cuda function in cmake. (#8200)
* deprecate caffe2_* specific cuda function in cmake.

* ENV{} -> $ENV{}

* CUDA_ARCH_NAME -> TORCH_CUDA_ARCH_LIST

* .

* .

* .
2018-06-06 14:13:26 -07:00
f2f76e29ee [auto] Update onnx to f28e2f1 - fix lrn spec (onnx/onnx#1090)
f28e2f1a60
2018-06-06 21:13:09 +00:00
1f23043b0a Fix tanh_op on ios build (#8207)
* Fix tanh_op on ios build

* Fix tanh
2018-06-06 14:09:01 -07:00
7ee517a266 rm -rf aten/contrib (#8165)
* Remove aten/contrib

* Remove from CMake
2018-06-06 16:55:48 -04:00
005eef5027 Bump gloo submodule (#8202)
This includes facebookincubator/gloo#125.
2018-06-06 13:31:29 -07:00
5935c5f23b Fix c10d compiler warnings (#8206)
Copy compiler flags from the ones used in setup.py and fix warnings.
This makes the root build that includes c10d headers warning free.
2018-06-06 13:23:53 -07:00
61fd99e1b3 Replace (non-data) TensorUtils calls with non-generic THCTensor calls. (#8176)
* Replace (non-data) TensorUtils calls with non-generic THCTensor calls.

TensorUtils is templatized on the THTensor type, so to support a single tensor type (like ATen), we need to remove these.

This PR does the following:
1) Allows THCTensorTypeUtils.cuh to include THCTensor.hpp.
   This involves moving includes of it outside of generic/, so we can use the new implementations.
2) Defines a single _THTensor struct and changes THCRealTensor to be a derived type of _THCTensor.
   This allows us to implement a single non-generic function and avoid static_cast or void * tricks to call it from the generic functions.
3) For functions inside of TensorUtils that don't use data pointers:
   a) Implement the functions in (non-generic) THTensor.cpp and declare them in (non-generic) THTensor.hpp.
   b) Have the generic versions call the non-generic versions.
   c) Replace the corresponding TensorUtils<THCTensor>::fn call with (non-generic) THTensor_fn.

* Add comment about THCTensor struct.

* Error if storage is null in setStorageNd or resizeNd.
2018-06-06 16:19:40 -04:00
4d025a6a54 add wipe_cache option (#8204)
as title
2018-06-06 13:08:39 -07:00
eaea0f4b82 Update c10d build to link against Caffe2 (#8201)
This follows #7399.
2018-06-06 11:40:07 -07:00
edfcbfbe1f Implement randperm for CUDA (#7606)
* Implement randperm for CUDA

* Use Thrust to implement randperm

* clean up

* Fix test

* Offload small input scenario to CPU

* Fixed test

* Try to fix Windows error

* Fix Windows error and clean up

* Use fork_rng context manager

* Move test_randperm_cuda to test_cuda

* Add half tensor support

* Fix cuda::type error

* Fix CPU offloading

* Fix issues

* No need to check range for n == 0 case
2018-06-06 14:30:58 -04:00
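
Usage sketch (illustrative): the permutation is generated directly on the GPU, so shuffling device-resident data avoids a host round-trip.

```python
import torch

data = torch.randn(1000000, 8, device='cuda')
perm = torch.randperm(1000000, device='cuda')  # small n is offloaded to CPU internally
shuffled = data[perm]                          # shuffle rows without leaving the device
```
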
9af3a80cff Docs for gradcheck and gradgradcheck; expose gradgradcheck (#8166)
* Docs for gradcheck and gradgradcheck; expose gradgradcheck

* address comments
2018-06-06 13:59:55 -04:00
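
A short usage sketch of the now-documented checkers (inputs must be double precision for the finite-difference Jacobian to be stable):

```python
import torch
from torch.autograd import gradcheck, gradgradcheck

inp = torch.randn(4, 3, dtype=torch.double, requires_grad=True)
assert gradcheck(torch.sigmoid, (inp,), eps=1e-6, atol=1e-4)  # first derivatives
assert gradgradcheck(torch.sigmoid, (inp,))                   # second derivatives
```
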
35f08b930d Allow parallel_apply to take in list[Tensor] (#8047) 2018-06-06 13:49:52 -04:00
e6044e5576 use THCThrustAllocator in BCECriterion (#8188) 2018-06-06 13:19:16 -04:00
c0b2a2aa3b Add more annotations for arguments in ATen schema (#8192) 2018-06-06 13:11:39 -04:00
5e372c7106 fix lint 2018-06-06 12:53:58 -04:00
115a494b5f Fix scalar check for sparse tensors. (#8197)
* Fix scalar check for sparse tensors.

As discovered in #8152

If `t` is a scalar sparse tensor, `t._indices` used to return a sparse
empty tensor because the scalar check was incorrect. This PR modifies
the scalar check to return a dense tensor instead of a sparse tensor.

i.e.
```
tensor = torch.sparse_coo_tensor([], [], torch.Size([]), device=device)
out = tensor._indices()  # was a sparse tensor, now is dense.
```

* Fix typos
2018-06-06 12:24:25 -04:00
8e6f7a1382 [Caffe2] Merging setup.py with setup_caffe2.py (#8129)
* Merging setup.py files; torch works, caffe2 works up to other KP

* Fix to super call for python 2

* Works on python2 on mac

* Consolidating Caffe2 flags
2018-06-06 08:31:31 -07:00
857020b849 [auto] Update onnx to 4e65fd8 - fuse consecutive squeezes (onnx/onnx#1078)
4e65fd83ba
2018-06-06 13:25:11 +00:00
f45a3d5558 Add a loop unrolling pass to PyTorch JIT (#7672) 2018-06-06 09:36:12 +02:00
Ben
a6305ea210 Fix protobuf options (#8184)
* protobuf

* fix protobuf_MSVC_STATIC_RUNTIME
2018-06-05 22:43:05 -07:00
c496a4a347 Yangqing as an ONNX codeowner (#8185) 2018-06-05 22:06:32 -07:00
3b8f4d1d88 [ONNX] Fix type_as symbolic (#8183)
* [ONNX] Nuke type_as symbolic

* make it better

* Fix lookup + test
2018-06-05 22:06:20 -07:00
bae82f726d fix caffe2 docker build (#7411) 2018-06-05 22:51:43 -04:00
e8d6ac50b4 Add retry logic to sccache download for Windows build (#7697)
* Add retry logic to sccache download for Windows build

* fix script bug

* clean up
2018-06-05 22:38:30 -04:00
c1bd3b3fb7 Better conv error message basing on weight shape (#8051) 2018-06-05 22:22:00 -04:00
b2dac08049 Fix a corner case for ReShapeOp (#8178)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
2018-06-05 19:06:10 -07:00
c21465e32e Get rid of SOVERSION (again). (#8132)
We don't want SOVERSION because pip will lose the symlink and
double your distribution size, and also because our setup.py
accidentally links against both libcaffe2.dylib and libcaffe2.1.dylib
on OS X.  This leads to a very puzzling error where you get
the error "cannot initialize CUDA without ATen_cuda", because
there are actually two copies of your registry in memory (because
there are two copies of the dynamic library).  Dropping SOVERSION
makes it impossible to make this mistake.

In principle, if the shared library load is done with DYLD_GLOBAL,
that should also prevent two copies of the registry from popping up.
Worth checking at some later point, if you need to bring back
SOVERSION (because, e.g., pip finally fixed their software.)

Partially fixes #8022.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-05 22:03:04 -04:00
d7ba404e29 Add back onnx console scripts dropped during migration from onnx-caffe2 (#8143) 2018-06-05 22:02:14 -04:00
ffde23d45e use the correct datatype format (#8144) 2018-06-05 22:01:59 -04:00
e53fec0495 [JIT] Support a single TensorList argument anywhere in the argument list + index_put (#8173)
* [JIT] Support a single TensorList argument anywhere in the argument list

* [JIT] index_put
2018-06-05 21:48:54 -04:00
Ben
ccabdfef42 Export getCudnnHandle (#7726) 2018-06-05 20:51:52 -04:00
9243b64bff [Caffe2] Update elementwise ops to support numpy style broadcast (#8070)
* Update elementwise ops to support numpy style broadcast

Update elementwise ops to support numpy style broadcast

* Fix sqrt_op

* Fix compare ops

* Fix gradient test

* Fix optimizer legacy broadcast

* Fix legacy broadcast for elementwise ops

* Skip flaky test

* Fix eigen simple binary op

* Fix attention test

* Fix rnn test

* Fix LSTM test

* Fix tan grad

* Fix schema check
2018-06-05 15:49:16 -07:00
0517623517 Abstract parallelization to facilitate using threadpools (#8163) 2018-06-05 22:36:17 +00:00
ba46d3d981 Adding -setup- path, and better code structure (#8122) 2018-06-05 14:40:00 -07:00
fa1bdcf4d2 Pinning opencv to < 3.4 in conda builds (#7923)
* Pinning opencv to 3.1.0 in conda builds

* Also pinning numpy to 1.11

* Trying only specifying <3.4
2018-06-05 13:16:02 -07:00
a3fc5ed351 Move non-generic Storage code needed by TensorUtils to non-generic C++. (#8164)
For non-generic function call implementations in Storage used by TensorUtils, we do the following:
1) Move the declaration from generic/C to non-generic/C++; we don't need backwards compatibility on these functions and want to use e.g. at::ScalarType.
2) Move the implementation from generic/C++ to non-generic/C++.
3) Change the generic implementation to call the non-generic implementation.

This will allow us to get rid of the corresponding TensorUtils calls (once we move over the Tensor functions in the same manner).
2018-06-05 14:50:02 -04:00
1cdd7b5c0f Fix __rshift__ bug (#8161)
* Fix __rshift__ bug

* Add small tests for __lshift__ and __rshift__ in test_cuda

* Add a more elaborate check for __lshift__ and __rshift__

* refactor the test to address @zou3519 's comments
2018-06-05 14:30:02 -04:00
990c6c5531 [C++ API] Improve and use OrderedDict for parameters / modules (#7823)
* Improve OrderedDict for C++ API

* Give OrderedDict a subject and fix review comments

* Fix OrderedDict use in torch/csrc/jit/script/init.cpp
2018-06-05 14:29:09 -04:00
bf58bb5e59 Fix cuda.framework error on OSX. (#8136)
When compiling OSX with CUDA, Caffe2's build system uses
find_package(cuda) to get its grubby hands on the CUDA driver
library (for some strange reason, FindCUDA doesn't save this
information as a variable).  Unfortunately, on OSX, sometimes
this picks up the cuda.framework folder, and then our build
system chokes to death because it doesn't try to link against
this as a framework.  (Is the folder even a framework?  I have
no idea).

This commit attempts to fix this in a two pronged fashion:

1. For some users, reducing the precedence of frameworks
using CMAKE_FIND_FRAMEWORK seems to help.  So we set these
variables.  However, this fix is not perfect; on my laptop
it doesn't actually solve the problem.

2. PyTorch doesn't actually need the CUDA driver API.  So we
only add the dep when building Caffe2.

Fixes #8022

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-05 13:37:05 -04:00
7c1e8c3c7a remove some unnecessary cudaGetDevices (#8089)
* remove unnecessary cudaGetDevices

* make curDevice argument non-optional, add explicit checks to current_device
2018-06-05 13:17:47 -04:00
aec6d6a7d3 [auto] Update onnx to 968d28d - fix Node::isBefore (onnx/onnx#1075)
968d28d901
2018-06-05 16:31:01 +00:00
fe805794ac docstring support for @script and @script_method (#7898)
* docstring support for @script and @script_method

* make it python2 compatible

* improve according to review

* improve build_stmts

* use filter instead of list comprehension

* improve the way wrap is handled for script_method

* stash the original method instead

* allow dynamic attr for ScriptMethod and GraphExecutor

* a bit comment on build_Expr

* remove _build_wrap

* a bit improve on comments

* rename to __original_methods

* should be _original_methods
2018-06-05 10:36:08 -04:00
c719c8032c docs: add canonical_url and fix redirect link (#8155)
* docs: enable redirect link to work for each specific page

* docs: add canonical_url for search engines

closes #7222

* docs: update redirect link to canonical_url
2018-06-05 10:29:55 -04:00
227a7640ce Accelerate Bernoulli random number generation on CPU (#7171)
* opt bernoulli rng with vsl and openmp

* detect cpu vendor for bernoulli

* retrigger test platform

* check the vendor more strictly

* use cpuinfo to check vendor
2018-06-05 10:23:48 -04:00
ee0b75a3d2 docs: Add warning to torch.repeat() (#8116)
* docs: Add warning to torch.repeat()

closes #7993

* docs: Add links for numpy functions

* docs: Break the too long line
2018-06-05 10:15:36 -04:00
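
The gist of the warning: Tensor.repeat tiles the whole tensor like numpy.tile, unlike numpy.repeat, which repeats individual elements. An illustration:

```python
import torch

t = torch.tensor([1, 2, 3])
print(t.repeat(2))     # tensor([1, 2, 3, 1, 2, 3]) -- behaves like numpy.tile
print(t.repeat(2, 2))  # shape (2, 6): the tensor is tiled, not its elements
```
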
f5cd479b59 fix type mismatch while call torch._C._cuda_setDevice (#8065)
* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch in scatter

* fix type mismatch in scatter

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice

* fix type mismatch while call torch._C._cuda_setDevice
2018-06-05 09:53:22 -04:00
c446269568 cpu/ideep context converter (#8139) 2018-06-04 21:28:59 -07:00
f8c18e00d5 Fix a corner case for ReShapeOp (#8142)
In my use case, in the backward propagation pass, the reshape needs to
change a [0] tensor into a [0,0]-shaped tensor. The original implementation would
cause an out-of-index issue. This diff fixes the problem.
2018-06-04 20:40:43 -07:00
a5ce0126cc Fix job name checking for AVX tests (#8135) 2018-06-04 19:25:15 -04:00
c0a419e6ba Add non_blocking to Tensor/Module.to (#7312)
* Add non_blocking to Tensor/Module.to

* flake8

* Add argparse tests

* cpp parse

* Use C++ parser

* use a commong parse function with Tensor.to

* fix test_jit

* use THPObjectPtr

* increase refcount for None, True, and False

* address comments

* address comments
2018-06-04 18:46:52 -04:00
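
Usage sketch (illustrative): non_blocking only pays off for host-to-device copies out of pinned memory, where the copy can overlap other host work.

```python
import torch

src = torch.randn(64, 128).pin_memory()   # page-locked host memory
dst = src.to('cuda', non_blocking=True)   # returns immediately; copy is asynchronous
```
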
ec4a0f332e Add back lrn test (#8134)
* Revert "Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127)"

This reverts commit 410191c4175eaae141306cdb3c3c1c1e8a495225.

* Fix mismatched default values
2018-06-04 15:06:40 -07:00
94e197c262 Add utf-8 header to Python file with Unicode. (#8131)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 14:49:32 -07:00
0ea2fa15a3 Replace most remaining usages of TensorUtils<T>::DataType. (#8124)
As in https://github.com/pytorch/pytorch/pull/8056, this doesn't work with a single TensorImpl type.
This replaces the usages of with a templatized parameter and static_asserts that the new and old are equal.

After this we can get rid of the old template parameter, but I want to ensure they are equivalent across all builds first.
2018-06-04 16:48:57 -04:00
410191c417 Skip OnnxBackendNodeModelTest::test_lrn_default_cuda that causes segfault (#8127) 2018-06-04 12:34:15 -07:00
df28f5d06e [Caffe2] Support non peer access in muji and fix bug when reduced_affix is empty (#6896)
* [Caffe2] Support non peer access in muji

* [Caffe2] Add test for 4 gpus and 2 groups

* [Caffe2] Add comments

* Fix bug when reduced_affix is empty

* Fix typo and add comments about cpu and amd gpu
2018-06-05 03:14:43 +08:00
7fc110b521 Split SparseTensorImpl off from TensorImpl. (#7990)
* Split SparseTensorImpl off from TensorImpl.

At the moment they have the same data layout, but with the upcoming refactor
they will not, and we need a place to put all of the sparse tensor specific
fields.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Update SparseTensorImpl.h
2018-06-04 15:02:09 -04:00
f24d715e23 [auto] Update onnx to 2a87616 - Tests for LRN operator (onnx/onnx#903)
2a876162ac
2018-06-04 18:13:14 +00:00
cef8bfb33e Add missing pragma once. (#8118)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 13:26:39 -04:00
7ba0dbc2cd [auto] Update onnx to 2d5ce4a - Remove empty model (onnx/onnx#1058)
2d5ce4aeb6
2018-06-04 16:35:08 +00:00
96a77b5aa8 Make libshm also test if rt requires pthread. (#8112)
In some configurations (e.g., our internal build of GCC 5 + GLIBC 2.23),
-lrt is not sufficient to use shm_open; you also need to declare
a dependency on pthread.  This patch adds a surgical extra fix to
detect this situation, in the case that I noticed it failing in the
wild.

Fixes #8110

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 12:12:59 -04:00
c2046c1e5e Implement adaptive softmax (#5287)
* Implement adaptive softmax

* fix test for python 2

* add return_logprob flag

* add a test for cross-entropy path

* address review comments

* Fix docs

* pytorch 0.4 fixes

* address review comments

* don't use no_grad when computing log-probs

* add predict method

* add test for predict

* change methods order

* get rid of hardcoded int values

* Add an optional bias term to the head of AdaptiveSoftmax
2018-06-04 12:12:03 -04:00
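
A usage sketch, assuming the module is exposed as torch.nn.AdaptiveLogSoftmaxWithLoss (the name and exact signature are assumptions here):

```python
import torch

# Frequent classes live in the head; rarer classes fall into clusters
# split at the given cutoffs, which is what makes the softmax cheap.
asm = torch.nn.AdaptiveLogSoftmaxWithLoss(
    in_features=64, n_classes=10000, cutoffs=[100, 1000])

hidden = torch.randn(32, 64)
target = torch.randint(0, 10000, (32,))
out = asm(hidden, target)
print(out.loss)  # mean negative log-likelihood over the batch
```
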
e749159064 Detect CUDNN related environment variables in cmake (#8082) 2018-06-04 12:10:36 -04:00
e5b997223c [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7955)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scaffolding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the latest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.

* Resolve merge conflicts

* .

* Update GetAsyncNetHIPThreadPool

* Enable BUILD_CAFFE2 in pytorch build

* Unify USE_HIP and USE_ROCM

* always check USE_ROCM

* .

* remove unrelated change

* move all core hip files to separate subdirectory

* .

* .

* recurse glob core directory

* .

* correct include

* .
2018-06-04 09:04:30 -07:00
3d7a064369 Remove out-of-date comment (#8114) 2018-06-04 11:45:33 -04:00
04a3616de0 Replace std::size_t with size_t (#8093) 2018-06-04 11:10:44 -04:00
185f8fbe7c Removing remaining NO_PYTHON ifdefs (#8067)
* Remove NO_PYTHON in tracing

* Remove NO_PYTHON in ir.h

* Remove NO_PYTHON in test_jit.cpp
2018-06-04 10:53:28 -04:00
f8830f9991 use regex in kwarg parser (#8061) 2018-06-04 10:47:55 -04:00
9fc0ba31b9 Do an additional sanity check that nvcc and CUDA include dir agree. (#8094)
If you set CUDA_HOME and CUDA_NVCC_EXECUTABLE together, you may
end up in a situation where the CUDA_VERSION of your includes
mismatches the CUDA version of your nvcc.  See #8092 for a concrete
case where this can occur.  Explicitly detect this situation and
give a good error message in this case!

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-04 10:41:47 -04:00
db5bc71562 Fix and ignore some warnings (#8081) 2018-06-04 01:01:59 -07:00
dff115f47a Move backtrace to its own header (#8096)
* Move backtrace to its own header

* Move cxxabi.h into Backtrace.cpp
2018-06-03 21:11:29 -07:00
36c3859d3e [auto] Update onnx to 356208d - add input tensor dimension checks to shape inference (onnx/onnx#1070)
356208d756
2018-06-03 18:15:05 +00:00
74672a31a2 [auto] Update onnx to cc26486 - bump version to 7 for prelu. (onnx/onnx#1063)
cc26486541
2018-06-03 09:15:45 +00:00
9232afeffa Add code for TensorBoard visualization of JIT GraphExecutors (#8050) 2018-06-02 20:55:25 +02:00
5e35fbfaa3 Post process onnx proto (#8064)
* Post processing onnx generated protobuf files to hide global symbols

* .

* .
2018-06-02 10:46:48 -07:00
01f5ee77e3 Skip ConvTraspose ONNX backend tests (#8074) 2018-06-02 09:52:18 -07:00
624ade1eac [auto] Update onnx to bd98abb - Add a hook for doing post-processing on protobuf generated header files (onnx/onnx#1068)
bd98abbba0
2018-06-02 16:04:34 +00:00
1fc96b6471 [auto] Update onnx to eb12f72 - Add conv transpose test cases (onnx/onnx#886)
eb12f72a86
2018-06-02 15:53:55 +00:00
68948306bc Support to run ONNX Upsample operator (mode=nearest) in Caffe2 (#8037)
* Added support to run ONNX Upsample operator (mode=nearest) in Caffe2

* adding error checks to upsample

* adding error checks to upsample

* adding error checks to upsample

* changing to np.isclose

* Revert onnx submodule update

* still fixing
2018-06-02 08:45:44 -07:00
7be457c2a4 Reduce usages of TensorUtils<T>::DataType in THC. (#8056)
TensorUtils<T> is basically ATen-dispatch-lite in that it allows one to do multi-type THC function dispatch with a single call.
However, it is templatized on the Tensor type, and since we are moving to a single Tensor type, this doesn't work.

Most of the functions in TensorUtils (e.g. getDims) can be pulled up a level, to just call THCTensor_nDimension (or directly access the member),
but the DataType specific functions are more problematic.

So, this PR does two things:
1) Replaces calls of 'TensorUtils<THCTensor>::DataType' with 'real' since these are identical
2) Templatizes the THC_pointwiseApplyX functions to take scalar types.  To ensure this is done correctly, we static_assert that the scalar type template parameter matches the scalar type of
   the corresponding template parameter.  We will need to get rid of these static_asserts in the future, but this is useful for now.
2018-06-02 11:26:02 -04:00
7926313235 Have a single THStorage and THCStorage type. (#8030)
No longer generate data-type specific Storage types, since all Storage types are now identical anyway.
For (some) backwards compatibility and documentation purposes, the Real names, e.g. THLongStorage are now #defined as aliases to the single THStorage type
2018-06-02 11:05:02 -04:00
3cbaa6b785 [ready] Clean up torch.distributions (#8046) 2018-06-02 16:54:53 +02:00
afa75fa6b2 Remove NO_PYTHON macros from Exceptions.h/cpp (#8007)
Removes cases where NO_PYTHON was unnecessary in Exception.h/cpp
2018-06-01 22:37:18 -07:00
bef306eac7 [auto] Update onnx to 033f956 - make gcc happy (onnx/onnx#1061)
033f956f41
2018-06-02 05:06:33 +00:00
f2573e8df7 [auto] Update onnx to e6a500e - Extract constant to initializer (onnx/onnx#1050)
e6a500e54c
2018-06-02 04:29:28 +00:00
7379b22abe [auto] Update onnx to 4f8ef17 - Remove erroneous documentation around maps and sequences. (onnx/onnx#1069)
4f8ef17ad3
2018-06-02 04:20:54 +00:00
8d4e92a91d [auto] Update onnx to 0dbec2a - - Generate protoc type hints on Windows (onnx/onnx#1047)
0dbec2a047
2018-06-01 23:59:08 +00:00
2fb957da81 workaround for Sequential when one cannot retrieve python source (#8048) 2018-06-01 18:45:11 -04:00
eb2f21f1e4 Skip CUDA memory leak test on BN tests on windows (#8043) 2018-06-01 18:09:14 -04:00
82b981e4db Update from facebook 1ee4edd286a3 (#8040)
* Adding instance weight to batch distill loss

as title

* add bfloat 16-31

added bfloat 16-31 and their respective unit tests

* [CUDA9] Upgrade - fbcode

CUDA9 upgrade diff D5654023 has been out for a while thanks to Pieter. But as time goes on it's becoming quite hard to rebase, because of the symlinks and auto-generated build/config files in tp2. Break D5654023 into two diffs, one touching tp2 config files, and another one touching the fbcode TARGETS file (adding the nvcc flag). These two should be a bit easier to rebase (for the detailed procedure see "Test Plan").

This diff can only be committed if:
1. CUDA 9 rpm is rolled out fleet-wide (TBD)
2. NVidia driver 390.40 is rolled out fleet-wide (done)
3. Upgrade CUDA 9.1, cudnn 7.1, nccl 2.1 (done)
4. Make sure all dependents are built (done)
5. Test all C2 operators, PyTorch (see test plan)

* Share intermediate int32 buffer across Conv ops

Adding a known type

* [C2 fix] infer function for ensure_cpu_output_op

this adds the missing device function for ensure_cpu_output_op

* [int8] Add blob serializer/deserializer for Int8TensorCPU

To export to logfiledb

* [nomnigraph] Add try catch block to optimization passes in predictor

This will catch failures that happen in the optimization pass.

* Caffe2: avoid static initialization order fiasco for CAFFE_ENFORCE

CAFFE_ENFORCE uses a stack trace fetcher, which is currently a
global static variable. If CAFFE_ENFORCE is used at static
initialization time, this is a SIOF. Recently CAFFE_ENFORCE was added
into init functions registration, so we started to see this.

A Meyers singleton provides safety here: if the stack trace
fetcher was not registered yet, it will just use a dummy one.

* NUMA support in SparseNN CPU benchmark

Adding support for NUMA in SparseNN CPU benchmark

* [mobile-roofline] Add logging needed for roofline model

This should be all that's needed

Let the operators use the same input if the operators are not chained

or else, we have to change the input data dims

* fix null-pointer-use UBSAN errors in reshape_op.h

* revert previous fix on input blob name

as title

* Adding flag to let MineHardNegative automatically extract single value from dict

Model exporter requires the output of the model to be a struct. This makes it convenient to use those models directly in MineHardNegative by allowing automatic extraction of the single element of the dict, which is a common use case.

* Reverting change that broke internal tests back to OSS compatible state
2018-06-01 17:41:09 -04:00
9060b7f4e2 Add profiling annotations to NeuralNet[Operator|Data] (#8005) 2018-06-01 14:27:42 -07:00
ef1c15f5ca [script] Add support for torch.zeros, torch.ones, etc. (#7799)
* [script] Add support for torch.zeros, torch.ones, etc.

* modifies gen_jit_dispatch to create bindings for functions that do
  not take tensor arguments, but do have an initial type argument
* adds tensor attributes to these functions for device, layout, and
  dtype specification
* extends the list of valid compiler constants to include device, layout,
  and dtype.
* allows functions with Generators, but only using the default generator

Known limitations:
* when using `torch.float`, we convert it to a scalar tensor and make
  no checks that it is actually used only in a dtype specification.
  This is similar to how we handle Python numbers, creating some situations
  where the script is more permissive. Fixing this requires much more
  significant changes to the IR, so is lower priority for now.
* devices specified using string literals e.g. 'cuda:1' do not work,
  since we do not support string literals in general.
2018-06-01 14:24:18 -07:00
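
A minimal sketch of what this enables in script (the function name and shape are illustrative, not from the PR):

```python
import torch

@torch.jit.script
def add_zeros(x):
    # dtype can now be given as an attribute inside script; note that
    # string device literals such as 'cuda:1' are still unsupported.
    return x + torch.zeros(3, dtype=torch.float)
```
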
2ec2e6947e [auto] Update onnx to 9e7855d - Remove PyTorch generated Upsample tests cases (onnx/onnx#1064)
9e7855dcd4
2018-06-01 21:15:47 +00:00
c6a923f486 Support modules that output scalar in Gather (and data parallel) (#7973)
* Support modules that output scalar in Gather (and data parallel)

* Improve warning msg
2018-06-01 16:20:39 -04:00
215abffe60 [auto] Update onnx to 760c928 - add missing hasNInputShapes check for bidirectionalBroadcastShapeInference (onnx/onnx#1060)
760c9283d0
2018-06-01 20:14:57 +00:00
23dd033b51 Factor python dependency out of interpreter (#7970)
* Factor python dependency out of interpreter

* Remove NO_PYTHON for the autograd engine

If there is no python bindings, then a default Engine is constructed
the first time it is requested.

If the python libraries are loaded, then they override the default
accessor and the default engine becomes a python Engine.

Note: it is possible for two engines to be generated if a non-python
one gets created before the python bindings are loaded. This case
is rare, and just results in additional threads being spawned.

* Fixing AlexNet test which is skipped in CI
2018-06-01 16:07:21 -04:00
41ef5c2d4b Support for generating ATen during the fbcode build, rather than committing the generated files (#8002)
Paint the internal bikeshed a slightly different color to appease Buck tooling.
2018-06-01 16:04:02 -04:00
d27e138a1a Allow CI testing with different AVX configs (#8020)
* Allow CI testing with different AVX configs

* Unset ATEN_DISABLE_AVX and ATEN_DISABLE_AVX2 in default config
2018-06-01 12:30:11 -07:00
8f421159fd Fix profiler crash when no events register (#8034)
* Fix profiler crash when no events register

When trying to profile, attempting to print the event table throws a vague error because the event list is empty:

....
max_name_length = max(len(evt.key) for evt in events)
ValueError: max() arg is an empty sequence

This change fixes the error by returning an empty string.

* Update profiler.py
2018-06-01 14:38:24 -04:00
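
A sketch of the guard described above (the function and attribute names are illustrative, not the actual profiler code):

```python
def build_table(events):
    # max() over an empty sequence raises ValueError, so return an
    # empty string when no events were recorded instead of crashing.
    if not events:
        return ''
    max_name_length = max(len(evt.key) for evt in events)
    return '\n'.join(evt.key.ljust(max_name_length) for evt in events)
```
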
bf29abd908 propagate nan in some activations (#8033)
* propagate nan in some activations

* fix py2 not having math.nan

* flake8
2018-06-01 14:08:01 -04:00
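
The py2 fix mentioned above comes down to math.nan only existing on Python 3.5+; a portable fallback looks like this (a sketch, not the exact patch):

```python
import math

# math.nan was added in Python 3.5; on Python 2, fall back to
# float('nan'), which behaves identically for propagation checks.
nan = getattr(math, 'nan', float('nan'))
```
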
8b447fa784 [auto] Update onnx to 3fb9656 - Fix for fbcode CI (onnx/onnx#1062)
3fb965666e
2018-06-01 17:09:28 +00:00
d0ec8af0fc Support CUDA tensors in ProcessGroupGloo (#7694)
This adds an unconditional dependency on CUDA, which is not desirable
for the long term. Ideally we would have a split like ATen, where there are
different artifacts for different backends so you can decide at runtime
what to use.
2018-06-01 09:54:45 -07:00
d0e27609ab [auto] Update onnx to 1504a33 - Convert schema assert for duplicate type names to exception (onnx/onnx#1057)
1504a33abb
2018-06-01 15:24:25 +00:00
03fe106448 [auto] Update onnx to 33e9cd4 - Remove the usage of default value to fix invalid proto3 files. (onnx/onnx#1052)
33e9cd4182
2018-06-01 15:23:39 +00:00
52368f25cc Example for Transformed Distribution (#8011) 2018-06-01 16:23:57 +02:00
8be17723cb Update nn.rst (#8029) 2018-06-01 09:37:18 -04:00
b41050ff66 Re-enable build env check (#7969)
* Re-enable build env check

* Fix linux test error

* Try to fix macOS test error
2018-06-01 06:57:47 -04:00
dbe5c7f6e9 Mention the pytorch-ci-hud on the README. (#8004)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-06-01 06:56:48 -04:00
580d212267 Remove WITH_ROCM cmake flag/variable (use USE_ROCM solely) (#8013) 2018-05-31 20:50:59 -04:00
436211e27c Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace. (#7935)
* Make AT_FORALL_SCALAR_TYPES usable outside of at::namespace.

This requires renaming the _cast functions which used the unqualified names.

* Separate onnx mapping of scalar type from cast name.

* Fix flake8.

* Properly cast onnx.
2018-05-31 20:50:16 -04:00
6c0bc27371 [auto] Update onnx to 8ec0e5f - Add index check for Transpose's type inference function (onnx/onnx#1053)
8ec0e5fe9b
2018-06-01 00:11:02 +00:00
e63be0d58f Reduce grain size for Unary operations (#8003) 2018-05-31 21:59:53 +00:00
0fe4cb10e3 Add on-stack observer cache for Observable (#7931)
observers_list_ stores all the observers for an observable. The list is allocated on the heap, which
 can cause LLC misses. Add an on-stack observer cache for fast access. In production, we have seen a 20%
 speedup for start and stop observer calls.
2018-05-31 13:05:02 -07:00
fd30487089 Fix a couple of typos (#7998)
* Fix typo

* Fix typo

* Fix typo

* Fix typo
2018-05-31 15:29:02 -04:00
8afe4c95d6 Entry for c10d in CODEOWNERS (#8001) 2018-05-31 15:28:16 -04:00
80ede55242 Revert "Set smaller grain size for some cases" (#7988) 2018-05-31 15:24:03 -04:00
85ee94b7be Add memory leak check in CUDA tests (#7270)
* Add memory leak check in CUDA tests

* Tracking multi-GPU too

* fix run_test.py not running __name__ == '__main__' content; add test for make_cuda_memory_checked_test

* add a comment

* skip if cuda

* 1. Change the wrapper to a method in common.py:TestCase
2. Refactor common constants/method that initialize CUDA context into common_cuda.py
3. Update some test files to use TEST_CUDA and TEST_MULTIGPU

* Fix MaxUnpool3d forward memory leak

* Fix MultiLabelMarginCriterion forward memory leak

* Fix MultiMarginLoss backward memory leak

* default doCUDAMemoryCheck to False

* make the wrapper skip-able

* use TEST_MULTIGPU

* add align_corners=True/False tests for Upsample; fix TEST_CUDNN

* finalize interface

* VolumetricMaxUnpooling_updateOutput

* fix test_nccl

* rename THC caching allocator methods to be clearer

* make the wrapped function a method

* address comments; revert changes to aten/src/THC/THCCachingAllocator.cpp

* fix renamed var
2018-05-31 15:09:54 -04:00
bafec1637e support loading gzip (#6490)
* support loading gzip

* address comments

* address comments

* fix lint

* fix test for python2
2018-05-31 15:06:38 -04:00
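
Illustrative usage of the new capability (file paths are made up): save a checkpoint, compress it, and load it straight from the gzip file object, since torch.load accepts file-like objects.

```python
import gzip
import shutil
import torch

torch.save(torch.randn(3), 'tensor.pt')
with open('tensor.pt', 'rb') as src, gzip.open('tensor.pt.gz', 'wb') as dst:
    shutil.copyfileobj(src, dst)

with gzip.open('tensor.pt.gz', 'rb') as f:  # load directly from the gzip stream
    t = torch.load(f)
```
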
3481c6c5e2 Build ONNX for PyTorch version of libcaffe2 (#7967) 2018-05-31 11:57:35 -07:00
e9c33e91d9 Remove python bindings for torch.slice (#7924)
* skip python bindings for slice

* remove tests

* convert slice test to indexing
2018-05-31 13:42:49 -04:00
89ba9dc44f Import/export observer symbols for DLL, which fixes the linking error in Visual Studio. (#6834)
* Import/export observer symbols for DLL, which fixes the linking error in Visual Studio.

* Add support of all default cmake build types for release to cuda.
2018-05-31 10:22:21 -07:00
eb39a23d8e Make THStorage / THCStorage have void* data ptr. (#7964)
* Make THStorage / THCStorage have void* data ptr.

This is the initial step in unifying the ATen and TH tensor representations, next is to only generate a single THStorage / THCStorage type.

The major changes here are:
1) data has been renamed to data_ptr and made void* in THStorage/THCStorage.
2) THStorage / THCStorage stores a at::ScalarType representing its data type (This will be useful when we generate a single THStorage/THCStorage).
3) APIs for Accessing the data as a real*:
a) storage->data<real>() -- this does runtime-type checking (checks that the at::ScalarType is correct).
b) storage->unsafeData<real>() -- as above, but no runtime-type checking (used in inner loops / fast code paths).
c) THStorage_(data)(storage) -- this already existed, just calls storage->data<real>().

* Add include.

* Attempt to fix clang build issues.

* Clarify comment and remove extra character.

* Rename unsafeData -> unsafe_data.

* Remove unnecessary 'to' function to get compile time rather than link time errors.
2018-05-31 13:10:08 -04:00
b5594ac750 Raise error when torch.load a storage on a non-existing device (#7921)
* Raise error when torch.load a storage on a non-existing device

Before, doing torch.load(...) on a CUDA tensor on a CPU-only machine
would raise an unreadable error:

```
~/pytorch/pytorch/torch/cuda/__init__.py in __enter__(self)
    223         if self.idx is -1:
    224             return
--> 225         self.prev_idx = torch._C._cuda_getDevice()
    226         if self.prev_idx != self.idx:
    227             torch._C._cuda_setDevice(self.idx)

AttributeError: module 'torch._C' has no attribute '_cuda_getDevice'
```

This PR makes it so that torch.load raises a hard error if one tries to
load a storage onto a non-existing device, and suggests that the user use
torch.load's map_location feature.

* Address comments

* missing dep
2018-05-31 09:44:50 -04:00
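
The suggested workaround in practice (the path is illustrative): remap CUDA storages to CPU at load time.

```python
import torch

# On a CPU-only machine, keep every storage on the CPU instead of
# trying to restore it to its original (non-existent) CUDA device.
state = torch.load('model.pt', map_location=lambda storage, loc: storage)
```
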
f9926e4ce5 Fix EmbeddingBag max_norm option (#7959)
* fix EmbeddingBag max_norm option

* flake8

* add warning to the embedding bag arg change
2018-05-31 09:42:56 -04:00
5596260b9e Add third way to determine IS_CONDA (#7971) 2018-05-31 09:04:27 -04:00
d8e28cfec2 Enable ONNX backend Mean tests (#7985) 2018-05-31 21:03:12 +08:00
d476d0b4ab [Hotfix] Bring back warnings and -Werror to ATen (#7866)
* Bring back warnings and -Werror to ATen

* Unbreak...

* Fix tbb errors
2018-05-30 21:59:04 -07:00
1bb6d44a21 Use Glog's implementation of STL logging when possible. (#7206)
Inject custom workaround into namespace std so that it can be found by ADL.
2018-05-30 21:10:27 -07:00
74783f0cd8 Move the broadcast check in MKL Add/Sum to runtime (#7978) 2018-05-30 21:09:32 -07:00
08b4c7ab7f Change perf test folder after git checkout (#7980) 2018-05-30 20:15:53 -07:00
108fb1c2c9 Fix the import part of the windows doc (#7979) 2018-05-30 21:51:30 -04:00
6e1de968d6 Use mingfeima's mkldnn (#7977) 2018-05-30 21:46:39 -04:00
df77ea7baf Fix the cpp libtorch CUDA build (#7975) 2018-05-30 21:27:45 -04:00
fce6b24468 Allowing MatMul to create a gradient even with 3 inputs. useful if you are differentiating a graph twice (#6536) 2018-05-30 16:53:54 -07:00
9b1abd2f81 [Caffe2] Keep name of caffe2_pybind11_state and caffe2_pybind11_state_gpu in debug build (#7155) 2018-05-30 16:38:44 -07:00
f0c09203b0 [caffe2] YellowFin parameter update GPU code fix. (#6993) 2018-05-30 16:36:08 -07:00
c94f3bbf33 Fix typo in autodiff formula for addmm (#7932) 2018-05-30 18:11:24 -04:00
2e78bfa530 Delete unused file (#7919) 2018-05-30 18:09:55 -04:00
fa8bdafa6c Prevent git autocrlf for bash scripts (#7949) 2018-05-30 18:09:10 -04:00
f721481543 Fix returning scalar input in Python autograd function (#7934)
* fix _wrap_outputs not working with scalar inputs

* add a test
2018-05-30 18:08:22 -04:00
df5d01df1e Set smaller grain size for some cases (#7941) 2018-05-30 18:07:13 -04:00
1f94a6eab3 [JIT] Fission and fusion passes for addmm (#7938)
* Addmm decomposition pass

* Addmm peephole pass

* Fix handling of output shape in fusion pass

* Add DCE to the peephole passes

* add comments

* maybe bugfix?

* Fix GPU tests

* fix py2/3 test issue
2018-05-30 18:06:58 -04:00
769f5f7cfe Handling of scalars in torch.Size (#5676)
* Handling of scalars in torch.Size

torch.Size() constructor uses python_arg_parser

IntList in python_arg_parser can take iter/range

Have IntList take python iterables and ranges.

Address comments: don't use python_arg_parser and instead call __index__ in THPSize_pynew

Address comments

Address comments

* Rebased

* Address nit
2018-05-30 17:50:32 -04:00
d102f9ea18 Split CI tests in half and run them in parallel (#7867)
* Split and run tests in parallel

* Refactor tests
2018-05-30 17:42:25 -04:00
8e6cd43291 Fix checkBackend error message (#7926)
* Fix checkBackend error message

Fixes #7849

* Switch order of printing args
2018-05-30 16:51:23 -04:00
0656ef483d remove sort requirement from pad-sequence (#7928)
* pad-sequence no longer requires sorting entries

pad-sequence can get the max_len from the list of sequences. Entries only need to be sorted if the output will be used for pack_padded_sequence, which can throw the error itself.

* remove sort requirement from pad-sequence

Picks up from #5974.

Removes the requirement that input sequences to pad_sequence have to be
sorted. Addressed the comments in the PR:
- Updated docstring for pad_sequence
- Remove sort requirement in pad_sequence test
- Test unsorted and sorted sequences in pad_sequence test
2018-05-30 16:36:55 -04:00
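
After this change, something like the following works without pre-sorting (lengths are illustrative):

```python
import torch
from torch.nn.utils.rnn import pad_sequence

seqs = [torch.ones(2), torch.ones(5), torch.ones(3)]  # deliberately unsorted
padded = pad_sequence(seqs)  # shape (5, 3): max_len x batch, zero-padded
```
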
c5b895ac50 Try to fix TORCH_CUDA_ARCH_LIST for PyTorch again (#7936)
* try again

* use DEFINED

* use a loop

* Minor fixes
2018-05-30 16:30:21 -04:00
f8e83dc257 Rename cuda::type to cuda::into_type and provide cuda::from_type. (#7937)
These are used to convert Half -> half and half -> Half respectively.
from_type will be used for runtime type checking in THC.
2018-05-30 15:25:25 -04:00
5419c6ecb7 Add unsafe flag to skip checking in prepare (#7832)
* Add unsafe flag to skip checking in prepare

* pop
2018-05-30 11:48:01 -07:00
f4256c9605 cache and use BLAS_SET_BY_USER so that it doesn't set itself to TRUE when run second time (#7942) 2018-05-30 11:44:23 -07:00
c0d50e1e1f [JIT][script] Fix emitted gather and slice for dynamic indices (#7861)
* [JIT][script] Fix emitted gather for dynamic indices

* Also fix slice

* Address comments
2018-05-30 11:43:22 -07:00
795f6e1077 add test for correctness of transpose fusion (#7950) 2018-05-30 10:56:51 -07:00
b3e87b1066 Fix fbcode compatibility (#7939) 2018-05-30 13:35:46 -04:00
8858b1d519 Fix THCUNN SpatialDepthwiseConvolution assuming contiguity (#7952) 2018-05-30 12:55:02 -04:00
4a80755834 Split up detail.h (#7836) 2018-05-30 08:55:34 -07:00
15122e93bc Test if ASAN is actually working as part of ASAN tests. (#6050)
* Test if ASAN is actually working as part of ASAN tests.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Drop explicit use of libstdc++, we should not care.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Build with DEBUG=1

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Increase main thread stack size when using ASAN.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-30 11:31:42 -04:00
4e5dec3024 [auto] Update onnx to 307995b - Update from upstream (onnx/onnx#1038)
307995b143
2018-05-29 23:13:26 +00:00
38dbe6e605 Updates to caffe2 operator documentation (#7917)
* Significant updates to the operator docs in prep for merge
2018-05-29 14:38:56 -07:00
c72c083151 Moved condition for dilated grouped convolutions to CUDNN convolution implementation (#7465) 2018-05-29 22:08:41 +01:00
267fc43a96 Fix Windows doc for import error (#7704)
* Fix Windows doc for import error

* Fix doc again

* Fix wrong format
2018-05-29 22:07:00 +01:00
c2fa1f363b [c10d] MPI Process Group Implementation (#7783)
This provides a bare-minimum MPI Process Group implementation; the commit is on top of @pietern's Gloo Process Group PR.

* [c10d] MPI Process Group Implementation

ref: https://github.com/pytorch/pytorch/issues/7434

* Better exception, atexit func, and addressed comments

* Clang formatting changes

* Static initialization and addressed comments

* Added constness back

* Test will now launch mpi processes if found

* CMakeList Changed
2018-05-29 22:06:48 +01:00
a8625e016a Spelling fix in MultivariateNormal docstring (#7915) 2018-05-29 16:53:36 -04:00
0951f4424a CUDA 9.2 adds support to GCC 7.3.1. (#7880) 2018-05-29 21:53:06 +01:00
e8cc16bb92 Release GIL when copying to shared memory (#7918)
This releases the GIL when creating and copying a THStorage to shared
memory.
2018-05-29 21:51:58 +01:00
f70146e922 Fix SN not backpropagating via sigma(W), and not reusing W_u (#7905) 2018-05-29 15:55:29 -04:00
146b951ec5 Fix seeding random module in DataLoader (#7886)
* fix seeding random module

* make base seed int

* follow 0.4 idiom

* add a test for random seeding
2018-05-29 15:55:04 -04:00
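
A sketch of per-worker seeding in the spirit of this fix, using DataLoader's worker_init_fn hook (the seed derivation shown is illustrative):

```python
import random
import torch

def seed_random(worker_id):
    # torch.initial_seed() inside a worker reflects the per-worker base
    # seed, so reducing it mod 2**32 gives a distinct seed per worker.
    random.seed(torch.initial_seed() % 2 ** 32)

# pass worker_init_fn=seed_random to DataLoader(..., num_workers=N)
```
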
65f8465f6f Add back cpp_build tests for Mac (#7810) 2018-05-29 12:54:12 -07:00
a0480adc79 Fix file extension (#7852) 2018-05-29 15:52:31 -04:00
1ce7ed2895 fix slack email link 2018-05-29 15:51:22 -04:00
f7458faf98 Only add BUILD_ATEN/USE_ATEN once to flags (#7845) 2018-05-29 12:21:11 -07:00
5c1fcea5db [auto] Update onnx to 7361eec - Fix Operator Tests (onnx/onnx#1044)
7361eec59a
2018-05-29 19:04:36 +00:00
215fe057ea No Default argument to max_unpool functions (Fixes #7327) (#7388)
* Fix for Issue #7327

* Added testcase for max_unpool
2018-05-29 15:02:23 -04:00
49f8581745 Update from facebook (#7855)
* [mpscnn] MPSCNNChannelShuffle

att

* [Easy] Adding tags as an argument to the functional layer

Without it "tags" would be added as an argument to the operator.

The change here is based on the assumption that there is no operator that takes "tags" as an argument.

* Fix locally_connected_op schema check.

Fix locally_connected_op schema check.

* [C2] Add TypeAndShape inference for few more operators

As desc

* [c2] Shape inference should support 0 as dimension

Tensors can have 0 in their dimension.

* Make MockHiveReader loop over and support max_examples

Replace DatasetReader with RandomDatasetReader.

So that Mock Hive Reader can simulate a large data input using a small sample file as source.

* Utility function to wipe cache between benchmark runs

Caffe2 benchmark does not wipe out the cache between runs, and this potentially creates an unrealistically optimistic picture of performance. This diff adds a utility function to wipe out the cache.

* Allow caffe2 GlobalInit to be invoked multiple times

Allow caffe2 GlobalInit to be invoked multiple times. Will re-parse gflags and update logging levels on successive invocations, but will not re-run init functions or perform other one-time initialization.

* Add Caffe2 GlobalInitIsCalledGuard to base net and operator classes

Warn if caffe2's GlobalInit function has not been invoked before creating an operator or net object. This is based on discussion here: https://fb.quip.com/kqGIAbmK7vNG

* Rethrow current exception on failure

Rethrow current exception instead of copy constructing a new one on op failure.

* Make `clone()` return subclass of List/Struct

`clone()` is not working correctly when we subclass those classes

* Wipe the cache before the net run

the util function is copied from D7409424
will rebase once D7409424 is landed.

* [Caffe2] [Mobile] Support utils/cast.h::GetCastDataType with LITE_PROTO builds

* Correct includes

async_polling include -> async_base include

* Prepare execution flags for executor migration

Making async_scheduling aware of underlying net type to prepare for executor
migration

* Add operator level observers into async executor

Adding operator level observers into RunAsync operators' calls

* Cleanup TEST_Benchmark

Remove duplicate code and provide default implementation in NetBase

* [C2] Fix type and shape inference for binary comparison ops

As desc.

* Add GlobalInit to predictor to ensure initialization is always done before prediction

FACEBOOK:

Redo D7651453 the correct way.

Now use a static variable for the arguments passed to GLog

* Remove spammy log message

This method is currently used in various places inside Caffe itself.

* Disable events for operators inside a chain

We don't need to use events in operators within a chain because the chain is
always scheduled on a single stream; we keep only the first and last events for
scheduling purposes.

* Ensure correct finish run order

In rare cases we might call finishRun and trigger the net's destruction while
another worker is still holding a shared_ptr to a thread pool, which can cause
thread pool destruction from within a worker thread if no other nets are
using the pool. This diff fixes the order of calling finishRun and also changes
pool() to return a raw pointer to keep the pool's ownership within the net.

* Reduce unnecessary polling

Make sure we don't waste CPU by polling operators that we can set efficient
callbacks on.

* Squash commit of syncing 9506eeb from github to fbcode

Patch xplat buck fix

add virtual destructor to OptimizationPass

add virtual destructor to OptimizationPass

build fixes for sync

build fixes for sync

* Fix net tracing

Fix net tracing from async_scheduling

* Fix logging
2018-05-29 11:38:02 -07:00
9f21ec7ca2 Add spaces to indexing error message (#7922)
Followup to #7345
2018-05-29 13:10:06 -04:00
637a044a24 Add missing ${generated_comment} (#7920) 2018-05-29 13:08:05 -04:00
bc8a92d03d Move REGISTER_CUDA_HOOKS to cpp file. (#7630)
It's going to define a static variable, and this was a loaded
footgun if another C++ file directly included this header.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-29 17:40:50 +01:00
fb59ce32c8 [auto] Update onnx to 385523b - Eliminate unused initializer (onnx/onnx#860)
385523bf1c
2018-05-29 13:55:52 +00:00
6dfadfeb89 Revert "Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST" (#7914)
* Revert "Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST (#7879)"

This reverts commit 45cdb63d8b8022ab26f073d3bed718e75d2aedaf.

* Disable dirty test; always run all CI runs.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-29 14:48:18 +01:00
42a68749bf einsum: don't inplace modify arguments (fixes: #7763) (#7765)
Thank you, Pierce Freeman, for the report and minimal example!
2018-05-29 11:26:39 +01:00
fb23e62797 Remove templatization of PyTypeObject in THP copy storage methods. (#7811)
* Remove templatization of PyTypeObject in THP copy storage methods.

An in-progress refactoring of THStorage is collapsing the types of THStorages to not be ScalarType-specific.
The relevant PyTypeObject to use for the THPStorageType is currently templatized based on the current THStorage;
this doesn't work if the ScalarType is collapsed.  Instead, just pass it explicitly.

* Pass src type instead of dst type.

* Line up columns.
2018-05-29 11:19:34 +01:00
8b85b8afd7 Avoid @generated in templates. (#7858)
* Avoid @generated in templates.

We want @generated only in the build products. Otherwise, templates are
locked and changes to the templates are excluded from phabricator.

Also adds @generated to autograd generated files (e.g.
VariableType.cpp).

See #7780

* Don't try to specify the template filename in generated comment

The template filename is not always the same as the generated filename.
2018-05-29 11:18:31 +01:00
07f55ae568 [auto] Update onnx to e570127 - update version (onnx/onnx#1034) (onnx/onnx#1041)
e5701271f0
2018-05-29 05:52:05 +00:00
c122d271a8 [caffe2][nomnigraph] Default disable optimization passes (#7741) 2018-05-28 15:49:25 -07:00
45cdb63d8b Fix error when setting multiple arch in TORCH_CUDA_ARCH_LIST (#7879) 2018-05-26 17:20:46 -04:00
07a0482d80 Make size_average docs clearer (#7829)
* Make size_average docs clearer

* fix format
2018-05-26 11:18:57 -04:00
7cd1ea8166 Make TensorMethods (fastGetSet) not depend on data type of Storage. (#7859)
* Make TensorMethods (fastGetSet) not depend on data type of Storage.

Currently, fastGetSet is implemented as macros that depend on the data type of Storage (i.e. that storage->data is real*).
Since we are moving to having 'void*' data this won't work in the future.

Also, due to the recent C/C++ split, these are actually C++ implementations (because they require the struct definition, which is C++),
so we move them to a generic .hpp file and implement them as static inline functions.

* Fix set functions.

* Add generic to CMakeLists.
2018-05-26 11:17:40 -04:00
5e50993be7 Better type checking for pack_padded_sequence symbolic (#7874) 2018-05-26 11:16:41 -04:00
af3d0e20a0 [ONNX] Fix transpose fusion logic (#7872) 2018-05-26 11:13:15 -04:00
f0dc40f77e Fix typo (#7876) 2018-05-26 11:11:19 -04:00
fece8787d9 [auto] Update onnx to 789efb1 - update proto files. (onnx/onnx#1040)
789efb166d
2018-05-26 09:40:53 +00:00
d8101e8410 [Caffe2] Fix roi_align_op_gpu_test and test_layer_norm_grad_op (#7875)
* Fix roi_align_op_gpu_test

* Fix layer_norm_op_test.py::TestLayerNormOp::test_layer_norm_grad_op
2018-05-26 02:28:48 -07:00
5c8d48c457 Properly pass xml report flags to ATen tests in Caffe2 builds (#7863)
* Not running ATEN tests on Caffe2 builds

* Keeping test directory when only aten is built

* Changing to run all aten tests too

* Skipping directories again

* .

* .

* skip aten/integer_divider_test (it hangs for unknown reason)
2018-05-25 23:21:40 -07:00
06d5dd088d [auto] Update onnx to ec3b679 - Re-enable mypy, Fix releasing from Windows (onnx/onnx#1037)
ec3b6797b7
2018-05-26 00:41:46 +00:00
74246c9ba4 Potential fix for RNN test on MKL (#7862) 2018-05-25 16:16:46 -07:00
aae0ad58f3 Fix onnx integration tests build issues (#7856)
* Fix onnx integration tests build issues

* set -DBUILD_SHARED_LIBS=OFF for integrated builds

* verbose log

* non-local protobuf

* .

* turn back off verbose logging

* Fix typo
2018-05-25 15:19:54 -07:00
14f8cd7e3d [JIT][script] Implement nn.Sequential that can be inlined into script modules (#7747)
* Implement nn.Sequential that can be inlined into script modules

* fix bugs

* add comment

* add _ConstSequential class

* add script_method for forward in ConstSequential

* fix build bug

* refactor
2018-05-25 13:38:24 -07:00
c5b623e5d1 Use __float2half (#7850) 2018-05-25 13:25:56 -07:00
d5c466e5ce RNN export: add transpose to match onnx spec (#7825)
Didn't quite get it right the first time.

fixes https://github.com/pytorch/pytorch/issues/7817
2018-05-25 12:56:57 -07:00
e6488bbd01 add jit/passes/onnx CODEOWNERS line (#7853) 2018-05-25 15:52:39 -04:00
d2f98fcae9 Fix perf commits (#7848) 2018-05-25 17:42:47 +00:00
b1d03b795a add launch bounds to im2col and col2im (#7779) 2018-05-25 12:25:49 -04:00
0f7f27a843 fix typo from #7399 (#7846) 2018-05-25 12:11:50 -04:00
bed0ec3b21 Add missing trailing underscores 2018-05-25 08:27:25 -07:00
8d0622ca9d re-fix 9.2 build (#7828) 2018-05-25 11:13:20 -04:00
fb5cc630f6 Fix me (#7837)
* Mini fix

* No USE_MKL

* Add CAFFE2_USE_EIGEN_FOR_BLAS
2018-05-25 07:38:50 -07:00
d7c32df67f move Subset, random_split to data, use sequence at some places. (#7816) 2018-05-25 12:50:50 +02:00
ce1a65b5c2 [auto] Update onnx to 94dbb76 - Fix comma in Gemm description (onnx/onnx#1032) (onnx/onnx#1035)
94dbb76747
2018-05-25 03:41:34 +00:00
93b7b5dddd Fix trigonometric_op_test failures when running in python3.6 (#7831) 2018-05-24 19:09:35 -07:00
dbac3d21f6 [auto] Update onnx to b18cbd3 - remove mypy which blocks release. (onnx/onnx#1031)
b18cbd3364
2018-05-25 01:29:25 +00:00
28b1a3852c Add backward() to Tensor and Variable (#7774)
* Add backward() to Tensor and Variable

* Add at:: in front of Tensor

* Trying to not move optional to appease windows?

* Move implementation into cpp file

* Undo some formatting changes
2018-05-24 17:31:41 -07:00
147cc05cf5 [auto] Update onnx to 8236f49 - Kezhan/update manifest (onnx/onnx#1029)
8236f49124
2018-05-24 23:00:38 +00:00
144c5d1ff3 Overwrite INTEL_MKL_DIR correctly (#7824) 2018-05-24 15:04:25 -07:00
2271e7d7ab onnx->caffe2 output: better handling of init/pred splitting (#7820) 2018-05-24 14:49:14 -07:00
71bad33cc4 Match parenthesis (#7797) 2018-05-24 13:45:23 -07:00
b12164005f [C++ API] Remove virtual forward and implement Sequential based on Any(Module) (#7508)
* Remove virtual forward

* Rebase
2018-05-24 12:46:51 -07:00
1078491502 Change is_tensor to isinstance(*, torch.Tensor) (#7814)
Thanks!
2018-05-24 15:08:16 -04:00
0fddfe6c21 [auto] Update onnx to f8aa447 - update version number (onnx/onnx#1027)
f8aa447431
2018-05-24 18:14:57 +00:00
9a736f5228 [auto] Update onnx to 640a4ec - [Easy] Fix the gen_doc.py (onnx/onnx#1024)
640a4ec5d2
2018-05-24 15:33:02 +00:00
f88c529d06 [auto] Update onnx to 5591c95 - Enable non-static schema registration (onnx/onnx#894)
5591c95f68
2018-05-24 15:31:45 +00:00
c946db16ec [distributions] Always enable grad when calculating lazy_property (#7708)
* Always enable grad when calculating lazy_property

* Add test with MultiVariableNormal
2018-05-24 11:22:39 -04:00
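
A sketch of the pattern after the fix (it mirrors torch.distributions.utils.lazy_property, simplified):

```python
import torch

class lazy_property(object):
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __get__(self, instance, obj_type=None):
        if instance is None:
            return self
        with torch.enable_grad():  # the actual change: compute with grad on
            value = self.wrapped(instance)
        # Cache on the instance; the attribute shadows this non-data descriptor.
        setattr(instance, self.wrapped.__name__, value)
        return value
```
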
4bf0202cac [build] Have PyTorch depend on minimal libcaffe2.so instead of libATen.so (#7399)
* Have PyTorch depend on minimal libcaffe2.so instead of libATen.so

* Build ATen tests as a part of Caffe2 build

* Hopefully cufft and nvcc fPIC fixes

* Make ATen install components optional

* Add tests back for ATen and fix TH build

* Fixes for test_install.sh script

* Fixes for cpp_build/build_all.sh

* Fixes for aten/tools/run_tests.sh

* Switch ATen cmake calls to USE_CUDA instead of NO_CUDA

* Attempt at fix for aten/tools/run_tests.sh

* Fix typo in last commit

* Fix valgrind call after pushd

* Be forgiving about USE_CUDA disable like PyTorch

* More fixes on the install side

* Link all libcaffe2 during test run

* Make cuDNN optional for ATen right now

* Potential fix for non-CUDA builds

* Use NCCL_ROOT_DIR environment variable

* Pass -fPIC through nvcc to base compiler/linker

* Remove THCUNN.h requirement for libtorch gen

* Add Mac test for -Wmaybe-uninitialized

* Potential Windows and Mac fixes

* Move MSVC target props to shared function

* Disable cpp_build/libtorch tests on Mac

* Disable sleef for Windows builds

* Move protos under BUILD_CAFFE2

* Remove space from linker flags passed with -Wl

* Remove ATen from Caffe2 dep libs since directly included

* Potential Windows fixes

* Preserve options while sleef builds

* Force BUILD_SHARED_LIBS flag for Caffe2 builds

* Set DYLD_LIBRARY_PATH and LD_LIBRARY_PATH for Mac testing

* Pass TORCH_CUDA_ARCH_LIST directly in cuda.cmake

* Fixes for the last two changes

* Potential fix for Mac build failure

* Switch Caffe2 to build_caffe2 dir to not conflict

* Cleanup FindMKL.cmake

* Another attempt at Mac cpp_build fix

* Clear cpp-build directory for Mac builds

* Disable test in Mac build/test to match cmake
2018-05-24 07:47:27 -07:00
f9633b9542 [Caffe2] Skip some tests to unbreak CI (#7804)
* Skip some tests to unbreak CI

* Pass the opset_version to run_node

* Remove the stale check_graph call, caffe2_net_to_onnx_model will invoke check_model
2018-05-24 00:12:00 -07:00
fdabc02644 [auto] Update onnx to 9e6e7e4 - Support opset_version in run_node (onnx/onnx#1022)
9e6e7e4282
2018-05-24 06:10:07 +00:00
cfd70dc1cf [C++ API] Back to reset() and fixed in-place cloning (#7796)
* Back to reset() and fixed in-place cloning

* Add final override to clone_
2018-05-23 22:11:32 -07:00
6df371ba2f [auto] Update onnx to 9a37d4d - Add PRelu test cases (onnx/onnx#580)
9a37d4daf5
2018-05-24 01:51:36 +00:00
43d87afdc2 [auto] Update onnx to d2a46da - fix gru, rnn, lstm test cases to match the specification and add some cases (onnx/onnx#920)
d2a46da13b
2018-05-24 01:50:51 +00:00
1289fc870d Disable onnx backend node tests with broadcasting (#7730) 2018-05-24 09:15:16 +08:00
966c65859d Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2" (#7802)
* Revert "[auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879) 4898c9e925"

This reverts commit 9c679dab5fe7cac27bb8c783fd143276e6046ef1.

* Revert "Add BiasCHW fallback for GPU (#7738)"

This reverts commit 14ad2e74f108d13ec98abb078f6aa7f01aae0aad.

* Revert "[Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)"

This reverts commit 2ebcf4bb37739733e76b754284cf8b2ffcba1c30.
2018-05-23 17:58:47 -07:00
9c679dab5f [auto] Update onnx to 4898c9e - Added TensorDenotation and metadata_props for images (onnx/onnx#879)
4898c9e925
2018-05-24 00:31:58 +00:00
14ad2e74f1 Add BiasCHW fallback for GPU (#7738) 2018-05-23 16:04:35 -07:00
2ebcf4bb37 [Caffe2] Enabling AMD GPU Backend for Caffe2 (#7566)
* Add hip support for caffe2 core

* Add MIOPEN header/wrapper to caffe2 core

* Add HIP device into caffe2 PB

* top level makefile change for rocm/hip

* makefile scaffolding for AMD/RocM/HIP

* Makefile scafodding for AMD/RocM/HIP; add makefile/utility for HIP files

* caffe2 PB update for AMD/ROCM HIP device

* Add AMD/RocM/Thrust dependency

* HIP threadpool update

* Fix makefile macro

* makefile fix: duplicate test/binary name

* makefile clean-up

* makefile clean-up

* add HIP operator registry

* add utilities for hip device

* Add USE_HIP to config summary

* makefile fix for BUILD_TEST

* merge latest

* Fix indentation

* code clean-up

* Guard builds without HIP and use the same cmake script as PyTorch to find HIP

* Setup rocm environment variables in build.sh (ideally should be done in the docker images)

* setup locale

* set HIP_PLATFORM

* Revert "set HIP_PLATFORM"

This reverts commit 8ec58db2b390c9259220c49fa34cd403568300ad.

* continue the build script environment variables mess

* HCC_AMDGPU_TARGET

* Cleanup the mess, has been fixed in the lastest docker images

* Assign protobuf field hip_gpu_id a new field number for backward compatibility

* change name to avoid conflict

* Fix duplicated thread pool flag

* Refactor cmake files to not add hip includes and libs globally

* Fix the wrong usage of environment variables detection in cmake

* Add MIOPEN CNN operators

* Revert "Add MIOPEN CNN operators"

This reverts commit 6e89ad4385b5b8967a7854c4adda52c012cee42a.
2018-05-23 15:13:09 -07:00
4352eab367 Call grad_mode.py context managers as decorators (#7737)
* call grad_mode.py context managers as decorators

* flake fixes

* switch to using context manager in wrapper

* fix set_grad_enabled test

* removed dumb github UI whitespace

* revert set_grad_enabled to normal, update tests
2018-05-23 17:39:13 -04:00
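
What this enables, in short:

```python
import torch

@torch.no_grad()
def evaluate(model, x):
    # the whole call runs with gradients disabled, no explicit 'with' block
    return model(x)
```
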
aa214a8b8c catch CPU tensors in checkSameGPU (fixes #7689) (#7767)
Thank you, Nikita Kitaev, for the report and example.
2018-05-23 17:28:37 -04:00
0e9613cc49 Mark stack as non-executable in NNPACK (#7752)
Pull a new revision of NNPACK which specifies a non-executable stack in assembly files. The previous revision didn't do that, and depending on the toolchain could cause the linker to mark the stack as executable for the linked binaries.
2018-05-23 12:50:07 -07:00
1feb1a9b88 small fixes in fusion_compiler (#7776)
* small fixes in fusion_compiler

* address review comments
2018-05-23 15:18:58 -04:00
7d0de4f138 Run clang-format on c10d (#7791) 2018-05-23 11:26:35 -07:00
42134ee799 Allow empty storage for the 'Edge' class. (#7595)
This commit:
- Converts edge storage to an optional type.
- Adds a new test in tarjans_test.
- Refactors related bits in other files.
2018-05-23 10:40:29 -07:00
ee5e474fcf Process group base class and Gloo implementation (#7628)
This is a starting point and only implements allreduce for CPU tensors. It includes most base functionality like algorithm caching (a similar approach to the one taken in the THD GlooCache) and multi-threaded execution (new).

The expectation is that function calls on the process group class are globally serialized. They execute collective functions, so members of the collective must call the same functions in the same order, or a deadlock may happen.

The algorithm cache works as follows: the ProcessGroupGloo class has a cache map from algorithm keys to algorithm entries. The algorithm key is a struct with fields that make up the signature of a collective function. It includes the dimensionality of the input/output tensors, tensor device assignment, source/destination rank, etc. For collective calls with the same key, the process group will lazily initialize and then cache a Gloo algorithm instance. For now we only keep a single algorithm instance per key, but this may be revisited in the future, if we observe contention on a single key and can exploit additional parallelism.
2018-05-23 09:02:18 -07:00
5a3f7810f8 _LRSchedulers getstate include optimizer info (#7757)
* getstate should include optimizer

* remove getstate/setstate functions
2018-05-23 11:43:42 -04:00
e3e15b5d95 [PyTorch] [gradcheck] change backward() to grad() (#7710)
* Change backward calls to grad to avoid memory leak from #7343; Replace unnecesary create_graph=True with retain_graph=True

* fix gradgradcheck use of make_non_contiguous

* allow non-contguous target

* remove unnecessray .grad.zero_()

* remove contiguous_detach

* fix PReLU double backward always returning ggW as a scalar

* let noncontig gO require grad

* move requires_grad to return
2018-05-23 11:03:12 -04:00
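
The core of the change, illustrated (shapes are arbitrary): fetch gradients with torch.autograd.grad instead of accumulating them into .grad via backward().

```python
import torch

x = torch.randn(3, requires_grad=True)
y = (x ** 2).sum()

# grad() returns the gradient directly and does not mutate x.grad,
# so there is nothing to zero out between repeated checks.
(gx,) = torch.autograd.grad(y, x)
```
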
6a604f16cc Update test_nn.py (#7787) 2018-05-23 12:28:13 +02:00
60d5c0eb19 Define general default scheduler for TBB and fix ppc64le bug (#7761) 2018-05-23 12:24:33 +02:00
2222fc7666 Add support for accepting Tensor as input in clip_grad_* functions. (#7769) 2018-05-23 12:12:03 +02:00
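
Illustrative usage of the relaxed input type (values are made up):

```python
import torch
from torch.nn.utils import clip_grad_norm_

p = torch.randn(3, requires_grad=True)
p.grad = torch.randn(3)
clip_grad_norm_(p, max_norm=1.0)  # a bare Tensor is now accepted, not only iterables
```
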
5316cad5c2 [Easy] Remove unused code (#7782) 2018-05-22 22:32:47 -07:00
85e9ae20e5 Update tbb (#7734) 2018-05-23 01:54:16 +00:00
f534339a1a Add @generated annotation (#7780) 2018-05-22 18:33:05 -07:00
ee628d64b9 fix legacy comment after variable tensor merge (#7771) 2018-05-22 19:08:42 -04:00
60745b3380 Revert #7750 and #7762 to fix Windows CI on master (#7772)
* Revert "Add missing brace (#7762)"

This reverts commit ea27c5af50f6bc8ba82068e6d36ade9c773dc101.

* Revert "[C++ API] Add backward() to Tensor and Variable  (#7750)"

This reverts commit 1e2762796f33123d86782936089dbeda37bdcc92.
2018-05-22 15:42:52 -07:00
8d91a602cc Temporarily disable build env check (#7768) 2018-05-22 12:51:00 -07:00
ea27c5af50 Add missing brace (#7762) 2018-05-22 14:18:22 -04:00
1e2762796f [C++ API] Add backward() to Tensor and Variable (#7750)
* Add backward() to Tensor and Variable

* Added a couple tests
2018-05-22 10:43:04 -07:00
e5b830eb0e [auto] Update onnx to d43b550 - Fix .gitignore and add missing files (onnx/onnx#1005)
d43b55087d
2018-05-22 17:40:43 +00:00
bb15a0830d [auto] Update onnx to ea1aa13 - add tests for reduce ops (onnx/onnx#675)
ea1aa139b2
2018-05-22 01:50:13 +00:00
bb34887ae3 include cudnn_h (#7749) 2018-05-21 21:48:50 -04:00
549b4069bb [C++ API] Using new registration mechanism (#7663)
* Using new registration mechanism

* Fix signature of param() in module.cpp

* Remove ParameterList

* Fix tests
2018-05-21 17:59:21 -07:00
312ab535ba [auto] Update onnx to 5dd68e6 - Add a util function: polish_model (onnx/onnx#1000)
5dd68e634b
2018-05-22 00:22:30 +00:00
8275e430b0 [auto] Update onnx to 169b156 - Add more missing type hints (onnx/onnx#991)
169b1561e9
2018-05-21 22:08:15 +00:00
f01be11efd [auto] Update onnx to b3b3b28 - Enable checking for functions that don't have a type hint (onnx/onnx#989)
b3b3b2851a
2018-05-21 19:18:19 +00:00
c5ffc3a02c [auto] Update onnx to 9f9316a - Catch up with type hints (onnx/onnx#988)
9f9316a5e2
2018-05-21 19:17:25 +00:00
d02b7ab389 [auto] Update onnx to c168303 - Better error message if protoc isn't found (onnx/onnx#1004)
c168303031
2018-05-21 18:38:53 +00:00
9506eeb73a [auto] Update onnx to 52f7528 - add more shape inference tests (onnx/onnx#971)
52f75285ad
2018-05-21 17:09:03 +00:00
286cd04a20 JIT cleanup (#7631)
Cleans up dead code in the JIT:

* Remove interpreter_autograd_function
* Remove Handles
* Remove HandleBuilder
* Remove creates_handles and tracing_autograd_python_function flags
* Remove unused var_args
* Fix submodules
2018-05-21 10:06:29 -07:00
e6f7e1807d fix to build sleef when using cmake 3.11.1 (#7679) 2018-05-21 15:13:17 +00:00
5ee5537b98 Fix typo in document (#7725) 2018-05-21 11:10:24 -04:00
28b592e00b [auto] Update onnx to 6f4b1b1 - Tests for Gemm operator (onnx/onnx#885)
6f4b1b12e5
2018-05-21 12:22:11 +00:00
987b52460d [auto] Update onnx to c6c6aad - Enhance the 1-element broadcast case (onnx/onnx#902)
c6c6aad416
2018-05-21 11:29:53 +00:00
b4ae80d459 serialization for torch.device (#7713) 2018-05-21 11:34:26 +02:00
ee6e3fe301 Fix compile flags for MSVC (#7703) 2018-05-21 10:20:19 +02:00
0a11018db6 Fix exporting Sum to onnx (#7685)
* Fix exporting Sum to onnx

* extend fix to prod and mean

* update expect file
2018-05-20 23:37:42 -07:00
a890a0be07 Renanme ZFNet to ZFNet512 (#7723) 2018-05-21 11:37:39 +08:00
75cf0faf4c Implement __reduce__ for torch.dtype (#7699) 2018-05-20 14:59:02 +02:00
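
What this enables:

```python
import pickle
import torch

# torch.dtype objects now survive a pickle round trip.
assert pickle.loads(pickle.dumps(torch.float32)) == torch.float32
```
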
5000a05724 Remove unnecessary include in vec256_float.h (#7711) 2018-05-20 11:23:43 +02:00
f94ae3ba1d Update from facebook (#7696)
* Fix handling of empty batches in SumReduceDimsOp

As titled

* Deferrable async_scheduling finishRun fix

Proper order of finishing run operations in deferrable_async_scheduling net

* Simplify exception handling in async_scheduling

Simplify exception handling, no need to busy wait, thread that processes the
last task can finish the run

* [C2]worker_coordinator_memorize_worker_ids

As titled. This is related to T28689868, where the number of blobs we want to create is equal to the number of worker ids

* Add unit test for nets with no type set

* Ignore total length argument in symbolic_pad_packed_sequence

1- There was a mistake in the code: total_length was added to the wrong symbolic function (pack_padded_sequence) instead of (pad_packed_sequence)
2- No need to throw an exception if total_length is given, since it is only used to enable data_parallel training on multi-gpus and doesn't have anything to do with onnx export, so just ignore it. https://fburl.com/tk4gciqp

* Add support for MKLDNN to async_scheduling

Just add MKLDNN as a possible CPU option to async_scheduling's pool function

* [AuFL][ensemble] support branch output for prediction

This diff supports using predictions from different branches and thus enables model ensembling (not fully independent).

* Fix a bug in add_loss in layer_model_helper

As titled.

* Support lradaption for adam

1.lr adaption operator
2.apply to dense adam

* Perf tweaks for async_scheduling

Restore single pool option + remove unnecessary (no-ops) calls

* add quantization to SparseSimdAdagradOp

add a bunch of quantization signatures to SparseSimdAdagradOp, implementations to come next

* [sr] [codemod] Change all SR callsites to use new API

@allow-large-files

This diff refactors all callsites of SR to use the slightly changed API introduced in the diff below. Really what this means is that you need to include the correct header. Also if you were using `ClientFactory::newFactory` you need to not prefix it with `ClientFactory::`.

```
cd ~/fbsource/fbcode
find ./ -type f -exec sed -i -e 's:#include "servicerouter/client/cpp2/ClientFactory.h":#include "servicerouter/client/cpp2/ServiceRouter.h":' -e 's:#include <servicerouter/client/cpp2/ClientFactory.h>:#include <servicerouter/client/cpp2/ServiceRouter.h>:' -e 's/ClientFactory::newFactory(/newFactory(/g' {} \;
```

Also manually fixed spots that couldn't be done automatically (or broke because they depended on transitive includes).

* Back out "Fix handling of empty batches in SumReduceDimsOp"

Original commit changeset: 282da1730cc2 This commit is blocking the
Github->fbcode sync, which really needs to get merged ASAP. D7881937 which this
diff depends on will be reverted in the sync D7990948 which causes this to
break. The sync diff cannot be patched with this reversion because it must be
landed against base revision 5c8c099 , and D7881937 must not be included in the
sync diff because it is breaking GPU tests that are not available in sandcastle
: https://ci.pytorch.org/jenkins/job/caffe2-builds/job/py2-cuda8.0-cudnn6-ubuntu16.04-test/3638/console
for one example.

* Add the flow to support operator benchmark

1) generate model with the operator 2) upload to everstore 3) generate model spec into json file 4) start running the benchmark

* [tum][gpu] Connect DPM trainer with flow and unit tests

This diff:
- Fix some small bugs for Yiming's recent changes to parallelizer, so it suits real use cases.
- Add correct tags to the TUM code, so we can do data parallel transform
- pass extra info when instantiation.
- add unit test for using DPM in TUM model

After this diff, we can do simple box, multi-gpu fully-sync trainer for TUM in Fblearner workflow, but may still need to do speed benchmarking.

* w/o normalized lradaption for adam dense only

The previous lr adaption includes a normalization step when performing the dot product operation. This is not exactly the same as what is proposed in the paper. I add normalization as an option. Without it, the operator performs exactly what the paper proposed. With the option, we add the normalization step.

* [fb] Use SharedPromise in DeferrableAsyncSchedulingNet

This code is to simplify DeferrableAsyncSchedulingNet by removing condition
variable + small fixes

* [tum] implement cuda sparseLengthsMean and LengthsMean

as title

* Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

Adding an optional parameter to allow use of protobufs in InferShapesAndTypes function.

* Move feature_to_index to FeatureSpec.feature_to_index

move feature_to_index to FeatureSpec.feature_to_index to avoid overriding other fields

* [Caffe2] Rename bytes_moved to bytes_written

Just a rename in preparation for supporting bytes_read.

* [c2] fix ReduceFrontSumOp for empty case by setting 0

otherwise, it may use the results from the last iteration when the batch is empty.

* [Caffe2] [Int8] Improve Intel CPU performance

* [Easy] Improve PrependDim op logging

as titled

* DBFileReader expand db_path using os.path.expanduser(..)

Since there are a lot of possible use cases of `DBFileReader` to read from a user home path, like `~/local/sample.db`, I want to save people the trouble of calling `os.path.expanduser(db_path)` themselves.

* [Caffe2] Add bytes_read to cost structure

We're adding analytical read bytes to cost functions.  This extends the structure accordingly for all CostInference defined operators.
Additionally, some small bug fixes were performed:
1) Cost functions now extract type information of operands instead of assuming float

* Fix sleef on aarch64 for hhvm

@bypass-lint

Rename flag

* Remove duplicated part in caffe2/ideep/operators/conv_op.cc

should be sync error

* Rename test helper function test_adagrad_sparse_helper to adagrad_sparse_test_helper to avoid confusing pytest
2018-05-19 23:10:48 -07:00
2cb096ada8 fix for cuda 9.2 builds (#7709) 2018-05-19 21:18:48 -07:00
42e5e12750 make BatchSampler subclass of Sampler, and expose (#7707) 2018-05-19 21:29:03 +02:00
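
Example of the newly exposed class (values are illustrative):

```python
from torch.utils.data import BatchSampler, SequentialSampler

# BatchSampler now subclasses Sampler and is importable from torch.utils.data.
batches = BatchSampler(SequentialSampler(range(10)), batch_size=4, drop_last=False)
print(list(batches))  # [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```
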
cf9b80720d Dont emit warning for ABI incompatibility when PyTorch was built from source (#7681) 2018-05-19 20:25:52 +01:00
8f97cbcf4e remove index from python bindings (fixes: #7639) (#7690) 2018-05-19 20:04:07 +02:00
ee882eae8e Update _torch_docs.py (#7700)
Added better example
2018-05-19 11:12:02 -04:00
48bf733480 Changes from D7881937 and D7963936 plus an edit (#7605)
* Changes from D7881937 and D7963936 plus an edit

* D8038158

* Another change from cxj
2018-05-18 20:59:16 -07:00
77fe4bd0b6 [auto] Update onnx to 241a350 - Type and shape inference for RNN, LSTM, GRU (onnx/onnx#937)
241a350272
2018-05-19 02:04:55 +00:00
f7d96a367b Update NNPACK and cpuinfo submodules (#7691)
Updated NNPACK to 42d9355
Updated cpuinfo to 1e6c8c9
2018-05-18 18:27:56 -07:00
ec71c689fc [JIT][script] Add matmul(@), pow(**) operator (#7648)
* add matmul(@), pow(**) operator

* fix bug(matmul not in py2) in @ operator

* fix bugs

* add get_fn help func to remove duplication in test_jit
2018-05-18 15:24:20 -07:00
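
A minimal sketch of the new operators in script (the function is illustrative; the '@' syntax itself requires Python 3, as the py2 fix above notes):

```python
import torch

@torch.jit.script
def affine(x, w, b):
    # '@' lowers to matmul and '**' to pow inside script
    return x @ w + b ** 2
```
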
27ea7148fe Updates to .clang-format (#7683)
1) No longer compact namespaces (revert from #5127)
2) Don't break on return type for long function declarations
2018-05-18 15:11:17 -04:00
2f5494ac14 [auto] Update onnx to a75fa2c - fix customized version of find Protobuf for Error when calling find_package(Protobuf) twice (onnx/onnx#901)
a75fa2c402
2018-05-18 18:43:01 +00:00
875a5dceb0 [auto] Update onnx to 55fff7b - python setup.py typecheck (onnx/onnx#972)
55fff7b796
2018-05-18 18:24:01 +00:00
4f20a0e439 Fix various sparse transpose issues; remove dead code from Declaratio… (#7200)
* Fix various sparse transpose issues; remove dead code from Declarations.yaml.

1) Fixes some checks in t_, transpose_ that don't allow transposing empty sparse tensors.
2) Remove out= variants from docs since they don't exist (and haven't since at least v0.3.1).
3) Unify implementations of t_, transpose_, t, transpose.
4) Move dead checking code from Declarations.cwrap to actual implementations.
5) Fix test which never tested transpose_.

* Add test for error with t, t_.

* Address review comments.

* Fix jit tests.

* Fix test_jit.
2018-05-18 19:51:41 +02:00
7abdc303c6 Don't allow requires_grad to be set on integer Tensor constructors in… (#7185)
* Don't allow requires_grad to be set on integer Tensor constructors in tensor_new.

* Fix autograd test.

* Fix test_distributions.

* Fix test_jit.

* Fix NN tests.
2018-05-18 19:45:10 +02:00
431c80a128 Guard sleef for AVX/AVX2 (#7678) 2018-05-18 17:33:21 +00:00
cf0c585b6a [auto] Update onnx to e050bcc - add multinomial op to ONNX (onnx/onnx#897)
e050bccacb
2018-05-18 17:17:17 +00:00
32b23a4bfc Throw error on tensor creation when sequence shape cannot be determined (#7583)
* first commit

* unit test

* minor style edits
2018-05-18 19:14:42 +02:00
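
After this change, a ragged nested sequence fails fast instead of producing an ill-defined tensor (the exact message may differ):

```python
import torch

torch.tensor([[1, 2], [3]])  # raises ValueError: inner sequences disagree on length
```
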
e37da05bd5 Expose documentation for random_split (#7676)
Fixes #7640
2018-05-18 17:16:25 +02:00
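
Basic usage of the now-documented helper (sizes are illustrative):

```python
import torch
from torch.utils.data import TensorDataset, random_split

ds = TensorDataset(torch.arange(10))
train, val = random_split(ds, [8, 2])  # lengths must sum to len(ds)
```
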
8212f576db improve RNN docs (fixes #3587) (#7669) 2018-05-18 16:41:03 +02:00
f7bc7007d4 return nan in max_pool/adaptive_max_pool for nan args (#7645) (#7670) 2018-05-18 16:39:41 +02:00
bf95dff85b Map digamma +/-inf results to nan in test (fixes #7651) (#7665) 2018-05-18 16:35:00 +02:00
50d8473ccc Document dtype arg for reduce ops (#7654)
Fixes #7039.
2018-05-18 10:30:38 -04:00
c46a0c8813 add back Tensor.permute docs (#7652) 2018-05-18 10:29:43 -04:00
56e7a2cde1 Better support for adding zero-filled sparse tensors (#7479)
Right now, if we add a zero-filled sparse tensor to another sparse
tensor, both tensors must have the same "density" (dimI, dimV) and size
(tensor.size()) for them to be added successfully. This relaxes that
constraint so that if both tensors have the same tensor.size() and at
least one is zero-filled, they can be added successfully.

Before:
```
i = torch.LongTensor([[0, 1, 1], [2, 0, 2]])
v = torch.FloatTensor([3, 4, 5]).unsqueeze(1)
sparse_mat = torch.sparse.FloatTensor(i, v, torch.Size([2,3,1]))
zeros = torch.zeros(sparse_mat.size(), layout=torch.sparse_coo)
sparse_mat + zeros

RuntimeError: cadd operands have
incompatible sizes or dimension types
at
../src/THS/generic/THSTensorMath.c:126
```

After: no error.
2018-05-18 10:29:27 -04:00
f12b8770cd use matching tp_name for torch.device (#7673) 2018-05-18 16:24:21 +02:00
c58893eb9e [auto] Update onnx to 59b0b24 - Clarified description of Pad attribute (onnx/onnx#962)
59b0b24643
2018-05-18 13:09:51 +00:00
06fa332e2b Fix UB when converting negative floating values to uint8_t (#7644) 2018-05-18 11:02:00 +02:00
4dd0aab33c [auto] Update onnx to 3fc5f43 - move finalize function to be public. (onnx/onnx#987)
3fc5f43e91
2018-05-18 08:07:12 +00:00
47ab3f936b [caffe2] Fix warning in net_async_tracing.cc (#7646)
Compilers used to report a warning:
  caffe2/core/net_async_tracing.cc: In member function 'void caffe2::tracing::Tracer::renameThreads()':
  caffe2/core/net_async_tracing.cc:210:32: warning: overflow in implicit constant conversion [-Woverflow]
     const long numa_multiplier = 10e9;

This patch fixes it.
2018-05-17 22:36:54 -07:00
ca860907bb [auto] Update onnx to 8d548e2 - Update shape inference methods to throw exception (onnx/onnx#986)
8d548e2361
2018-05-18 04:42:23 +00:00
bc4feab3e3 Fix flaky atomic iter test (#7649) 2018-05-17 21:17:29 -07:00
5207998fc3 Fix onnx Pow export (#7657) 2018-05-17 21:15:04 -07:00
93f8d98027 [auto] Update onnx to 8356ad5 - Add unit test framework for the project C++ APIs (onnx/onnx#763)
8356ad54e9
2018-05-17 23:52:58 +00:00
2d313276b2 [caffe2][nomnigraph] Add registry for optimization passes (#7656) 2018-05-17 16:33:56 -07:00
8c0299b5e6 [auto] Update onnx to 94ca052 - Update mypy version (onnx/onnx#968)
94ca052447
2018-05-17 22:52:09 +00:00
d4f6c84041 fix nccl distributed documentation 2018-05-17 18:03:54 -04:00
f2295494af Makes AccumulateGrad high priority in backwards passes (#7604)
* Makes accumulate_grad functions high priority in backwards passes

* Delegating constructor and comments

* Sequence_nr ain't pretty no more

* Sequence_nr ain't pretty no more
2018-05-17 23:49:15 +02:00
cba19e59ca [C++ API] Implement builder style construction (#7597)
* Implemented fused builder based construction mechanism

* "weights" -> "weight"

* Use int64_t instead of size_t everywhere in RNN

* Extracted Conv::ExpandingSize into its own thing

* Rename TORCH_PARAMETER to TORCH_ATTR

* Added documentation

* Fix weight names in batchnorm module
2018-05-17 17:10:15 -04:00
0d27d2686c C10D: Added TCPStore to support C10D store interface (#7560)
Reference: https://github.com/pytorch/pytorch/issues/7434

* C10D: Added TCPStore to support C10D store interface

* Used pipe to terminate the store daemon and addressed all comments

* Used notify/wake for wait and addressed all comments

* Clean up nits

* Clean up all socket states when the socket is closed
2018-05-17 13:38:06 -07:00
ec42a11410 [auto] Update onnx to ba86ec2 - Protobuf typing (onnx/onnx#982)
ba86ec2682
2018-05-17 18:29:16 +00:00
562d9971c9 Add LBFGS optimization algorithm to C++ API (#7596)
* Adding LBFGS to cpp API

* Adding stop conditions

* Test cases now passing and adding closure to all algs

* Addressing code review

* Set seeds to make optim tests more deterministic
2018-05-17 14:03:08 -04:00
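
This commit targets the C++ API, but the closure-based interface it adopts mirrors the existing Python one, shown here (the model and data are illustrative):

```python
import torch

model = torch.nn.Linear(2, 1)
opt = torch.optim.LBFGS(model.parameters(), lr=0.1)
x, y = torch.randn(8, 2), torch.randn(8, 1)

def closure():
    # LBFGS re-evaluates the objective several times per step,
    # which is why step() takes a closure.
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    return loss

opt.step(closure)
```
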
84730aa659 support <= and >= (#7633) 2018-05-17 10:01:29 -07:00
f7f95f1742 Reduce gen_jit_dispatch options (#7562)
* Reduce gen_jit_dispatch options

This removes the power set of options generated for IntList[k] arguments
in aten_dispatch. Instead, the compiler now performs the broadcast using
schema information. This substantially cuts the compile time for aten_dispatch.cpp
2018-05-17 10:00:35 -07:00
331a04d8eb [auto] Update onnx to 321d874 - update output shape of RNN ops according to ONNX spec (onnx/onnx#923)
321d87457f
2018-05-17 05:47:23 +00:00
77e8a23a29 [auto] Update onnx to a8b3316 - add exception mechanism for use in type and shape inference (onnx/onnx#983)
a8b3316cff
2018-05-17 04:41:12 +00:00
9a1a20cb33 [auto] Update onnx to 13196bf - Shape inference for ConvTranspose (onnx/onnx#973)
13196bf40b
2018-05-17 03:58:54 +00:00
b4d5e67e5f Add asin, acos, tan, atan operators (#7600) 2018-05-16 18:09:26 -07:00
221e615665 Move bernoulli further into ATen (#7578) 2018-05-16 23:20:40 +00:00
330a72581f Update README to contain instructions on how to install mkldnn for Linux (#7625) 2018-05-16 19:08:03 -04:00
3c9ded098d [auto] Update onnx to 83f3666 - Spec clarity: Versioning (onnx/onnx#931)
83f366619e
2018-05-16 22:29:20 +00:00
3238db6247 Show skipped distributed tests as skipped (#7624)
Previously, tests that have been skipped because their backend was
missing would show up as succeeded, which has been very confusing.
2018-05-17 00:23:46 +02:00
8f42bb65b3 Be more lenient w.r.t. flag processing in C++ extensions (#7621) 2018-05-16 18:17:18 -04:00
f87091636d Update .gitignore (#7622) 2018-05-16 18:10:35 -04:00
8f6f43f5cf Fix rocm docker images environment variables round 2 (#7626) 2018-05-16 14:40:07 -07:00
599d0fac93 Reduce MAX_JOBS for gcc 7.2 build (#7618) 2018-05-16 17:30:09 -04:00
64cb4fb13d Add ChannelShuffle to IDEEP fallback (#7623) 2018-05-16 14:02:27 -07:00
c3a02fd8ed Conditionalize all of conv_op_eigen on version (#7581)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-16 14:17:25 -04:00
b45f2ff1ae Remove CompiledFunction + clean up JIT tests (#7421) 2018-05-16 20:03:04 +02:00
28b0b16f9b [auto] Update onnx to 01745b2 - Update README.md (onnx/onnx#976)
01745b28fa
2018-05-16 16:56:18 +00:00
c425d0350b Patches needed for sync, rebased (#7564) 2018-05-16 11:20:14 -04:00
7bc3414f8f fix caffe build failed with -O0 (#7570) 2018-05-16 11:19:15 -04:00
c5b9a36f1e Make return uniform in lbfgs step (#7586)
* Make return uniform in lbfgs step

This ensures that we are returning results of the same type
in LBFGS step.

* Adding test case to exercise different exit points

Sets the tolerance_grad to negative infinity and positive
infinity to deterministically exercise the early exit branch

* Fixing lint error
2018-05-16 11:16:46 -04:00
9213336c73 fix cmake USE_ASAN (#7608) 2018-05-16 11:10:13 -04:00
6eec4118a3 Fix python3.6 build in caffe2 CI (#7602)
* Fix python3.6 build in caffe2 CI

* Turn off onnx protobuf type stubs generation

* Revert "Turn off onnx protobuf type stubs generation"

This reverts commit 618b80911a316caa69f2d774fb12ae6b24b2a6d6.
2018-05-15 23:01:18 -07:00
ba44231cbc [auto] Update onnx to 3a14d83 - Improve LRN doc (onnx/onnx#965)
3a14d83974
2018-05-16 05:49:21 +00:00
86b1e230c7 [auto] Update onnx to 061af05 - Print protobuf type stubs warning to stderr (onnx/onnx#979)
061af05f45
2018-05-16 05:07:35 +00:00
ed458fd311 Fix environment variables in rocm docker images (#7598)
* Fix environment variables in rocm docker images

* Add to .bashrc as well
2018-05-15 21:51:02 -07:00
9213b3f739 [caffe2] Fix linking of Android unit tests (#7607)
Android unit tests failed to link because libnnpack and libcpuinfo appeared in the linker command line before libcaffe2. This patch somehow fixes it.
2018-05-15 21:39:37 -07:00
0493d49afa [auto] Update onnx to 63234db - remove fc op. (onnx/onnx#977)
63234dbae6
2018-05-16 04:15:22 +00:00
c76da6494b Drop support for MAGMA v1 (#7582)
Fixes #7502.

Test Plan: build and test

Build output has this:
```
-- Checking prototype magma_get_sgeqrf_nb for MAGMA_V2 - True
-- Compiling with MAGMA V2 support
-- MAGMA INCLUDE DIRECTORIES: /data/users/rzou/miniconda3/include
-- MAGMA LIBRARIES: /data/users/rzou/miniconda3/lib/libmagma.a
```
2018-05-15 23:57:16 -04:00
be145e4f5b [auto] Update onnx to 0524595 - Do not generate protobuf python type stubs if protobuf python package is not installed (onnx/onnx#974)
052459560d
2018-05-16 03:40:42 +00:00
56fa6ec66a [caffe2] Change iteritems in trt/transform.py to items for python3 compatibility (#7599) 2018-05-15 20:32:06 -07:00
c187a5d79e Resolve the performance issue on ConvFusion Op (#7584) 2018-05-15 20:31:29 -07:00
cd86d4c554 PyTorch AMD Build Scripts (#6625)
* PyTorch AMD Build Script.

* Python invocation for hipify

* Adding individual hip files.

* Updating CWD

Use the actual path for the file instead of the current working directory, which depends on where the script is invoked.

* Updating folder path for amd_build

* Removing previous amd_build directory

* Updated setup.py to support WITH_ROCM

* Renaming the files for CuDNN BatchNorm & Conv since having two .cpp files with the same name results in a linking error in the HCC compiler used for ROCm/AMD.

* Removing old BatchNorm & Conv files since they've been renamed.

* Updating build path to handle ROCM

* Cleaned up the build path and created a FindHIP cmake file for setting up relevant hip paths.

* Separated the individual patch files to make it easier to detect issues while building.

* Removed CMakeLists hip files and fixed directory structure

* Adding build pytorch amd script

* Merged setup patch into PyTorch setup.py & cleaned a few issues

* Added information on where to download the hipify-python script.

* Resolved linting issues inside of build_pytorch_amd.py

* Removing many unnecessary patch files. Removing unnecessary .hip files. Fixing up the build process.

* Refactored the PR for supporting HIP

* Minimizing the number of changes inside individual patches.

* Cleaned up patch files.

* Removed patch files.

* Updating patches

* Removing HIP change from file.

* Cleaned up patches

* Added AVX/SSE avoidance due to a bug in the ROCm stack. Just temporary for now.

* Removing the other HIP file

* Removed patch file + merged ROCm into Aten/test

* Removed ATen tests patch file and updated disable_features yaml to remove headers that don't exist on the HIP stack.

* Reduced the number of patches down to 14 after Edward's suggestions.

* Transferred deletion of certain functions from patch to yaml file.

* Set default Thrust path

* Fixed aten files so we now use the templated pow/abs instead of std:: directly.

* Removed error from aten/src/THCUNN/Abs.cu

* Updated the locations of the cmake build files. Moved THCTensorRandom from a hip to a patch file. Added executable/library commands that can successfully handle either CUDA or HIP.

* Removed hip extraction from the build script and removed the old hip file.

* Replaced MACRO with function in upper level cmake.

* Added empty ELSE() block to prevent the loading of a command without CUDA or HIP. Also added IF guards around torch_cuda_based_add_executable in Aten tests.

* Updated aten tests.

* Removed the hip include from the ATen header.

* Can't throw exceptions on C++ AMP, using abort

* Missing IF guards for cuda/hip executables in aten tests.

* Removed a series of patch files.

* Added template keyword to help out the HCC compiler.

* Rebased the specific files displayed in the PR

* Fixing typo.

* Change flag from "WITH_CUDA" to "NOT NO_CUDA"

Replacing "WITH_CUDA" with "NOT NO_CUDA" after the rebase.

* Fix LoadHIP path

* Updating build files after rebasing.

* Reorganization after cpu/gpu separation.

* Removed HIPCC from setup.py & removed -shared extra linking args.

* Updated CMake / Setup build to correctly link when under ROCm stack.

* Removed the unnecessary argument from Extension constructor.

* Adding another test to be included with ROCm building.

* Updated the setup_helpers scripts in order to get around linter error

* Fix syntax issue

* Solving lint issue: line too long
2018-05-15 18:38:01 -07:00
2de1b4488f Run sccache in background mode and save logs to file (#7594)
Running sccache in foreground mode seems to uniformly slow down the builds and causes virtual memory exhausted errors for gcc7.2 builds. This PR moves sccache to background mode instead and prints the compilation log at the end of the build.
2018-05-15 21:21:19 -04:00
4b6c884b99 [caffe2][nomnigraph] Add optimize function to opt:: namespace that takes in a level and optimizes the graph/workspace accordingly. Adding it to predictor and speed_benchmark arguments (#7558) 2018-05-15 15:57:06 -07:00
469c6c88a3 [auto] Update onnx to dc07e0f - Extend Concat/Gather/Squeeze/UnSqueeze to accept any tensor type (onnx/onnx#957)
dc07e0fb2f
2018-05-15 22:26:47 +00:00
9211790049 [caffe2] Include <array> in fatal_signal_asan_no_sig_test (#7592)
fatal_signal_asan_no_sig_test.cc uses std::array, but doesn't include the header. This caused a build error on Android.
2018-05-15 15:02:24 -07:00
0df84d7ec7 [auto] Update onnx to 21b56ad - mypy info (onnx/onnx#970)
21b56ada78
2018-05-15 21:38:37 +00:00
79b9bbe60f [caffe2] Use caffe2::stod in lexer (#7591)
std::stod causes build errors on Android
2018-05-15 14:06:24 -07:00
be019e4429 [auto] Update onnx to 76a288f - add script to count shape inference implementations (onnx/onnx#967)
76a288f098
2018-05-15 20:54:11 +00:00
3af3d13599 Run onnx integration tests in caffe2 CI (#7565)
* Run onnx integration tests in caffe2 CI

* verbose log

* turn off onnx verbose installation log

* can not install ninja

* Do not use all cores to build pytorch

* install tests require

* pip install to user dir

* use determined path to improve (s)ccache hit

* Do not change path in test.sh

* Add the compile cache hit trick to conda install as well

* cover jenkins in CI environment detection
2018-05-15 13:25:24 -07:00
e65d6de16a [auto] Update onnx to 3f80231 - Add type hints to numpy_helper_test.py (onnx/onnx#951)
3f80231786
2018-05-15 20:07:18 +00:00
37f5b147fc [auto] Update onnx to 037cfaa - Add type hints to test_backend_test.py (onnx/onnx#954)
037cfaa015
2018-05-15 20:06:06 +00:00
996886137a Add link to TensorFlow Distributions paper (#7563) 2018-05-15 15:46:54 -04:00
5748cc43ce [auto] Update onnx to c918b4b - Add type hints to basic_test.py (onnx/onnx#947)
c918b4be91
2018-05-15 19:23:53 +00:00
d971782a03 Change code owners for onnx integration tests (#7587) 2018-05-15 15:22:32 -04:00
efb7dead9d Squelch -Werror=non-virtual-dtor (#7554)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-15 13:53:15 -04:00
4251e38eb3 [auto] Update onnx to b265987 - Add type hints to helper_test.py (onnx/onnx#950)
b26598714c
2018-05-15 03:30:02 +00:00
a52eb24c42 [auto] Update onnx to bb4d582 - Add type hints to relu_test.py (onnx/onnx#952)
bb4d5827cf
2018-05-15 03:24:41 +00:00
be7c5e573e [auto] Update onnx to 533a84c - Add type hints to elu_test.py (onnx/onnx#949)
533a84c3ca
2018-05-15 03:23:15 +00:00
f007392522 [auto] Update onnx to a659ab9 - Add type hints to schema_test.py (onnx/onnx#953)
a659ab90cc
2018-05-15 03:22:04 +00:00
dbf77ef7a7 [auto] Update onnx to 28a8849 - Add type hints to onnx/test/optimizer_test.py (onnx/onnx#955)
28a8849127
2018-05-15 03:20:57 +00:00
fb314ee150 [auto] Update onnx to 65f1811 - Fix a type error in lstm test case (onnx/onnx#959)
65f1811d2d
2018-05-15 02:43:40 +00:00
08415c42af Replace std::to_string with caffe2::to_string in nomnigraph (#7561)
std::to_string is not available on Android with GNU STL. We conventionally use caffe2::to_string as a portable alternative.
2018-05-14 19:37:43 -07:00
e1148db7f2 Implement logsumexp (fixes #2591) (#7254)
* Implement logsumexp (fixes #2591)

* Add logsumexp_backward, fix _out declaration.

Thank you Simon and Edward for your comments!
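
The identity implemented is logsumexp(x) = m + log(sum_i exp(x_i - m)) with m = max_i x_i, which avoids overflow in exp(); a reference sketch:
```python
import torch

def logsumexp_ref(x, dim):
    # Subtract the max so exp() cannot overflow, then add it back.
    m, _ = x.max(dim=dim, keepdim=True)
    return (m + (x - m).exp().sum(dim=dim, keepdim=True).log()).squeeze(dim)

x = torch.tensor([1000.0, 1000.0])
print(torch.logsumexp(x, dim=0))  # ~1000.6931
print(logsumexp_ref(x, dim=0))    # matches; naive log(exp(x).sum()) gives inf
```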
2018-05-14 22:08:14 -04:00
05853945a4 Vectorize softmax and logsoftmax (#7375)
This PR uses Vec256 to vectorize the softmax and logsoftmax layers.

This comes in 4 steps:

log_softmax
softmax
log_softmax_backward
softmax_backward

* Vectorized Softmax and LogSoftmax

* Abstractions

* Style

* Remove <limits> for Kernel

* Perf investigations

* Last cleanups
2018-05-14 22:08:00 -04:00
44a10f2a98 Removing arch 20 + 21 (#7512)
Should solve the shfl_xor undefined problem on cuda8 with conda and aten
2018-05-14 22:06:52 -04:00
4d35a40f3b Better logging for sccache compilation failure (#7555) 2018-05-14 22:03:38 -04:00
3414475653 [C++ API] Remove initialize_* functions (#7517)
* Remove initialize_ functions

* Fix clone() to recursively clone children

* Small codemove
2018-05-14 18:24:58 -07:00
bf9676180f Update the name of env var for triggering integrated conda build (#7557) 2018-05-14 16:28:39 -07:00
1666b54068 [auto] Update onnx to ac970c9 - update onnx model tests for rnn/lstm/gru (onnx/onnx#960)
ac970c9dcb
2018-05-14 22:43:18 +00:00
284f13b814 make sure that pytorch and caffe2 usage lines up with onnx rnn spec (#7511) 2018-05-14 15:42:56 -07:00
ce69d3110b Improve script builtin checking using schema (#7311)
Improve script builtin checking using schema

* This adds aten_schema.h, which provides a barebones amount of type and
  argument information about each builtin operator
* emitBuiltinCall is updated to use this information rather than
  aten_dispatch to ensure the operator is correct.
* handling of keyword and position arguments now matches python behavior
* There is no longer a requirement that kwargs be constant or that the
  attributes of an op must be entirely constant or non-constant
* compiler now constructs a non-attributed version of the op first and
  then turns it into the constant-attribute version if all attributes
  are constants.
* default arguments for builtins now work
* SugaredValue::call and similar functions now have SourceRange information
  for their arguments so that error reporting is more accurate

Notes:
* This does not try to merge the builtin checking with the python arg parser.
  Given that C10 schema will eventually replace aten_schema, we will
  eventually have a C++ description of the schema, and working with that
  description directly will be the easiest form to understand.
* python function calls and script method calls do not support keyword arguments yet.
  When we add this support we should refactor the handling in tryEmitSchema
  that resolves keywords into a common function.

* default arguments work
* keyword arguments to builtins work (still need to extend to calling python and other script methods)
* much better error reporting for incorrect builtins

Lift any constants to attributes on nodes when possible

* Schema is usable internally in the compiler as
  the function signatures of script functions as well as for builtin
  operators.
* Adds a List[T] class to better represent the arguments to cat/stack
  as a type rather than with custom checking.
* Support kwargs for calls of script methods

A future commit will be needed to add support for:
* calls to script _functions_ which are currently are GraphExecutors without schema info.
* kwargs to python functions, which will require refactoring python op
2018-05-14 14:46:36 -07:00
1f08000562 return value of LSTM example fixed. (#7534) 2018-05-14 15:36:09 -04:00
61afbbbd18 clamping the return value of uniform.cdf() to [0..1] (#7538)
* fix for #7532: clamping the return value of uniform.cdf() to the range [0,1]

* removed whitespace around equals to pass flake8 tests

* added a test for uniform.cdf() with arguments outside support
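
Concretely, the clamped CDF saturates outside the support; a small sketch:
```python
import torch

u = torch.distributions.Uniform(0.0, 1.0)
# CDF of Uniform(low, high) is clamp((x - low) / (high - low), 0, 1),
# so arguments outside the support map to exactly 0 or 1.
print(u.cdf(torch.tensor([-0.5, 0.25, 2.0])))  # tensor([0.0000, 0.2500, 1.0000])
```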
2018-05-14 15:36:00 -04:00
bccb727b65 Remove wrong "input" arg from scatter_() docstring (#7550) 2018-05-14 15:33:47 -04:00
a9a44faf03 [auto] Update onnx to 310b44c - Add tools for generating c++ code test coverage (onnx/onnx#938)
310b44c800
2018-05-14 19:13:47 +00:00
cf9913d569 Install torchvision before running integration tests (#7552) 2018-05-14 11:49:10 -07:00
4af63916cd Set up Caffe2 CUDA builds to use sccache (#7547)
* Set up Caffe2 CUDA builds to use sccache

* comment fix
2018-05-14 11:15:58 -07:00
56a63459b6 [auto] Update onnx to 330fd0f - shape inference for TopK and trigonometric functions (onnx/onnx#946)
330fd0f73e
2018-05-14 04:29:19 +00:00
169e91c530 [auto] Update onnx to 8ff5fdb - fix def of gru version 1 (onnx/onnx#945)
8ff5fdbe26
2018-05-14 03:48:22 +00:00
fc23885105 Fixes reductions where accum type != type and simplifies all reductions (#7487)
This PR makes two improvements:

It fixes reduce kernels where accum type != type. Currently, for example, half tensors with small values may have norms that are (approximately) representable in fp16, but calling .norm() on them will result in underflow and a reported norm of zero. This PR fixes that behavior and adds a test in test_cuda.py to ensure underflow does not occur (test_tiny_half_norm).

It simplifies all reductions by removing excessive templating and the -2 contiguous special case from THC_reduceDim and THC_reduceAll. The latter was previously removed from pointwise apply. This has no performance impact as the -2 special case was already mapping to the 1D code path.

PyTorch currently attempts to handle accum type != type by either (1) writing kernels that immediately convert values to accum type after reading or (2) writing operations that take in type values and accumulate to the accum type. The latter path was not working properly (hence the current excessive half tensor underflow) and resulted in a lot of redundant code, with two reduce ops being passed to a kernel instead of one, and reduce ops frequently receiving the same template argument twice.

This PR makes the former approach THE approach. Kernels that accumulate to (potentially) different types should follow the pattern of converting their input to the accum type, performing all operations on that type, and then converting back to the appropriate type if writing their value back to the tensor. This pattern makes the second reduce op redundant and allows for simpler templating, which should improve readability, reduce build time, and reduce binary size. Also, this prevents ops from having to perform their own conversions, which could result in poor performance if the same value was operated on multiple times.

One exception to this simplification was that a new ThrustTensorDistOp was created to handle a call to thrust::inner_product(). This Op fuses the conversion and the TensorDistOp.

In addition to the expected simplification, there is also some cleanup of excessive template parameters. For example, kernelReduceAllPass2() had three template parameters: T, IndexType, and ReduceOp, but IndexType was never used.

* wip

* Adds tests

* Fixes Python linting

* mean and norm fusions, code cleanup

* fixes file permissions
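
A worked illustration of the underflow being fixed (assumes a CUDA device; numbers are approximate):
```python
import torch

x = (torch.ones(10000) * 1e-4).half().cuda()
# Each squared element is 1e-8, below fp16's smallest subnormal (~6e-8),
# so accumulating the sum of squares *in half* underflows to zero, while
# accumulating in float gives sqrt(10000 * 1e-8) = 1e-2.
print(x.float().norm())  # ~0.01, the reference answer
print(x.norm())          # ~0.01 after this fix (underflowed to 0.0 before)
```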
2018-05-13 18:33:48 -04:00
d0287eca94 [auto] Update onnx to c50f329 - Adding shape inferences for GlobalMaxPool, GlobalAveragePool, and GlobalLpPool" (onnx/onnx#943)
c50f329dcd
2018-05-13 20:36:27 +00:00
63ae163b24 put dropout states on the input device (#7515)
* put dropout states on the input device

* add assert to aten, add test, fix lint

* only assert device if states are defined
2018-05-13 16:25:37 -04:00
1ce5431aaf Documentation improvements (#7537)
- improve scatter documentation (fixes #7518)
- refine KLDivLoss documentation (fixes #7464)
- fix some sphinxbuild warnings

Thank you, Hugh Perkins for reporting!
2018-05-13 15:44:24 -04:00
8f64f918f7 [auto] Update onnx to 0a6076e - Fix the opset version in backend tests (onnx/onnx#944)
0a6076eae6
2018-05-13 15:46:52 +00:00
c84fdda582 Skip onnx backend tests for inverse trigonometric ops (#7533) 2018-05-13 08:41:28 -07:00
a3b2877810 Fix CUDA builds. (#7529)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-13 09:54:03 -04:00
825c3ca2d6 [auto] Update onnx to 4e98b03 - add trigonometric functions (onnx/onnx#869)
4e98b038d1
2018-05-13 07:52:49 +00:00
f529b85035 [auto] Update onnx to 0bd3f78 - Add shape inference for LpPool, RoiPool, and fix MaxPool, AveragePool, and Conv (onnx/onnx#928)
0bd3f78bf4
2018-05-13 05:05:49 +00:00
5336ea4195 Work around Python nightly regression. (#7526)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-13 00:17:36 -04:00
2bb38ba700 Built-in support for rebuilding in win-build.sh (#7442)
* Built-in support for rebuilding in win-build.sh

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* fixups

Signed-off-by: Jenkins <jenkins@ci.pytorch.org>

* CR comments

* CR comments

* more delayed expansion fixes
2018-05-12 23:53:40 -04:00
ac52f1186a [minor] change dockerfile to point to pytorch channel (#6960) 2018-05-12 23:43:09 -04:00
37b9d093d2 Updates collapseDims() function and documentation (#7056)
* Updates collapseDims() function and documentation

* Adds C++ tests, validates input, updates names for readability

* Removes invalid test

* stashing to merge AT_CHECK macro

* Updates asserts, removes tests on Windows
2018-05-12 23:42:55 -04:00
cfc1d92975 Implement ellipses ('...') and diagonals (e.g. 'ii->i') in einsum. (#7173)
This brings the two most important missing numpy einsum features
to torch.einsum; a short illustration follows below.
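
A quick sketch of both features (shown with the varargs calling convention of current torch.einsum):
```python
import torch

x = torch.randn(3, 3)
a = torch.randn(2, 3, 4)
b = torch.randn(2, 4, 5)

torch.einsum('ii->i', x)           # diagonal: x[0, 0], x[1, 1], x[2, 2]
torch.einsum('ii', x)              # trace: a repeated index without '->' is summed
torch.einsum('...ij,...jk', a, b)  # '...' carries the batch dim: result is (2, 3, 5)
```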
2018-05-12 23:39:37 -04:00
7edd451a4e Improve spectral_norm (fixes #7261) (#7298)
* Improve spectral_norm (fixes #7261)

Thank you Morgan Funtowicz for the report and minimal example!

* compute sigma only once
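
For context, spectral_norm estimates the largest singular value sigma of a weight matrix by power iteration; a minimal sketch (function names are illustrative, not the module's internals):
```python
import torch
import torch.nn.functional as F

def estimate_sigma(W, u, n_power_iterations=1, eps=1e-12):
    # Power iteration: u and v converge to the leading left/right
    # singular vectors of W; u is carried across calls.
    for _ in range(n_power_iterations):
        v = F.normalize(torch.mv(W.t(), u), dim=0, eps=eps)
        u = F.normalize(torch.mv(W, v), dim=0, eps=eps)
    sigma = torch.dot(u, torch.mv(W, v))  # computed once per call
    return sigma, u
```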
2018-05-12 23:31:37 -04:00
cf9751207e Allow building Caffe2 with ATen support (Addresses #7249) (#7297)
* Addresses Issue #7249, where Caffe2 cannot be built with ATen support

* Fixed indentation
2018-05-12 23:30:46 -04:00
eaa3f2e613 Fix advanced indexing with negative indices (#7345)
* Fix advanced indexing with negative indices

Fixes #7156

Here is some behavior before this PR:
```
In[1]:
x = torch.arange(9).view(3, 3).contiguous()
x[[0], [-1]]  # Should be equivalent to x[0, -1]

Out[1]:
tensor([ 8])
```

The bug is that negative indices are added to the computed linear index
directly. In the above example, the linear index computed is "-1", which
wraps around to "8", giving the last element of a flattened view of `x`.

Instead, we should wrap negative indices around before adding them to
the linear index.

* Use toCLong()
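
The essence of the fix, as a standalone sketch:
```python
# Wrap negative indices *per dimension* before building the flat
# (linear) index, instead of letting them wrap the flat index.
def linear_index(indices, sizes):
    offset = 0
    for idx, size in zip(indices, sizes):
        if idx < 0:
            idx += size  # e.g. -1 -> size - 1
        offset = offset * size + idx
    return offset

# x[0, -1] on a 3x3 tensor: the correct flat index is 2, not -1 % 9 == 8.
print(linear_index([0, -1], [3, 3]))  # 2
```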
2018-05-12 23:24:40 -04:00
2ac34b98ea [auto] Update onnx to 490c4c6 - fix build dependency between onnx-operators.proto and (onnx/onnx#934)
490c4c6ca9
2018-05-13 03:14:44 +00:00
976b1d5ec1 Don't initialize the current device in CUDAGenerator::CUDAGenerator (#7392)
Previously, CUDAGenerator::CUDAGenerator would initialize the random
number generator on the current device. This would usually be device 0.
This is undesirable because initializing the CUDA context allocates a few
hundred MB due to all the kernels in libTHC.so.

This avoids the unnecessary call to THCRandom_getGenerator() in the
CUDAGenerator constructor.

Fixes #7320

* Fix call to get THCState
2018-05-12 22:57:06 -04:00
acb6f2697e Some notes about developing on Windows (#7447)
* Some notes about developing on Windows

* typofix
2018-05-12 22:55:11 -04:00
03767b66db Add FileNotFoundError to torch._six (#7524)
Add FileNotFoundError for compatibility with Python 2 and use it in
the dataloader. Fixes pytorch/pytorch#6932
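
A minimal sketch of such a 2/3 shim (torch._six's actual implementation may differ):
```python
try:
    FileNotFoundError  # Python 3 builtin
except NameError:
    FileNotFoundError = IOError  # Python 2 fallback: its closest ancestor

try:
    open('/no/such/file')
except FileNotFoundError:
    print('missing file handled the same way on Python 2 and 3')
```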
2018-05-12 20:54:26 -04:00
921dece2d7 Update Im2ColNd functions (#7505)
Update Im2ColNd functions
2018-05-12 15:59:50 -07:00
db6e4576da Use customized python interpreter (#7520) 2018-05-12 13:06:39 -04:00
0337d6708c Use SLEEF's tanh (#7513) 2018-05-12 14:14:02 +00:00
ed3b12e1ba [Caffe2] Ideep net optimization passes (#7514)
* Transform ideep net

* Add conv+relu transformation

* Add verification and address comments
2018-05-11 23:50:18 -07:00
580556dd60 [auto] Update onnx to 25b8845 - Extend AveragePool to support average count include padding (onnx/onnx#884)
25b8845a14
2018-05-12 04:10:55 +00:00
6ada041b31 Some small fixes in C++ API (#7510) 2018-05-11 18:56:53 -07:00
aced37a633 [auto] Update onnx to 7c8b3d2 - [Typing 4/5] Add type hints to onnx/backend (onnx/onnx#913)
7c8b3d2c75
2018-05-11 23:19:04 +00:00
141d81d095 Move ONNX integration tests from onnx-fb-universe to PyTorch repo (#7397)
* Move ONNX integration tests from onnx-fb-universe to PyTorch repo

* Switch to use torchvision

* Delete single rnn operator tests, they have been covered in e2e tests in test_caffe2.py

* Mirror the fix in onnx-fb-universe to bypass cuda check

667326d84b
2018-05-11 15:05:18 -07:00
b3f0ab3726 rnn onnx export: consolidate rnn/gru/lstm (#7506) 2018-05-11 14:58:20 -07:00
2863d935b9 [Caffe2] Fix of the performance issue of IDEEP (#7503)
* Sketch fix of the performance issue of IDEEP

* Revert CMakefile

* Fix tests

* format

* comments

* Print error

* review comments
2018-05-11 13:43:41 -07:00
38bc732b2d [jit] Change interpreter/fuser to work on Variables only (#7489)
* this removes the flag controlling whether the interpreter works on variables.
* now the interpreter _always_ works on variables
* constants in the IR are still _always_ non-variables, and an assert was added to ensure this.
* as_tensor was split into as_variable and as_tensor since it is sometimes used
  to construct constants in the IR
* I tried changing the IR to also always use variables but that change was much more
  cross cutting and fragile and I never got it working
2018-05-11 13:33:47 -07:00
dc0faab18d Add zeros_ and ones_ init + tests (#7488)
* Add zeros_ and ones_ init + tests

* Dedup tests

* Remove all occurrences of as_variable
2018-05-11 11:07:11 -04:00
5f96a2d26a Add sparse gradient option to pretrained embedding (#7492)
* Add sparse gradient option to pretrained embedding

* Add sparse gradient option to pretrained embedding

* Trailing white space
2018-05-11 08:44:53 -04:00
857e3f4a5e Throw error in tensor constructor when numpy strides mismatch (#7440) 2018-05-11 11:00:43 +02:00
b875fb281c Update from facebook (#7451)
* [bootcamp] Improve "Shape" operator to support axes specification

Improve the Shape operator of Caffe2 to support x.shape(tensor, axes), which takes an optional int array "axes" as input. For example, x.shape(tensor, [1, 0]) will return the dimensions for axes 1 and 0 in the specified order. In the current version, the "axes" input allows duplicates and can have arbitrary length.

* Back out "Add barrier net that runs before training nets"

Original commit changeset: b373fdc9c30f. Need additional changes to some callers to support barrier failures.

* Change warning to verbose log to reduce log spam

The `LOG(WARNING)` was a bit spammy for regular use so let's just make it a `VLOG`.

* Extract the shared code from different caffe2_benchmark binaries

The OSS benchmark and Internal benchmark will share most functions in the benchmark.

* Support MFR in sequence training

As titled.

Make knowledge distillation work using the logged prediction feature as the teacher label.

1) Add loading raw dense feature as teacher label.
2) Optional calibration function for teacher label
3) Add teacher label into generic unit test
4) Deprecated TTSN workflow version using feature_options to config teacher label

* [C2/CUDA]: unjoined cross entropy sigmoid

as desc

* Add async_scheduling executor into deferrable_net_exec_test

Add async_scheduling into tests and fix some exception cases

* Fix Event disabled error

When disabling event in RNN ops make sure we don't call Finish on disabled
event from op's RunAsync

* cuda ensure cpu output op can handle both TensorCPU and TensorCUDA

as desc.

* [C2 Core] Infer input device option in C2 hypothesis_test checkers

Improve how we default input blob device options.
Previously it defaulted to where the op lives, but that is not necessarily the case.

For example:
CopyCPUToGPU

* [C2 Op]SplitByLengthsOp CPU/GPU implementation

[C2 Op]SplitByLengthsOp CPU/GPU implementation

* fix undefined symbol error

Not sure why we're getting an undefined symbol even with link_whole = True.
Need to figure out why, but we need this workaround for now.

* Add tools in the DAIPlayground platform to help debug models

Add additional tools to allow Playground to override individual methods defined in AnyExp. This will allow users to create modules that specifically change certain default method behavior. An example included in this diff is deactivating the test model and checkpointing. When debugging model problems, switching off components helps me quickly narrow down the location of the bug. The technique is extensively used in task T27038712 (Steady memory increase in EDPM, eventually resulting in gloo/cuda.cu:34: out of memory)

* add shape and type inference for int8 conversion operator

* Fix flaky test for group_norm

Fix flaky test for group_norm

* Fix group_norm_op_test flaky

Fix group_norm_op_test flaky

* Implementation of composite learning rate policy

In many state-of-the-art deep learning works, people use a simple trick to
schedule the learning rate: use a fixed learning rate until the error plateaus
and then switch to a different fixed learning rate, and so on. In this diff,
we implemented a simple version of a composite learning rate. The user gives
a set of learning rate policies and corresponding iteration counts, and the
optimizer will change the learning rate policy based on the number of iterations so far.

For example, the user gives two learning rate policies, FixedLearningRate
and PolyLearningRate, with an iteration number of 1k. For the first 1k
iterations, we use FixedLearningRate; for the following iterations, we use PolyLearningRate.

* Split two use cases of CachedReader into two classes, DBFileReader and CachedReader

# Use Cases:

1). input: DB file -> output: DatasetReader.

Use DBFileReader.

2). input: Reader -> build cache DB file -> output: DatasetReader.

Use CachedReader.

# Changes to CachedReader:

1). Move db_path to the constructor,
because in the mock reader the cache will always be built ahead of time.

# Changes to tests:

1). Make a separate TestCase class for CachedReader and DBFileReader.

2). Make it possible to add more test functions by adding setUp, tearDown and _make_temp_path.

3). Make deleting db_path more general: `db_path` could be a file for `log_file_db`, but could also be a directory for `leveldb`.

* Back out "On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization"

Original commit changeset: 4489c6133f11

* Fix LARS bug

Fixed a bug in the LARS implementation which caused all subsequent blobs not using LARS to have the LARS learning rate multiplier applied to them.

* [tum] support sparse init & add uniformFill option

as title

* Propagate exception for async nets

Capture the exception when an exception is thrown in async nets and re-throw it after wait().  This allows exceptions to be propagated up to the caller.

This diff was a part of D7752068.  We split the diff so that C2 core files changes are in a separate diff.

* Automatic update of fbcode/onnx to 69894f207dfcd72d1e70497d387201cec327efbc

Previous import was 403ccfbd0161c38f0834413d790bad0874afbf9a

Included changes:
- **[69894f2](https://github.com/onnx/onnx/commit/69894f2)**: Use op schema.all tensor types in random like definitions (#865) <Scott McKay>
- **[b9d6b90](https://github.com/onnx/onnx/commit/b9d6b90)**: Clarify random like operators (#846) <Scott McKay>
- **[fc6b5fb](https://github.com/onnx/onnx/commit/fc6b5fb)**: Refactor shape inference implementation (#855) <anderspapitto>
- **[b7d8dc8](https://github.com/onnx/onnx/commit/b7d8dc8)**: fix cmake warning message (#863) <Eric S. Yu>
- **[f585c5d](https://github.com/onnx/onnx/commit/f585c5d)**: add pytorch-operator test for tile (#831) <Wenhao Hu>
- **[993fe70](https://github.com/onnx/onnx/commit/993fe70)**: add install step (#832) <Eric S. Yu>
- **[68bc26c](https://github.com/onnx/onnx/commit/68bc26c)**: add type inference for traditional ml ops except classifier ops. (#857) <Ke Zhang>
- **[9cc0cda](https://github.com/onnx/onnx/commit/9cc0cda)**: fix string representation of scalar types (#858) <G. Ramalingam>
- **[1078925](https://github.com/onnx/onnx/commit/1078925)**: fix y in pow test case to scalar (#852) <Wenhao Hu>
- **[c66fb6f](https://github.com/onnx/onnx/commit/c66fb6f)**: Add some math function shape inference (#845) <anderspapitto>
- **[ff667d1](https://github.com/onnx/onnx/commit/ff667d1)**: Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853) <Marat Dukhan>
- **[11c6876](https://github.com/onnx/onnx/commit/11c6876)**: clear initializer names when clear initializer (#849) <Wenhao Hu>
- **[73c34ae](https://github.com/onnx/onnx/commit/73c34ae)**: Clarify FeatureVectorizer description. (#843) <Scott McKay>
- **[1befb9b](https://github.com/onnx/onnx/commit/1befb9b)**: Remove useless text in docs (#850) <Lu Fang>
- **[e84788f](https://github.com/onnx/onnx/commit/e84788f)**: Fix SELU attributes' default values (#839) <Lu Fang>
- **[ebac046](https://github.com/onnx/onnx/commit/ebac046)**: Add tile test case (#823) <Wenhao Hu>
- **[8b7a925](https://github.com/onnx/onnx/commit/8b7a925)**: a few more shape inference functions (#772) <anderspapitto>
- **[9718f42](https://github.com/onnx/onnx/commit/9718f42)**: Make the coefficient non optional for LinearClassifier (#836) <Jaliya Ekanayake>
- **[ef083d0](https://github.com/onnx/onnx/commit/ef083d0)**: Add save_tensor and load_tensor functions for Protos (#770) <Lu Fang>
- **[45ceb55](https://github.com/onnx/onnx/commit/45ceb55)**: Check if CMAKE_BUILD_TYPE set before project(). (#812) <Sergii Dymchenko>
- **[4b3d2b0](https://github.com/onnx/onnx/commit/4b3d2b0)**: [WIP] reenable shape inference tests (#834) <anderspapitto>
- **[22d17ee](https://github.com/onnx/onnx/commit/22d17ee)**: RNN tests: LSTM, GRU, SimpleRNN (#739) <Peyman Manikashani>
- **[de65b95](https://github.com/onnx/onnx/commit/de65b95)**: dimension denotation (#443) <Tian Jin>
- **[eccc76e](https://github.com/onnx/onnx/commit/eccc76e)**: fix field number issue in onnx operator proto and enable its build (#829) <Ke Zhang>
- **[d582beb](https://github.com/onnx/onnx/commit/d582beb)**: disable shape inference test to unbreak ci (#830) <Lu Fang>
- **[485b787](https://github.com/onnx/onnx/commit/485b787)**: function proto for composite op. (#802) <Ke Zhang>
- **[cd58928](https://github.com/onnx/onnx/commit/cd58928)**: specify defaults for attributes of Affine op (#820) <G. Ramalingam>
- **[7ee2cf9](https://github.com/onnx/onnx/commit/7ee2cf9)**: merge the dummy backend back into the main one (#743) <anderspapitto>
- **[1c03a5a](https://github.com/onnx/onnx/commit/1c03a5a)**: [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551) <Marat Dukhan>
- **[3769a98](https://github.com/onnx/onnx/commit/3769a98)**: Rename real model test case from VGG-16 to ZFNet (#821) <Lu Fang>

* [C2]ReluN Op

relu n op.

tf reference: https://www.tensorflow.org/api_docs/python/tf/nn/relu6

* Call destructor when assigning a blob value

* Add executor overrides

Add executor overrides flag to enable migration to async_scheduling executor

* Add barrier net that runs before training nets - attempt #2

Add a synchronize barrier net that is run before training nets. With this net, faster shards will wait for the other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.
Removed explicit data_parallel_model.py.synchronize call in holmes workflow.

This change was landed previously but caused errors for some EDPM workflows - See https://fb.facebook.com/groups/1426530000692545/permalink/1906766366002237/ - because EDPM assumes any call to CreateOrCloneCommonWorld and Gloo ops are wrapped in exception handlers but in this case exception thrown in the barrier init net is not handled.

To address this issue, we add _CreateOrCloneCommonWorld to the param_init_net instead of a new barrier init net. Since errors in the param_init_net run are handled gracefully with re-rendezvous, this should fix the problem.

* Handle empty nets in async_scheduling

Make sure we don't get stuck on empty nets

* use CUDA_ARCH for conditional compile

* [C2 fix] infer function for ensure_cpu_output_op

* Update group_norm test to reduce flaky test

* Fix lr_multiplier for GPU
2018-05-10 23:14:27 -07:00
947155c69d [auto] Update onnx to b2539fc - Shape and type inference for Flatten, SpaceToDepth, DepthToSpace (onnx/onnx#930)
b2539fca83
2018-05-11 02:43:57 +00:00
f8b5d420a4 Fix Caffe2 build with ATen CPU/GPU split (#7486) 2018-05-10 19:28:56 -07:00
75f549bbef [auto] Update onnx to 9dd2533 - Changes done internally at Facebook (onnx/onnx#909)
9dd2533ee3
2018-05-10 23:34:10 +00:00
d5e77fb058 Port interface of store base class from Caffe2 (#7439)
The file store implementation is new and based on the file
initialization method (which uses a single file and file locking) and
the interface of the Caffe2 store handler.

See #7434.
2018-05-10 16:04:19 -07:00
6547245f1f Add return value to setup() function of PipedReaderBuilder (#7476) 2018-05-10 15:39:54 -07:00
6c7a8318c4 Fix Tensor.type(dtype) not preserving device (#7474)
Note that Tensor.cuda() will still copy the tensor to the current device
if it's a CUDA tensor on a different device.

Fixes #7441
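
A sketch of the fixed behavior (assumes at least two CUDA devices; uses the 0.4-era typed-tensor API):
```python
import torch

x = torch.randn(3).cuda(1)
y = x.type(torch.cuda.DoubleTensor)
print(y.get_device())  # 1 after this fix: type() stays on x's device

# By contrast, .cuda() still moves a CUDA tensor to the *current* device:
print(x.cuda().get_device())  # 0 (the default current device)
```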
2018-05-10 18:22:13 -04:00
43264c3c30 add cast to ensure correct type for sequence lens argument (#7483) 2018-05-10 14:58:00 -07:00
c489c6a1da Skip upsample onnx backend test (#7477) 2018-05-10 13:17:24 -07:00
a2a4b229cc [caffe2][nomnigraph] Make conv relu fusion more generic (#7437) 2018-05-10 13:03:20 -07:00
9fa1dff66a Allow the use of torch.device for loading (#7339)
* Allow using torch.device for loading

* Make recommended changes

* Better tests
2018-05-10 15:50:00 -04:00
b6adf6871c EmbeddingBag to handle empty bags in all modes (#7389) 2018-05-10 15:46:57 -04:00
3f029224cd hotfix: update cmake version for Linux CUDA9 builds (#7478) 2018-05-10 15:39:57 -04:00
9789602814 Fix excess ']' in nn.utils.rnn.pack_sequence (#7475) 2018-05-10 14:41:17 -04:00
93eb50c103 Mark expand nodes as implicit/explicit in trace (#7303)
When tracing we record expand nodes. This is useful in some cases because
it makes it clear a broadcast happened. However, in future runs
the broadcast may be different or not needed. This change adds an
attribute to expand to track if it was implicitly added. This
takes the form of an unused input to expand with a default value.

The execution engine then removes implicit expands before execution.
Note that shape_analysis will re-add expands when it can prove by
shape analysis that they will exist and this is useful for the fuser,
so this change should not affect fusion passes.
2018-05-10 10:47:43 -07:00
c3918da523 [auto] Update onnx to 008a805 - update some model files (onnx/onnx#926)
008a8054fd
2018-05-10 17:45:10 +00:00
20041e2704 better cache for nccl resources (#6970)
allow more than 1 device list to be stored
2018-05-10 19:42:36 +02:00
64834f6fb8 Split libATen.so into libATen_cpu.so and libATen_cuda.so (#7275)
* Split libATen.so into libATen_cpu.so and libATen_cuda.so

Previously, ATen could be built with either CPU-only support, or
CPU/CUDA support, but only via a compile-time flag, requiring
two separate builds.  This means that if you have a program which
indirectly uses a CPU-only build of ATen, and a CPU/CUDA-build of
ATen, you're gonna have a bad time.  And you might want a CPU-only
build of ATen, because it is 15M (versus the 300M of a CUDA build).

This commit splits libATen.so into two libraries, CPU/CUDA, so
that it's not necessary to do a full rebuild to get CPU-only
support; instead, if you link against libATen_cpu.so only, you
are CPU-only; if you additionally link/dlopen libATen_cuda.so,
this enables CUDA support.  This brings ATen's dynamic library
structure more similar to Caffe2's.  libATen.so is no more
(this is BC BREAKING)

The general principle for how this works is that we introduce
a *hooks* interface, which introduces a dynamic dispatch indirection
between a call site and implementation site of CUDA functionality,
mediated by a static initialization registry.  This means that we can continue
to, for example, lazily initialize CUDA from Context (a core, CPU class) without
having a direct dependency on the CUDA bits.  Instead, we look up
in the registry if, e.g., CUDA hooks have been loaded (this loading
process happens at static initialization time), and if they
have been, we dynamically dispatch to this class.  We similarly use
the hooks interface to handle Variable registration.
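
A minimal Python sketch of that hooks/registry pattern (all names illustrative):
```python
# CPU code dispatches through an interface fetched from a registry that
# the CUDA library populates at load/static-initialization time.
_registry = {}

class CUDAHooksInterface(object):
    def has_cuda(self):
        return False  # CPU-only default when the CUDA library isn't loaded

def register_cuda_hooks(cls):
    _registry['cuda'] = cls
    return cls

def get_cuda_hooks():
    return _registry.get('cuda', CUDAHooksInterface)()

@register_cuda_hooks  # the analogue of dlopen-time registration
class CUDAHooks(CUDAHooksInterface):
    def has_cuda(self):
        return True
```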

We introduce a new invariant: if the backend of a type has not
been initialized (e.g., its library has not been dlopened; for
CUDA, this also includes CUDA initialization), then the Type
pointers in the context registry are NULL.  If you access the
registry directly you must maintain this invariant.

There are a few potholes along the way.  I document them here:

- Previously, PyTorch maintained a separate registry for variable
  types, because no provision for them was made in the Context's
  type_registry.  Now that we have the hooks mechanism, we can easily
  have PyTorch register variables in the main registry.  The code
  has been refactored accordingly.

- There is a subtle ordering issue between Variable and CUDA.
  We permit libATen_cuda.so and PyTorch to be loaded in either
  order (in practice, CUDA is always loaded "after" PyTorch, because
  it is lazily initialized.)  This means that, when CUDA types are
  loaded, we must subsequently also initialize their Variable equivalents.
  Appropriate hooks were added to VariableHooks to make this possible;
  similarly, getVariableHooks() is not referentially transparent, and
  will change behavior after Variables are loaded.  (This is different
  to CUDAHooks, which is "burned in" after you try to initialize CUDA.)

- The cmake is adjusted to separate dependencies into either CPU
  or CUDA dependencies.  The generator scripts are adjusted to either
  generate a file as a CUDA (cuda_file_manager) or CPU file (file_manager).

- I changed all native functions which were CUDA-only (the cudnn functions)
  to have dispatches for CUDA only (making it permissible to not specify
  all dispatch options.)  This uncovered a bug in how we were handling
  native functions which dispatch on a Type argument; I introduced a new
  self_ty keyword to handle this case.  I'm not 100% happy about it
  but it fixed my problem.

  This also exposed the fact that set_history incompletely handles
heterogeneous return tuples combining Tensor and TensorList.  I
  swapped this codegen to use flatten() (at the possible cost of
  a slight perf regression, since we're allocating another vector now
  in this code path).

- thc_state is no longer a public member of Context; use getTHCState() instead

- This PR comes with Registry from Caffe2, for handling static initialization.
  I needed to make a bunch of fixes to Registry to make it more portable

  - No more ##__VA_ARGS__ token pasting; instead, it is mandatory to pass at
    least one argument to the var-args. CUDAHooks and VariableHooks pass a nullary
    struct CUDAHooksArgs/VariableHooksArgs to solve the problem. We must get rid of
    token pasting because it does not work with MSVC.

  - It seems MSVC is not willing to generate code for constructors of template
    classes at use sites which cross DLL boundaries. So we explicitly instantiate
    the class to get around the problem. This involved tweaks to the boilerplate
    generating macros, and also required us to shuffle around namespaces a bit,
    because you can't specialize a template unless you are in the same namespace as
    the template.
  - Insertion of AT_API to appropriate places where the registry must be exported

- We have a general problem which is that on recent Ubuntu distributions,
  --as-needed is enabled for shared libraries, which is (cc @apaszke who was
worrying about this in #7160; see also #7160 (comment)). For now, I've hacked
  this up in the PR to pass -Wl,--no-as-needed to all of the spots necessary to
  make CI work, but a more sustainable solution is to attempt to dlopen
  libATen_cuda.so when CUDA functionality is requested.

    - The JIT tests somehow manage to try to touch CUDA without loading libATen_cuda.so. So
      we pass -Wl,--no-as-needed when linking libATen_cuda.so to _C.so

- There is a very subtle linking issue with lapack, which is solved by making sure libATen_cuda.so links against LAPACK. There's a comment in aten/src/ATen/CMakeLists.txt about this as well as a follow-up bug at #7353

- autogradpp used AT_CUDA_ENABLED directly. We've expunged these uses and added
  a few more things to CUDAHooks (getNumGPUs)

- Added manualSeedAll to Generator so that we can invoke it polymorphically (it
  only does something different for CUDAGenerator)

- There's a new cuda/CUDAConfig.h header for CUDA-only ifdef macros (AT_CUDNN_ENABLED, most prominently)

- CUDAHooks/VariableHooks structs live in at namespace because Registry's
  namespace support is not good enough to handle it otherwise (see Registry
  changes above)

- There's some modest moving around of native functions in ReduceOps and
  UnaryOps to get the CUDA-only function implementations into separate files, so
  they are only compiled into libATen_cuda.so. sspaddmm needed a separate CUDA
  function due to object linkage boundaries.

- Some direct uses of native functions in CUDA code has to go away, since these
  functions are not exported, so you have to go through the dispatcher
  (at::native::empty_like to at::empty_like)

- Code in THC/THCS/THCUNN now properly use THC_API macro instead of TH_API
  (which matters now that TH and THC are not in the same library)

- Added code debt in torch/_thnn/utils.py and other THNN parsing code to handle
  both TH_API and THC_API

- TensorUtils.h is now properly exported with AT_API

- Dead uses of TH_EXPORTS and co expunged; we now use ATen_cpu_exports and
  ATen_cuda_exports (new, in ATenCUDAGeneral.h) consistently

- Fix some incorrect type annotations on _cudnn_rnn_backward, where we didn't
  declare a type as possibly undefined when we should have. We didn't catch this
  previously because optional annotations are not tested on "pass-through" native
  ATen ops (which don't have dispatch). Upstream issue at #7316

- There's a new cmake macro aten_compile_options for applying all of our
  per-target compile time options. We use this on the cpu and cuda libraries.

- test/test_cpp_extensions.py can be run directly by invoking in Python,
  assuming you've setup your PYTHONPATH setup correctly

- type_from_string does some new funny business to only query for all valid CUDA
  types (which causes CUDA initialization) when we see "torch.cuda." in the
  requested string

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Last mile libtorch fixes

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* pedantic fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-10 10:28:33 -07:00
ea98256e96 Bug fix for check_unique in jit (#7468) 2018-05-10 19:27:24 +02:00
b5a1eda7d3 guard dynamic sizes expand from peephole passes (#7436) 2018-05-10 09:34:20 -07:00
6a118b21b5 Set MAX_JOBS to nproc - 1 if using sccache to compile CUDA (#7361)
* Set MAX_JOBS to nproc - 1 if using sccache to compile CUDA

* Change JOBS setting in tools/cpp_build/build_common.sh
2018-05-10 12:25:13 -04:00
78c3d8c164 Adding yaml to docker images for Aten builds (#7430)
* Adding yaml to docker images for Aten builds

* Removing pip install of yaml due to permissions
2018-05-10 09:07:21 -07:00
c5de3314cf Add name() to C++ modules (#7409)
* Add name() to C++ modules

* Use RTTI to get module name by default

* Add functional.cpp to CMakeLists.txt

* Call typeid() inside name() instead of constructor

* Add tests and use default constructor
2018-05-10 08:52:38 -07:00
ab5c391100 onnx rnn export: use spec-respecting dimensions (#7394)
fixes https://github.com/pytorch/pytorch/issues/6879
2018-05-10 08:19:17 -07:00
d9671ea38e Fix Caffe2 with ATen build (#7452) 2018-05-10 07:57:31 -07:00
a257bd19a2 added state_dict/load_state_dict for ReduceLROnPlateau (#7201) 2018-05-10 12:02:28 +02:00
4eaf5261d3 Provide default implementation of clone() in base module (#7446) 2018-05-10 00:49:29 -07:00
48b7f298f9 Update NNPACK and cpuinfo submodules to latest master (#7443)
In Maratyszcza/NNPACK#140 @daquexian reported an error on a Faster-RCNN model with MobileNet V2, when running with the NNPACK engine. The error disappears when using the latest NNPACK and cpuinfo. Updating the submodules upstream to ensure others don't hit this issue.
2018-05-10 00:20:19 -04:00
bd8f6bd46a hotfix: update cmake version for OSX builds (#7456) 2018-05-10 00:05:04 -04:00
3023dd25f3 Use set_type to implement type conversions in C++ API (#7408)
* Use set_type to implement .cuda() in C++ API

* Change C++ module parameter types in place

* Fix bug where batchnorm state was not moved to CUDA
2018-05-09 17:01:19 -04:00
ed111619da [ONNX] Allow specifying only a subset of input/output names (#7427)
* [ONNX] Allow specifying only a subset of input/output names

Then we can only specify the "real" names while ignoring the names for all the parameters

* fix

* Update utils.py
2018-05-09 13:02:20 -07:00
d9c74f727c Fix ONNX tutorial specification for input names (#7433)
* Fix ONNX tutorial specification for input names

* Some more updates
2018-05-09 13:01:53 -07:00
56077f5661 Fix CODEOWNERS precedence for ONNX folder (#7429)
More specific paths should come later, since the last matching pattern takes precedence
2018-05-09 14:31:10 -04:00
23be4ac3a2 Add clang tidy tooling (#7412) 2018-05-09 13:08:53 -04:00
769397eb77 [Caffe2] [feature request] Add gradient operators for IDEEP (#7234)
* Add gradient operators for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add gradient test cases for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Upgrade third_party/ideep

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Refine SumOp for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share input buffer in fallback op if possible

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback ConvTranspose op for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix bug introduced by the patch of sharing input buffer

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Share output buffer in fallback operators

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove IDEEP to resolve repo issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Reflash IDEEP repo

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Remove redundant lines in IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fallback operators for IDEEP
(Flatten, ResizeLike, Transpose, and Reshape)

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-05-09 08:52:24 -07:00
97c5c0b034 add python library linking on Windows (#7157) 2018-05-09 11:50:55 -04:00
f02ae65727 skip test_utils.TestFFI.test_cpu for ppc64le due to incompatible exception handling (#7422) 2018-05-09 11:45:30 -04:00
f43e067128 Make optimizer not complain about parameters with requires_grad=False (#7419) 2018-05-09 11:34:52 -04:00
6fd252ccae AUTOGRAD_ to TORCH_AUTOGRAD_ for macros (#7424) 2018-05-09 10:45:05 -04:00
537cb10525 improve DataParallel/DistributedDataParallel docs (#7407) 2018-05-09 10:30:42 +02:00
ef477b2b00 [auto] Update onnx to 72e15ac - [Typing 2/5] Add type hints to onnx/defs (onnx/onnx#911)
72e15ac46f
2018-05-09 06:49:45 +00:00
dca540e455 [auto] Update onnx to ee7f97c - Add type hints to onnx/tools (onnx/onnx#910)
ee7f97c2b1
2018-05-09 06:48:38 +00:00
af23ab9b3e Make omnigraph a public dependency of caffe2 main lib (#7402) 2018-05-08 23:37:40 -07:00
5c2015d133 onnx werror is now opt in (#7390) 2018-05-08 21:21:34 -07:00
8dbeffab07 Add back SLEEF and also use better cmake setup. (#7341) 2018-05-09 02:48:16 +00:00
7911a30081 Move #endif below magma source (#7400) 2018-05-08 22:28:26 -04:00
92d02a46dd Dont do CSE on nodes with blocks (#7363) 2018-05-08 18:00:45 -07:00
b1fbf29b52 [caffe2][nomnigraph] Change the standard transform API to take in NNModule rather than NetDef (#7308) 2018-05-08 17:43:51 -07:00
dc3252730e Fixing conda builds by removing unneeded python args (#7384) 2018-05-08 17:33:30 -07:00
3913e9ead3 [caffe2][nomnigraph] Batchnorm + Conv Fusion (#7057) 2018-05-08 15:40:34 -07:00
3185d8342e Replace incorrect usages of "NotImplemented" (#7381)
* Replace incorrect usages of "NotImplemented"

Fixes #7266. Replaces "NotImplemented" (which is supposed to be used for
binary ops) with the correct "NotImplementedError".

* Address comments
2018-05-08 18:31:45 -04:00
755d3105b6 Fix MultiMarginLoss equation in docs (#7383)
Fixes #7237
2018-05-08 18:30:47 -04:00
e3935f7509 [Caffe2] Add conv+relu fusion for MKLDNN ops (IDEEP) (#7385)
* Add conv+relu fusion for MKLDNN ops (IDEEP)

* comments
2018-05-08 14:44:53 -07:00
8c8918c341 make half overflow checks consistent with other types (#7382) 2018-05-08 14:40:18 -07:00
8f27582194 [auto] Update onnx to dee6d89 - make werror opt-in (onnx/onnx#908)
dee6d89781
2018-05-08 21:22:40 +00:00
71626491c4 Add batched linear solver to torch.gesv() (#6100)
* Add batched linear solver to torch.gesv()

Fixes #3164
Picks up from #4502

I moved `gesv` to ATen.
Adds bindings for MAGMA's `gesv_batched` function for CUDA.
For CPU, runs `THLapack(gesv)` in a for loop.

The new function supports arbitrary batch dimensions (and broadcasting
of those dimensions). For example, the 4-d tensor `A x B x M x M` should
be treated as having batch-size `(A x B)`.

The overhead of creating the magma_queue_t is: ~350000 microseconds
the first time it's called and ~6 microseconds every time after that.

* Tests and docs

* Address comments

* Address comments

* Rebase

* Address comments

* Fix rebase

* Addressed comments

* Address comments

* Address comments

* Addressed comments
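
A usage sketch against the torch.gesv signature of that era (note the (B, A) argument order; the API was later superseded by torch.linalg.solve):
```python
import torch

A = torch.randn(4, 2, 3, 3)  # a 4x2 batch of 3x3 systems
B = torch.randn(4, 2, 3, 5)  # matching right-hand sides
X, LU = torch.gesv(B, A)     # solves A @ X = B per batch element
print(X.shape)                        # torch.Size([4, 2, 3, 5])
print((A.matmul(X) - B).abs().max())  # ~0 up to numerical error
```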
2018-05-08 17:06:27 -04:00
f598ef9102 Add CI docker image for rocm builds (#7349) 2018-05-08 13:41:27 -07:00
7b66c433bc Use a CI specific onnx namespace to catch hardcoded ones in the code (#7369) 2018-05-08 13:40:55 -07:00
de470d1222 Small fix needed to build Caffe2 Aten without CUDA (#7387) 2018-05-08 15:55:03 -04:00
fea95de854 Add aten::expand to the isDifferentiable list (#7350)
This lets aten::expand be differentiable in torchscript. It was probably
omitted from the list by accident in the past b/c gradientForNode does
already support aten::expand.

Also adds a test to check expand and its gradient in a torchscript fn.
2018-05-08 21:40:36 +02:00
913e145340 Removes -2 special case and specialization from pointwise apply (#7366)
* Removes -2 special case and specialization

* Specialization and comment cleanup
2018-05-08 14:58:46 -04:00
4adba42a75 [easy] minor cleanup in caffe2 jenkins test script (#7378) 2018-05-08 11:50:48 -07:00
9396740406 Updating condas to build for all CUDA archs (#7379) 2018-05-08 11:45:45 -07:00
67e7c24479 Add note about thread-safety of registry (#7285) 2018-05-08 10:26:28 -07:00
24b41da795 [build] Make ATen buildable without all Caffe2 by root cmake (#7295)
* Make ATen buildable without all Caffe2 by root cmake

* Fix typo in aten cmake

* Set BUILD_ATEN from USE_ATEN as compat

* Only set BUILD_ATEN from USE_ATEN when on

* Have USE_GLOO only set when BUILD_CAFFE2
2018-05-08 10:24:04 -07:00
0aebddd476 [auto] Update onnx to 522c055 - version bump to 7 (onnx/onnx#876)
522c05566e
2018-05-08 17:10:40 +00:00
e9f6f14555 [Caffe2] Revamp the convnet benchmark code by using models from model zoo (#7351)
* Revamp the convnet benchmark code by using models from model zoo

* Move ModelDownloader to caffe2/python/models

* Remove convnet_benchmarks.py
2018-05-08 08:53:52 -07:00
2cb26bcd40 Fix type in TensortRT tests (#7357) 2018-05-08 07:52:04 -07:00
75dbf9b113 [caffe2][build] Update python cmake flag print script (#7306) 2018-05-08 00:34:42 -07:00
79a4d27232 Correct the parameter annotation (#7367)
Make the annotation match the parameter.
2018-05-08 00:31:16 -07:00
f439ba5843 [Caffe2][nomnigraph] Generic fuse conv relu pass for nomnigraph (#7355)
* Generic fuse conv relu pass for nomnigraph

* Use it in NNPACK conversion

* Comments

* Change the postprocess interface to take node instead of conv op
2018-05-07 23:19:06 -07:00
f3c8bd598d [Caffe2] Pinning conda-numpy to 1.14 to avoid SVD issue (#7344)
* Pinning conda-numpy to 1.14 to avoid SVD issue

* Adding another leveldb test to conda's ignored tests, removing a mkl-test from this

* Removing commented out section
2018-05-07 22:55:50 -07:00
75651c199f fix build (#7348) 2018-05-07 20:43:08 -07:00
b6adecdeee correct schema.Scalar's shape for a shape argument of 1 (#6493)
The schema.Scalar class makes pretty strict assumptions (via its docstring)
on the spec of the shape of its underlying object. Because of idiosyncrasies
of numpy indexing and the use of np.dtype, those assumptions are broken on an
edge case (dtype = (scalar_type, 1)), illustrated below. This corrects the
behavior of this edge case to conform to the spec.
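
The numpy behavior in question, as it stood around numpy 1.14 (newer numpy deprecates this collapsing):
```python
import numpy as np

# A sub-array shape of 1 in a dtype spec is collapsed to the plain
# scalar dtype, unlike any other shape value.
print(np.dtype((np.float32, 1)) == np.dtype(np.float32))  # True
print(np.dtype((np.float32, 1)).shape)                    # ()
print(np.dtype((np.float32, 2)).shape)                    # (2,)
```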
2018-05-07 18:58:11 -07:00
e7116d95e0 Create README.md (#7360) 2018-05-07 18:26:59 -07:00
ea24c7ff1b Remove cdft library requirement from MKL (#7246) 2018-05-07 15:31:30 -07:00
ed6f79ccd2 [caffe2][build] Add ASAN to the debug release of caffe2 (#7107) 2018-05-07 15:26:51 -07:00
edbfe02941 [auto] Update onnx to ea0e0cb - remove whitespace and semicolon (onnx/onnx#904)
ea0e0cb13f
2018-05-07 22:07:27 +00:00
3642745ef9 [caffe2][nomnigraph] Add maxpool sink transform (#7207) 2018-05-07 14:52:10 -07:00
8fce8673bb Rename Container to Module in autogradpp and reorg code (#7304)
* Rename autograd namespace to torch and change torch.h into python.h

* Pave the way for torch::nn::Module

* Reorganize module code structure

* Undo ONNX update

* Remove sleef submodule
2018-05-07 14:45:00 -07:00
5146bc99e4 [auto] Update onnx to 328ed3e - shape inference for logical ops (onnx/onnx#899)
328ed3e679
2018-05-07 18:45:53 +00:00
2fdc00e41c Use sccache for Windows build (#7331) 2018-05-07 14:42:59 -04:00
f1e38725bf add to method for PackedSequence (#7319)
* ENH: add to method for PackedSequence

* ENH: return self if possible

* TST: remove extra data

* DOC: add more explanation

* TST: remove extra data

* DOC: minor fix
2018-05-07 14:39:03 -04:00
c68ae308cd [auto] Update onnx to d05b6b4 - Just don't output opset_version in the example then. (onnx/onnx#887)
d05b6b46f8
2018-05-07 18:04:01 +00:00
4f48b7c1ba [auto] Update onnx to 5be6d86 - fix typos in documentation (onnx/onnx#896)
5be6d86654
2018-05-07 17:44:15 +00:00
bebccc0c6d Improve math formula rendering in Poisson Distribution docs. (#7340) 2018-05-07 18:40:01 +02:00
4c511075c3 [auto] Update onnx to 6fa9f1a - promote identity op given it's being used. (#892)
6fa9f1a58b
2018-05-06 21:07:56 +00:00
f9b83f2e6c [auto] Update onnx to c0fb725 - Spec clarity: IR.md modifications. (#720)
c0fb725b64
2018-05-06 19:56:05 +00:00
56daed0a85 copy paste documentation error fixed in Softmin (#7324) 2018-05-06 21:50:46 +02:00
54a4867675 Bring back C++ extension torch.h (#7310)
* Bring back C++ extension torch.h

* Fix python.h include in python_tensor.cpp
2018-05-05 14:06:27 -07:00
6087a5feaa [auto] Update onnx to b0ab0d1 - function registration c++ API (#848)
b0ab0d1d15
2018-05-05 14:37:10 +00:00
94b74d2068 [auto] Update onnx to ceb259c - Tests for ReduceLogSum (#862)
ceb259c903
2018-05-05 08:36:40 +00:00
0859f0e3e6 Pinning numpy version in conda builds (#7314) 2018-05-04 16:38:53 -07:00
1f14d681dd [auto] Update onnx to 1c600f8 - Lint the code and fix the CI (#895)
1c600f802d
2018-05-04 22:50:30 +00:00
ea12702e02 [auto] Update onnx to 278ef5b - inference for math ops (#893)
278ef5bc9c
2018-05-04 21:51:16 +00:00
56ed857f1b [auto] Update onnx to f708d41 - type and shape inference for experimental ops (#890)
f708d41fea
2018-05-04 21:50:10 +00:00
f06fcc6efa Fix bug introduced in pull #3280 (#7292)
Apparently get() is a function of requests, not a module (not sure if in
the past get() used to be a module). Therefore, the syntax in #3280 will
always fail with ImportError, and the requests lib will never be used (which
defeats the purpose of that pull request).
Also, if the requests lib is used, the stream=True parameter should be added;
otherwise requests.get() will load the whole response into memory.
2018-05-04 14:14:02 -07:00
e1c7e6dce2 [auto] Update onnx to 38eea57 - add ONNX_NO_WERROR as option (#891)
38eea57313
2018-05-04 21:04:54 +00:00
67a9948d87 Refactor rnn export (#7263)
* rnn refactor: extract rnn weights and biases

* rnn refactor: make rnn with converted outputs

* rnn refactor: finish it off
2018-05-04 14:00:09 -07:00
55b8317f1d Update gif with new logo (#7301)
* Update gif with new logo

* add requires_grad=True
2018-05-04 16:47:08 -04:00
24681a8e49 Update unstable docs logo to new logo. (#7305)
Fixes #7302
2018-05-04 16:44:58 -04:00
feb64b5291 Add -Wno-unknown-pragmas (#7291) 2018-05-04 13:44:13 -07:00
3369828bfa Clarify patience in ReduceLROnPlateau docs (#7242)
* Clarify patience in ReduceLROnPlateau docs

It's unclear which definition of patience we have. The two ways to
interpret it are:
- How many bad epochs can you see before you start considering changing the learning rate.
- How many bad epochs can you see before you change the learning rate.

This PR clarifies the docs with an example. If `patience = 2`, then
after 2 bad epochs, we begin considering changing the learning rate.
After seeing one more epoch (the 3rd epoch), if that epoch is also bad,
then we change the learning rate after it.

* address comments
2018-05-04 16:39:26 -04:00
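A short usage sketch of the clarified semantics; `model` and `validate` are hypothetical placeholders, not part of the commit:

```python
import torch
from torch.optim.lr_scheduler import ReduceLROnPlateau

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)  # `model` is a placeholder
scheduler = ReduceLROnPlateau(optimizer, patience=2)
for epoch in range(20):
    val_loss = validate(model)  # `validate` is a placeholder
    scheduler.step(val_loss)    # lr is reduced only after a 3rd consecutive bad epoch
```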
ac5d7bdf62 Fix onnx.symbolic.upsample_bilinear2d not considering align_corners (#7264) 2018-05-04 16:38:38 -04:00
0dd2521d4c Fix ONNX export for AveragePool with count_include_pad=True (#7279) 2018-05-04 13:21:32 -07:00
0259d9c8d3 Changing underscores to hypens in conda package names (#7299) 2018-05-04 12:50:41 -07:00
a0c1e5faea Change the error message in pad_sequence to be more user-friendly (#7283) 2018-05-04 12:29:21 -07:00
36a3f0995b Remove THDTensorDescriptor_newFromTH{X}Tensor. (#7287)
They don't seem to be used and we are moving to a single TensorImpl model.
2018-05-04 12:22:19 -07:00
833b1e6c74 Skip the test case on ReduceLogSum (#7293) 2018-05-04 11:49:30 -07:00
026cb9d2f1 set ONNX_NO_WERROR (#7296) 2018-05-04 11:35:15 -07:00
a015d579dd move softmax/logsoftmax to ATen (#6786)
* move softmax/logsoftmax to ATen

* specify cpu and gpu accum types

* use accreal for CPU

* expose softmax backward to python, fix legacy interface

* fix Distributions.cu to use common AccumulateType

* fix cuda 8 build

* delete commented out lines

* rebase on master, fix breakages
2018-05-04 14:23:35 -04:00
5c575a1497 Fixes RNN shapes for C++ API (#7272) 2018-05-04 14:00:30 -04:00
9e3f5bb5fd enable onnx shape inference when converting onnx -> caffe2 (#7260) 2018-05-04 10:27:30 -07:00
157d7499e7 Disable two flaky C++ API tests. (#7290)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-04 10:23:52 -07:00
46d0140d94 [auto] Update onnx to 541512b - tests for type and shape inference for Random generator ops (#880)
541512b93a
2018-05-04 16:02:33 +00:00
4abb229960 Double-dispatch copy. (#7197)
* Double-dispatch copy.

In order to split ATen's CPU/CUDA code into two separate libraries
which don't require a build flag (AT_CUDA_ENABLED) to separate them,
we need to be able to split source files based on whether or not they
handle CPU functionality only, or also touch CUDA.  Copy poses a unique
challenge here, because the naive implementation involves writing
a matrix for all combinations of CPU/GPU in a single file.

This PR splits up Copy.cpp into CPUCopy.cpp and CUDACopy.cpp, respecting
the following matrix:

    to\from    CPU           CUDA
          +---------------------------
    CPU   | CPUCopy.cpp   CUDACopy.cpp
    CUDA  | CUDACopy.cpp  CUDACopy.cpp

When you run x.copy_(y) where x is CPU and y is CUDA, we do a second
virtual dispatch to copy_from(y, x) on y's type, so that we can get
from CPUCopy.cpp to CUDACopy.cpp

The new autogenerated code for CPU looks like this:

Tensor & CPUByteType::s_copy_(Tensor & dst, const Tensor & src, bool non_blocking) const {
  // code generated by copy_wrapper
  checked_cast_tensor<CPUByteTensor>(dst.pImpl, "dst", 0, false);
  switch (src.type().ID()) {
    case TypeID::CPUByte:
        THByteTensor_copyByte(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUByteTensor*>(src.pImpl)->tensor);
        break;
    case TypeID::CPUChar:
        THByteTensor_copyChar(static_cast<CPUByteTensor*>(dst.pImpl)->tensor, static_cast<CPUCharTensor*>(src.pImpl)->tensor);
        break;
    ...
    default:
      return src.type().s_copy_from(src, dst, non_blocking);

Notice that the fall through goes to s_copy_from.  s_copy_from is like s_copy
but the arguments are reversed.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Lintfix and no-CUDA fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix compilation error.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-04 11:58:22 -04:00
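A toy Python rendering of the double-dispatch pattern described above; the real code is generated C++, and every name here is illustrative only:

```python
class CPUType:
    def s_copy_(self, dst, src):
        if isinstance(src.type, CPUType):
            dst.data = list(src.data)           # direct CPU->CPU path (CPUCopy.cpp)
            return dst
        return src.type.s_copy_from(src, dst)   # fall through: dispatch on src's type

class CUDAType:
    def s_copy_from(self, src, dst):
        dst.data = list(src.data)               # CUDA-aware path (CUDACopy.cpp)
        return dst

class Tensor:
    def __init__(self, type_, data):
        self.type, self.data = type_, data
    def copy_(self, src):
        return self.type.s_copy_(self, src)     # first virtual dispatch on dst's type

x = Tensor(CPUType(), [0, 0])
y = Tensor(CUDAType(), [1, 2])
x.copy_(y)  # the second dispatch lands in CUDAType.s_copy_from
```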
053b68c4da Fix USE_ATEN flag in caffe2 (#7252) 2018-05-04 08:30:08 -07:00
67d0d14908 Rename autograd namespace to torch and change torch.h into python.h (#7267)
* Rename autograd namespace to torch and change torch.h into python.h

* Include torch.h instead of python.h in test/cpp/api

* Change some mentions of torch.h to python.h in C++ extensions

* Set paths directly, without find_path
2018-05-04 08:04:57 -07:00
bcffb5aa1d Remove SLEEF and all dependent code paths (#7268)
Temporarily remove this dependency.
2018-05-04 14:41:09 +00:00
0829d4502d Trace size-dependent expressions correctly (#6554)
This makes the JIT tracer much more robust, by allowing it to record
dependencies on tensor sizes. For example, if you were to trace this
function

def fn(x):
    return x.view(x.size(1), -1)

before this patch, then it would embed the actual value of x.size(1)
in the trace as a constant, making it very hard to have e.g. batch size
independent traces. Now, this will correctly record the dependency, and
will retrieve the size of x at every run.
2018-05-04 10:55:39 +02:00
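A hedged sketch of what this enables, assuming the torch.jit.trace(fn, example_inputs) entry point:

```python
import torch

def fn(x):
    return x.view(x.size(1), -1)

traced = torch.jit.trace(fn, torch.randn(2, 3, 4))
# The view's first dimension is recorded as a dependency on x.size(1),
# so re-running with a different leading (batch) dimension still works:
out = traced(torch.randn(5, 3, 4))
```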
da654337e0 Add support for type annotations in Python functions (#7009) 2018-05-04 10:54:19 +02:00
6363faf184 Fix issue #7209 in DataLoader (#7265) 2018-05-04 10:51:46 +02:00
159c75a2ca [auto] Update onnx to e35126b - add type inference function for classifier ops. (#882)
e35126bc4b
2018-05-04 08:03:07 +00:00
739d3d48ec [auto] Update onnx to 7ee7d0b - enable Werror=sign-compare on linux (#867)
7ee7d0b57a
2018-05-04 08:02:14 +00:00
d856bfc1bf [auto] Update onnx to e35126b - add type inference function for classifier ops. (#882)
e35126bc4b
2018-05-04 06:47:08 +00:00
98c24fae6b Fix broadcasting error in LogNormal and TransformedDistribution (#7269) 2018-05-03 23:03:51 -04:00
8325206c6f A clip grad fix for sparse tensors. (#7257) 2018-05-04 00:35:32 +02:00
a95b7b13f9 Extend support to arbitrary ops in init net when converting c2 models to onnx (#7256) 2018-05-03 15:34:47 -07:00
8091388d0f Add support for __floordiv__ and __rdiv__ for integral tensors (#7245) 2018-05-03 23:34:59 +02:00
371cc1e2db update the gif for 0.4 (#7262) 2018-05-03 14:23:08 -07:00
92f54e1f01 remove static libstdc++ linking and PYTORCH_BINARY_BUILD env variable (#7259) 2018-05-03 12:32:57 -07:00
3ae92b3a8b Fix lint errors (#7247) 2018-05-03 12:17:23 -07:00
e625ecc41f [caffe2][nomnigraph] Fix NNPack conv-relu fusion for ping-pong naming, (#7199)
add test for it and make tests python3 compatible
2018-05-03 12:12:24 -07:00
c96f2624a2 Speedup sparse init (#6899)
* Sparse initialization speedup

* +empty line

* simplify indexing

* Can't reproduce locally...

* Can't reproduce locally...+

* Can't reproduce locally...+

* Fix test, cleanup
2018-05-03 14:29:12 +01:00
4ab6ea5b1f Add unbuffered flag to distributed node launcher (#7226) 2018-05-03 11:49:06 +02:00
79245306c7 Fix onnx sum (#7232)
* fix onnx ReduceSum generation

* allow handle_only_zero_dim to return none to make mypy happy
2018-05-03 00:18:16 -07:00
f9393ffc90 Remove unneeded entry for NCCL in .gitmodules (#7216)
NCCL currently is not a git submodule. The NCCL source code is
bundled in 'third_party/nccl'.

Closes #7150
2018-05-03 00:07:58 -07:00
c4078b42b4 Add docstring for Tensor.tolist (Fixes #7095) (#7182) 2018-05-02 23:58:32 -07:00
6538ae5c16 clean up runtime dockerfile, use cuda 9 package (#7230) 2018-05-02 23:54:05 -07:00
7c70c3bdca Fixes for C++ build on macOS (#7192)
* Fix C++ build on Mac

* Enable CI on Mac

* Create NO_API switch to only build jit without api

* More fixes

* Fixes to CMake
2018-05-02 23:06:04 -07:00
1313791015 Need an explicit flag since opencv is on by default (#7225) 2018-05-02 21:00:34 -07:00
aa38ae303d [build] Setup to build ATen from root CMake file (#7163)
* Setup to build ATen from root CMake file

* Move aten/src/TH/cmake into cmake/Modules

* Add special code path for FindMKL for merge
2018-05-02 19:33:31 -07:00
681baa9254 Restore warning to torch.range. (#7194)
Also, get rid of warning specification in Declarations.cwrap, which currently has no effect.
2018-05-02 21:53:00 -04:00
07513cfd1d implement sum over multiple dimensions (fixes #2006) (#6152) 2018-05-02 21:50:29 -04:00
e25e501bea Fix build for osx (#7187)
For some reason this built in autogradpp, but in PyTorch we need to put the declaration in the .cpp.
2018-05-02 21:08:14 -04:00
d154d32890 Fix to a conda hack (#7212) 2018-05-02 17:35:15 -07:00
8ac6856e54 Removing features for a sec (#7211) 2018-05-02 17:11:19 -07:00
faef70b5b0 Fixing a bug in my bug fix (#7210) 2018-05-02 17:02:24 -07:00
a10870a2d1 [auto] Update onnx to 676e0c7 - Type and shape inference for generator ops (#871)
676e0c7726
2018-05-02 23:36:33 +00:00
83622abd9f Reroute aten to use the root cmake system (#7188) 2018-05-02 16:25:56 -07:00
1ca6e77615 Fix to comput_70 error + some more lowercasing (#7205) 2018-05-02 15:34:35 -07:00
93242d320f fix scale on some tensors (#7189) 2018-05-02 15:33:02 -07:00
a61d4a3374 [Caffe2] Refactor reduce ops to take flexible input types (#7164)
* Refactor reduce ops to take flexible input types

* Add DISPATCH_FUNCTION macros in common_gpu.h

* Use macros to reduce switch case in dispatching cuda functions
2018-05-02 12:08:38 -07:00
197412fa8f Fix typo in comment (#7183) 2018-05-02 11:58:30 -07:00
619a56bf21 Emergency new fork for ideep (upstream lost commits). (#7191)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 14:50:47 -04:00
88a705555a Add SLEEF for float and double (#6725) 2018-05-02 18:40:44 +00:00
4d2693973e [Caffe2] Turning on ATEN for Caffe2 in integrated builds (#7169)
* Turning on ATEN for Caffe2 in integrated builds

* Adding slim version

* Fixing missing name suffix, fixing conda tests
2018-05-02 11:16:29 -07:00
1904058370 update logos (#7184) 2018-05-02 10:56:20 -07:00
e6330559c8 [auto] Update onnx to c7055f7 - update defs for reduce, rnn, and tensor depth-space ops (#847)
c7055f721c
2018-05-02 16:41:28 +00:00
604f907bc7 Restore filename and line number on AT_ASSERT. (#7152)
AT_ASSERT is an internal, PyTorch specific error, so we should
give a little more debug information (than with the ordinary
errors.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-02 07:49:31 -07:00
f07f24db0b Change unique name so that you are guaranteed: (#7166)
```
JIT_ASSERT(v->setUnique(x)->uniqueName() == x);
```
This works by changing any other value in the graph with name x to a
different name. This mirrors llvm behavior and is useful when you
want to ensure some names have particular values.
2018-05-02 07:32:01 -07:00
ebebfce681 Minor THD cleanup (#7161)
* Remove stale THD README

* Move common THD dependency into THD/base

The master_worker directory now no longer contains files that are
needed for building other parts of THD.
2018-05-02 07:29:27 -07:00
414e0b4b6f Split up CPUApplyUtils for perf (#7168) 2018-05-02 14:22:36 +00:00
664fe34e0a [Caffe2][fbcode=>GH sync] Update from facebook 4323b18ce13c (#7116)
* [fix] Re-enable events in RNN ops

We earlier added event disabling in RNN ops because back then we didn't use
events; with current use cases this is no longer true
(https://fburl.com/8vd0lp8y)

* use ops with cuda impl

* Revert D7729695: [caffe2][fix] Re-enable events in RNN ops

This reverts commit 4b215c7496fb724656ff4c776933a15bdbbcde5e

@bypass-lint

An infra SEV is better than not reverting this diff.
If you copy this password, see you in SEV Review!
@cause_a_sev_many_files

* [observer] Clean up observer_config.h

#accept2ship

* [1/n] Refactor dataio_test.py

Replace code duplication with a common function

* Add barrier net that runs before training nets

Add a synchronize barrier net that is run before training nets. With this net, shards that are faster will wait for other shards before starting training. This reduces the chance of the faster shards timing out during GLOO AllReduce.

Removed the explicit data_parallel_model.py.synchronize call in the holmes workflow. A similar change in the speech/asr_training workflow will come in another diff.

* Support the dnnlowp backend in caffe2_benchmark

This is for SHARE operator latency evaluation

* Migrate integral_image_op to main caffe2

migrate integral_image_op(GPU version) given by https://fburl.com/yvqezigi
to caffe2/caffe2/operators and implement its CPU version. Write up a test
using the hypothesis_test mechanism

* [pos_disc, fbcode] Implement unjoined lr loss

As explained in https://our.intern.facebook.com/intern/wiki/Model_Based_Calibration/, when the dataset is a joined dataset, where labels might change later, we need to use unjoined logloss.

The implementation is almost the same as in Sigrid (https://fburl.com/1trngsls), where
    loss = y (log(p) - log(1-p)) + (1-y)(log(1-p)) = xy - (1-y)x - (1-y)log(1+exp(-x))

For x < 0, to ensure stability and avoid overflow, we reformulate the above exp as
    loss = xy - (1-y)x + (1-y)x - (1-y)log(1+exp(x)) = xy - (1-y)log(1+exp(x))

Then the final expression becomes
    loss = xy + (y - 1) x (x >= 0) - (1 - y) log(1 + exp(x - 2 x (x >= 0)))

where y is the true label, x is the dot product and p = logistic(x).

This implementation is aligned with the current implementation of the original cross entropy in
https://phabricator.intern.facebook.com/diffusion/FBS/browse/master/fbcode/caffe2/caffe2/operators/cross_entropy_op.cc;0bae3b5d0f825897c5e0dd0ff10f489d7271bf25$7-13
(a hedged numeric check of these loss expressions appears after this commit entry)

* Keep the array to fix the conflict

* [C2] Compute Adagrad effective LR

The AdagradWithLR op outputs an extra blob which contains the average effective learning rate across all weights in this blob.

* Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs

1. Open-source extractMetaNetDef and runGlobalInitialization, for use in
2. new Predictor constructor from db file.
3. Add new run function that returns outputs as TensorMap

* Disable eigen cpu

Disable eigen cpu in transpose and reduce

* Introduce request_only/object_only property of ModelLayer

by default this is False

* A simple TC Caffe2 benchmark

We can run tunner, get MappingOptions and then use them to
compare against cuBLAS

currently broken due to LLVM issues. How to run:

hg checkout eec1ab31b59c03b8deded1c755a9abaf8c45be01
add D7401202
add D7434625
add D7506031
add D7540728

buck run @mode/dev-nosan tc/tc/benchmarks_python:caffe2_benchmark

* Move Caffe2 feature_maps_ops to open source

Need feature maps operators in open source project facebookresearch/BlueWhale

* Manually fix the conflicts in channel shuffle op

* Fix the inconsistency between different gh and fbcode

* Skip Adagrad GPU Test (Because some gpu implementation is missing)

* Fix another test to make sure it won't run on gpu when implementation is not available yet
2018-05-01 20:49:00 -07:00
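The numeric check referenced above: a minimal sketch verifying that the stable unified expression for the unjoined LR loss matches the direct form xy + (1-y)log(1-p). The function names are ours, not Caffe2's.

```python
import math

def direct(x, y):
    # x is the dot product (logit), y the label, p = logistic(x)
    p = 1.0 / (1.0 + math.exp(-x))
    return x * y + (1.0 - y) * math.log(1.0 - p)

def stable(x, y):
    # unified expression with the indicator s = (x >= 0)
    s = 1.0 if x >= 0 else 0.0
    return x * y + (y - 1.0) * x * s - (1.0 - y) * math.log(1.0 + math.exp(x - 2.0 * x * s))

for x in (-3.0, -0.5, 0.5, 3.0):
    for y in (0.0, 1.0):
        assert abs(direct(x, y) - stable(x, y)) < 1e-9
```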
967c4a0c18 [caffe2][nomnigraph] Fix NNPACK relu fusion for inplace relu (#7124) 2018-05-01 16:26:54 -07:00
20666feb2c [caffe2][nomnigraph] Add compatibility for MSVC, which lacks some C++11 language features (#7158) 2018-05-01 16:26:20 -07:00
f3c76b9b78 Remove specifications from Declarations.cwrap that have no effect and are already handled. (#7147)
These changes are already handled, either in native functions or via resize specifications in Declarations.cwrap.

The resize_ one is technically not handled, although in TH it is checked if the storage is actually reallocated; this is less strict, but seems okay.
2018-05-01 19:10:31 -04:00
a9f2ee0817 CPUApplyUtils is faster if iterate is split into two steps (#7148) 2018-05-01 22:32:02 +00:00
9ba503ac9c [caffe2][nomnigraph] Add ability to pass the old net to convertToCaffe2Proto (#7149) 2018-05-01 15:31:07 -07:00
1418cc72d6 Make refcount in THMapInfo atomic. (#7135)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 18:14:46 -04:00
a5e1d4a049 Delete dead header (#7153)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 18:14:06 -04:00
08a853b02c Add rsqrt op in caffe2 (#7154) 2018-05-01 15:06:53 -07:00
a8b059edcc [auto] Update onnx to 69894f2 - Use op schema.all tensor types in random like definitions (#865)
69894f207d
2018-05-01 21:30:49 +00:00
762eb3ddc8 [Caffe2] Add moments op in caffe2 (#7114)
* Add moments op in caffe2

* Use rsqrtf in float for group_norm

* Add docs for default behavior when axes is not provided.

* Update group_norm_op by using Eigen::sqrt on CPU
2018-05-01 12:19:08 -07:00
323e3aca47 A small fix for aten cmake (#7141) 2018-05-01 12:12:29 -07:00
dfe1bae3cd [caffe2][nomnigraph] Move tests to proper gtest suite (#7046) 2018-05-01 12:00:43 -07:00
bcadf92ad5 Move codegen from setup.py to CMake for C++ libraries (#7121)
* Generate code without setup.py for C++ build

* Move code generation to CMake

* Set DEPENDS files correctly

* Fix some errors in codegen

* Fix blank line lint
2018-05-01 11:30:13 -07:00
5d3c3c53aa Add raw IR serialization/deserialization (#6392) 2018-05-01 20:21:29 +02:00
ca8ee4c1e1 [auto] Update onnx to b9d6b90 - Clarify random like operators (#846)
b9d6b90a64
2018-05-01 17:54:27 +00:00
2a18e7c45b Have python dispatch respect 'auto_gpu' and 'with_gil'. (#7137) 2018-05-01 13:51:02 -04:00
8031da5479 Implement torch.as_tensor, similar to numpy.asarray. (#7109)
* Implement torch.as_tensor, similar to numpy.asarray.
torch.as_tensor behaves like torch.tensor except it avoids copies if possible; so also somewhat like tensor.new but without the size overloads.
I didn't add a requires_grad field, because we haven't decided on the semantics such as as_param.

* Remove requires_grad for doc.
2018-05-01 12:54:43 -04:00
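A small sketch of the copy-avoidance contrast:

```python
import numpy as np
import torch

a = np.arange(6)
shared = torch.as_tensor(a)  # avoids a copy: shares memory with `a`
copied = torch.tensor(a)     # always copies
a[0] = 42
assert shared[0] == 42       # the change is visible through the shared tensor
assert copied[0] == 0        # the copy is unaffected
```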
1f5b392da0 [auto] Update onnx to fc6b5fb - Refactor shape inference implementation (#855)
fc6b5fbb6d
2018-05-01 15:04:47 +00:00
15b12e6f8a Add support for MKLDNN on Windows (#7130) 2018-05-01 10:57:16 -04:00
7968ee0f59 Removing references to CUDA_SDK_ROOT_DIR to see if it breaks anything (#7125) 2018-05-01 07:52:16 -07:00
87e6362393 Add more warnings to C++ API build (#7123)
Enables more warnings in the C++ API build.

Fixed a bunch of things in torch/csrc/.

Mostly taken from c10

* Enable -pedantic for C++ build

* Enable more warnings

* Include CUDA and library headers with -isystem

* Fix sign-promo warning
2018-05-01 10:40:22 -04:00
0427afadd1 Make AT_ASSERT/AT_ERROR non-printf based, other tweaks (#7104)
* Make AT_ASSERT/AT_ERROR non-printf based, other tweaks

- AT_ASSERT/AT_ERROR don't take printf strings anymore; instead,
  they take a comma-separated list of things you wanted to print
  (bringing it inline with Caffe2's conventions).

  Instead of AT_ASSERT(x == 0, "%d is not zero", x)
  you write AT_ASSERT(x == 0, x, " is not zero")

  This is done by way of a new variadic template at::str(), which
  takes a list of arguments and cats their string reps (as per
  operator<<) together.

- A bunch of the demangling logic that was in Error.h is now
  moved to Error.cpp (better header hygiene.)  Also, demangle
  has been moved out to its own helper function, and also
  a new helper demangle_type (from Caffe2) added.

- A bunch of AT_ASSERT converted into AT_CHECK, to more properly
  convey which checks can be caused by user error, and which are
  due to logic error in ATen.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* CR

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Fix test failure.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* buildfix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* More fixes.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* One more fix

Signed-off-by: Edward Z. Yang <ezyang@fb.com>

* Try harder

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-05-01 10:28:31 -04:00
24461a756a Separate "-Xcompiler <...>" into 2 elements because ${nvcc_flags} (when using CUDA_SEPARABLE_COMPILATION) doesn't recognize it. (#7118)
This solves the "nvcc fatal : Unknown option 'Xcompiler -MD'" issue where nvcc gets -'Xcompiler -MD'.
2018-05-01 09:31:43 -04:00
dccfdf317b Fix example of torch.clamp (#7131) 2018-05-01 14:52:32 +02:00
ba046331e8 add spectral normalization [pytorch] (#6929)
* initial commit for spectral norm

* fix comment

* edit rst

* fix doc

* remove redundant empty line

* fix nit mistakes in doc

* replace l2normalize with F.normalize

* fix chained `by`

* fix docs

fix typos
add comments related to power iteration and epsilon
update link to the paper
make some comments specific

* fix typo
2018-05-01 17:00:30 +08:00
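A minimal usage sketch of the new utility: the wrapped layer's weight is re-normalized by its largest singular value, estimated by power iteration, on each forward pass.

```python
import torch
import torch.nn as nn
from torch.nn.utils import spectral_norm

layer = spectral_norm(nn.Linear(20, 40))  # registers the power-iteration hook
y = layer(torch.randn(8, 20))             # weight is normalized before this matmul
```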
23a5ddd3c8 [auto] Update onnx to b7d8dc8 - fix cmake warning message (#863)
b7d8dc8fa6
2018-05-01 08:21:41 +00:00
e8916f510b [auto] Update onnx to f585c5d - add pytorch-operator test for tile (#831)
f585c5d066
2018-05-01 07:22:44 +00:00
c72e5da7eb [auto] Update onnx to 993fe70 - add install step (#832)
993fe70805
2018-05-01 07:21:36 +00:00
5acc62ffa5 Skip Tile onnx backend to keep CI green (#7120) 2018-04-30 22:37:34 -07:00
892bef9aa3 [ONNX] Delay external value resolution as long as possible in ONNX backend (#7111) 2018-04-30 21:30:31 -07:00
0b0279981d Fix example for new_zeros in documentation (#7128)
Fix for Issue #7088
2018-05-01 00:29:13 -04:00
531944275c [Caffe2] Guard CUDA API calls in caffe2/operators using macro CUDA_CHECK (#6810) 2018-04-30 21:27:37 -07:00
150af6ac1e Move ideep ops from caffe2/contrib/ideep to caffe2/ideep (#7112) 2018-04-30 21:10:46 -07:00
b2cdd08252 Introducing onnx-tensorrt to third_party (#7119) 2018-04-30 21:09:51 -07:00
4add3a4df7 Add dependency from caffe2_gpu to ATen in CMake (#7117) 2018-04-30 19:30:34 -07:00
cdc6d104e2 [auto] Update onnx to 68bc26c - add type inference for traditional ml ops except classifier ops. (#857)
68bc26cfb2
2018-05-01 02:21:49 +00:00
b3be71f046 [easy] Stop hardcoding "python" executable in bottleneck tests (#7105)
Right now, the bottleneck test_utils.py tests assume that a user's
python executable is 'python'. This may not be the case, especially if
the user has multiple versions of python installed. This PR changes it
so that test_utils.py uses `sys.executable` as the python executable.
2018-04-30 22:01:36 -04:00
afe3c2688f Update C++ API tests to use Catch2 (#7108)
* Update C++ API tests to use Catch2

* Update download_mnist.py to be less verbose
2018-04-30 21:36:35 -04:00
25e7d5c612 Make @ebetica and @goldsborough owners for test/cpp/api (#7113) 2018-04-30 21:35:13 -04:00
6e72ba9798 [Caffe2] Fail fast for C++ unit tests too (#7106)
* Fail fast for C++ unittests too

* Fix based on comments
2018-04-30 17:30:03 -07:00
7efd6f0506 [auto] Update onnx to 9cc0cda - fix string representation of scalar types (#858)
9cc0cdabd3
2018-05-01 00:07:32 +00:00
ab44002ac8 Open-source extractMetaNetDef & runGlobalInitialization, add new Predictor constructor from db file, and add run_map_outputs (#7063)
* Refactor extractMetaNetDef and runGlobalInitialization into open...

* Fix test by making get output blobs optional

* Update test instead of making output blobs optional
2018-04-30 17:01:27 -07:00
71f6cca992 Make @ebetica and @goldsborough owners for torch/csrc/api (#7110) 2018-04-30 15:48:12 -07:00
bd69d2fd23 [auto] Update onnx to 1078925 - fix y in pow test case to scalar (#852)
1078925c2d
2018-04-30 22:42:37 +00:00
f87462c65f [Caffe2] Fix the wrong argument name in collect_and_distribute_op (#7091)
* Fix the wrong argument name, FPN works!

* Fix collect_and_distribute test
2018-04-30 15:01:11 -07:00
50218a25e7 [EASY] Document load_inline (#7101)
* Document load_inline

* Link to tests for examples

* Links in RestructuredText are weird
2018-04-30 14:36:41 -07:00
1ea3f79569 Location of pip package changed (#7100)
* Location of pip package changed

* They moved setuptools two days ago too
2018-04-30 14:35:17 -07:00
95681257d6 Revising cudnn version check (#7062) 2018-04-30 14:34:41 -07:00
af71fb882f Merge autogradpp into PyTorch (#7074)
* Dump autogradpp into PyTorch

* Fixed up CMake for autogradpp/C++ API

* Made cereal a submodule

* Change search location of autogradpps mnist directory

* Add test_api to CI

* Download MNIST from the internet instead of storing in repo

* Fix warnings
2018-04-30 12:53:46 -07:00
3407708b81 Remove unused variable (#7103) 2018-04-30 12:53:28 -07:00
bf9fab3cf3 [auto] Update onnx to c66fb6f - Add some math function shape inference (#845)
c66fb6f077
2018-04-30 19:45:21 +00:00
20c965f7d6 fix max/min on cuda in presence of NaN (fixes #6996) (#7052)
Thank you ngimel and zou3519!
2018-04-30 21:02:47 +02:00
90026f59a3 Switching to conda's --no-test flag (#7099)
* Switching to conda's --no-test flag

* Also updating callsite in .jenkins/build.sh
2018-04-30 11:22:25 -07:00
9a3c723644 Add missing PrintOp arguments doc (#7084) 2018-04-30 11:17:56 -07:00
caa6a8ce30 Switch to the official git mirror for Eigen. (#7090) 2018-04-30 14:09:18 -04:00
39c0b0b850 Delete unnecessary header includes. (#7094)
Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-30 14:04:28 -04:00
2a56666196 Removing leveldb to make special gcc builds unnecessary (#7098) 2018-04-30 10:55:37 -07:00
b70b7a80d4 Inline JIT C++ Extensions (#7059)
Adds ability to JIT compile C++ extensions from strings

>>> from torch.utils.cpp_extension import load_inline
>>> source = '''
    at::Tensor sin_add(at::Tensor x, at::Tensor y) {
      return x.sin() + y.sin();
    }
'''
>>> module = load_inline(name='inline_extension', cpp_sources=source, functions='sin_add')
Fixes #7012

* Inline JIT C++ Extensions

* jit_compile_sources -> jit_compile

* Split up test into CUDA and non-CUDA parts

* Documentation fixes

* Implement prologue and epilogue generation

* Remove extra newline

* Only create the CUDA source file when cuda_sources is passed
2018-04-30 11:48:44 -04:00
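A hedged continuation of the snippet above, assuming `module` was built with load_inline as shown:

```python
import torch

x, y = torch.randn(3), torch.randn(3)
# The compiled function computes the same thing as its C++ body:
assert torch.allclose(module.sin_add(x, y), x.sin() + y.sin())
```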
c5978db094 [auto] Update onnx to ff667d1 - Refactor return type and docs for ONNXIFI_BACKEND_DIRECTX_ID (#853)
ff667d1dfb
2018-04-30 15:37:06 +00:00
d9aeb7e71b clamp now has subgradient 1 at min and max (#7049)
* subgradient 1 at min and max for clamp

* clamp max and clamp min too

* add comment
2018-04-30 21:21:56 +08:00
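A worked example of the new behavior, as a hedged sketch: values sitting exactly on a clamp boundary now receive gradient 1 instead of 0.

```python
import torch

x = torch.tensor([0.0, 1.0, 2.0], requires_grad=True)
x.clamp(min=0.0, max=2.0).sum().backward()
print(x.grad)  # tensor([1., 1., 1.]) since the boundary values 0.0 and 2.0 get grad 1
```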
8fbab83c2a only Tensors of floating point dtype can require gradients (see #7021) (#7034) 2018-04-30 10:20:00 +02:00
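A one-line illustration of the new restriction, hedged as a sketch:

```python
import torch

torch.ones(3, requires_grad=True)  # fine: floating point dtype
# torch.ones(3, dtype=torch.long, requires_grad=True)  # now raises a RuntimeError
```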
6a55d86234 GroupNorm docs (#7086) 2018-04-30 09:40:34 +02:00
881af544fd [auto] Update onnx to 11c6876 - clear initializer names when clear initializer (#849)
11c6876f1d
2018-04-30 07:00:36 +00:00
bc62645e4c [jit] Fix handling of IntList[k] parameters (#6965)
*  squash commits

* emit additional declarations and handle positional arg. case
* apply minor tweaks
* py-2 fix
* Address Tom's comments
* move logic to gen_jit_dispatch, start adding tests
* add test

* address review comments

* address review comment

* fix build issue. change argument indices to argument names. Get rid of deepcopy

* py-2 flake8 fix
2018-04-29 23:09:04 -04:00
96c6ae67bb Remove incorrect/irrelevant test code. (#7050)
Followup to #6873.
2018-04-29 23:03:44 -04:00
ee00a8049a Add max pooling support to EmbeddingBag (#5725)
* Add max mode support to EmbeddingBag

* Lint fix

* Fix compilation issue on other platforms

* Rebase + don't waste memory when not in max mode

* Oops, missed a spot

* Fix whitespace from merge

* less precision

* Lower precision to avoid spurious failures

* Minor typo

* Switch to size()
2018-04-29 16:48:11 -04:00
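A minimal usage sketch of the new mode; each bag's output is the elementwise max over its member embeddings:

```python
import torch
import torch.nn as nn

bag = nn.EmbeddingBag(num_embeddings=10, embedding_dim=3, mode='max')
indices = torch.tensor([1, 2, 4, 5])
offsets = torch.tensor([0, 2])   # two bags: indices [1, 2] and [4, 5]
out = bag(indices, offsets)      # shape (2, 3), per-bag elementwise max
```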
49f87320ba [Caffe2] Add full impl of GroupNorm (#7058)
* Add full impl of GroupNorm

* Fix comments in math.h

* Remove unsed buffers

* Add #include <array> in gpu version

* Remove unused moments_buffer_

* Make inverse std to be a template.

* Add detailed comments
2018-04-29 11:26:40 -07:00
0703357723 Don't build THD/master_worker if not explicitly requested (#7081) 2018-04-29 13:17:09 -04:00
b240cc9b87 Add support for dotted names in CPP Extensions (#6986)
* Add support for dotted names in CPP Extensions

* Modify tests for cpp extensions

Test that dotted names work

* Py2 fixes

* Make run_test cpp_extensions Win-compatible
2018-04-29 18:10:03 +02:00
e6ce1afe47 [Caffe2] Follow-up of onnx-trt API change (#7076)
* Follow-up of onnx-trt API change

* indent

* comments
2018-04-28 23:07:15 -07:00
7450e9152b [auto] Update onnx to 73c34ae - Clarify FeatureVectorizer description. (#843)
73c34ae62f
2018-04-28 22:43:39 +00:00
281f095972 Add autograd API to at::Tensor (#6582)
* Add autograd API to at::Tensor

* Trying to fix linker errors on Windows

* Add AT_API to set_data
2018-04-28 12:54:05 -07:00
802e718e1c [auto] Update onnx to 1befb9b - Remove useless text in docs (#850)
1befb9b12d
2018-04-28 17:30:40 +00:00
4caea64d72 Make all of TH and THC C++. (#6913)
Changelist:

- Move *.c to *.cpp
- Change includes of ".c" to ".cpp"
- A bunch of cmake configuration modifying CMAKE_C_FLAGS changed
to CMAKE_CXX_FLAGS or add_compile_options, because if you do CMAKE_C_FLAGS it only applies when you compile C code
- Explicitly cast void* to T* in a number of places
- Delete extern "C" { ... } blocks; instead, properly apply TH_API to everything that should have it (TH_API handles extern "C")
- Stop using stdatomic.h, instead, use <atomic>. This resulted in a bunch of placement-new/delete to be "totally properly correct"
- Refactor of THLongStorageView to not have static constructor methods (since it no longer has a copy/move constructor)
- Documentation about how the TH C interface (and extern C business) works
- Note that THD master_worker mode is dead
- C++ headers in TH libraries are given .hpp suffix, to make it less likely that you'll confuse them with the C-compatible headers (now suffixed .h)
- New function THCStream_stream and THCStream_device to project out fields of THCStream instead of accessing fields directly
- New function THStorage_(retainIfLive), which is equivalent to a retain but only if the refcount is greater than zero.
- In general, I tried to avoid using hpp headers outside of ATen/TH. However, there were a few places where I gave up and depended on the headers for my own sanity. See Note [TH abstraction violation] for all the sites where this occurred. All other sites were refactored to use functions
- Some extra Werror fixes (char* versus const char*)
2018-04-28 07:45:02 -04:00
4667983f0f Fixes for interpreter and ONNX export for translation (#7044)
Fixes for interpreter and ONNX export for translation

Address comments
2018-04-27 22:23:57 -07:00
fc6a846cc5 [Caffe2] Fixing bug in conda builds (#7061)
* Fixing bug in conda builds

* Update to other PR
2018-04-27 21:52:40 -07:00
1048d0dd67 [Caffe2] Moving all conda package information into package name rather than build string (#7041)
* Lowercasing script internal variables

* Removing nccl from name
2018-04-27 21:42:49 -07:00
065cd32ed0 Fix ".pb.h" dependency issue about DLL build. (#7027)
* Add missing header "caffe2/core/common.h" before "caffe/proto/caffe.pb.h" to provide CAFFE2_API macro.
This only affects the Windows build since CAFFE2_API is only defined for DLL.

* Fix ".pb.h" dependency issue about DLL build.

CAFFE2_API defined in "caffe2/core/common.h" is required by ".pb.h" generated on Windows for DLL build.
We always need to have "#include <caffe2/core/common.h>" before using any proto header.

In this case "caffe2.pb.h" is already included by "context_gpu.h" -> "common_cudnn.h" in the correct order, hence we simply remove a line.
2018-04-27 21:21:46 -07:00
bb9c859253 [auto] Update onnx to e84788f - Fix SELU attributes' default values (#839)
e84788fb48
2018-04-28 04:18:49 +00:00
20cd27da42 [caffe2][ONNX] Implement CPU NumpyTileOp and corresponding ONNX backend (#7053)
* Implement CPU NumpyTileOp

* Address comments
2018-04-27 19:58:15 -07:00
2e023a29e4 Add optional support to C++ extensions (#7055) 2018-04-28 01:59:50 +01:00
7b09bc72a5 [WIP] Enable WERROR in tests (#6539)
* Enable WERROR in tests

* Also set WERROR=1 for cpp_build in CI

* Enable Werror after the compiler checks

* Remove -DWERROR because its picked up from the env var

* Had to fix some errors in aten/contrib/data

* Allow an uninitialized variable in ReduceOpsKernel.cpp

* Use CUDNN_DATA_UINT8 in cuDNN type string conversion

* Fixes and use target_compile_options

* Fix uninitialized variables in THNN

* Include Python.h earlier in tensor_types.cpp

* Use CUDNN_VERSION 7100 instead of 7000?

* More Python.h includes

* Make switch case in common_subexpression_elimination.cpp exhaustive

* Build with WERROR=0 just to see all the warnings

* Remove some Python includes

* Enable WERROR=1 again

* Bring back switch case default
2018-04-28 01:51:16 +01:00
733e2967b1 Allow __constant__ values in a ScriptModule to be used as attributes for builtin functions (#7017)
* Allow `__constant__` values in a ScriptModule to be used as attributes for builtin functions
* Fix bugs in @script loops

1. while loops run shape propagation multiple times until the shapes have converged.
There were two bugs here. (a) First the 'changed' condition was not checking if it actually
changed the output, and instead would mark changed = true if the two inputs were different.
This is incorrect because the output of the block and the input of the block may always have different shapes.
Now it actually checks if it is about to change the output entry that it is writing to.
(b) expand nodes were being inserted into the graph even inside the while loop body. However, if
we iteratively discover that the input shape to one of these expands is actually dynamic, then
it was incorrect to insert the expand in the first place. This changes it so that we only insert expands
after we have converged on the shapes.

2. the way deleteExtraInputs removed loop-carried dependencies was unsafe because it would lookup
Value* elements in the loop body's environment that were previously invalidated when deleteExtraInputs
remove another input to the loop. This changes the way deleteExtraInputs works so that it never has to
read a value out of the loop body's environment to avoid using the invalidated pointers.
2018-04-27 17:44:17 -07:00
02a764f82d Update the video input op in caffe2 (#7054)
There have been multiple recent fixes to the video input op. This updates
the caffe2 version so that it is up to date.
2018-04-27 17:17:42 -07:00
980960d036 Fix Visual Studio error C2398 about ill-formed narrowing conversion. (#7024) 2018-04-27 17:07:56 -07:00
59f5f9ac36 [caffe2] Fix build of depthwise_3x3 for CUDA compute capability < 3.5 (#7048)
PR #6601 broke build on older CUDA targets due to __ldg intrinsics. This patch adds a work-around.
2018-04-27 18:53:24 -04:00
361648a4a7 Fix torch.tensor(...) device-type calculation when used with numpy an… (#6995)
* Fix torch.tensor(...) device-type calculation when used with numpy and type inference.

* Fix tensor device type inference as well.

* Better variable type inference: infer cuda-ness only if device is not specified.
2018-04-27 18:12:33 -04:00
0c737dff63 fix lbfgs variable names (#7037)
Switches the step/direction variable names (steps and directions are flipped
in the current implementation of the two-loop recursion). This change does
not change the numerical output of the program, but should make it easier
to follow.
2018-04-27 17:47:37 -04:00
6ce376fee3 [auto] Update onnx to ebac046 - Add tile test case (#823)
ebac0463a0
2018-04-27 21:01:58 +00:00
f630de8f33 [caffe2][nomnigraph] Lint run (#7045) 2018-04-27 12:58:58 -07:00
932c4c2364 Prevent stack overflow on deletion of deep graph (#6873)
* Prevent stack overflow on deletion of deep graph

Fixes #5534.

Sometimes one can end up with a very big computation graph of Functions
and Edges. Each std::shared_ptr<Function> contains a list of Edge, and
each Edge contains a std::shared_ptr<Function>. Deleting a
std::shared_ptr<Function> can trigger the recursive deletion of other
std::shared_ptr<Function>'s: this can stack overflow if the graph
is deep enough. Here is an example of such a graph:

    shared_ptr<Function> -> Edge -> shared_ptr<Function> -> Edge -> ... -> shared_ptr<Function>

The solution here is to use a custom deleter with each
std::shared_ptr<Function>. The custom deleter keeps track of how many
nested deleters it is in. When this number exceeds the maximum allowed
depth, the Function* to be deleted are accumulated in a per-thread
delete queue and handled by one of the deleters.

Example code that could trigger the overflow (set ``depth`` to something >
100000) is below. I also benchmarked the below code before/after the
changes to see if there are any significant performance differences.

```
import torch
def scope():
    depth = 80000
    x = torch.randn(9, requires_grad=True)
    y = x.clone()

    # build deeply nested computation graph
    for i in range(depth):
        y = y + y * 0.000001

%timeit -n 100 scope()

376 ms ± 3.94 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

Without changes:
352 ms ± 6.58 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
```

With the change, the above code is 6.8% slower.

UPDATE: I did some more benchmarking. It looks like it takes 25% more time to free the computation graph in the case of the straight chain graph: https://gist.github.com/zou3519/93cf84d96ae431356ae7f7c1923ef51a

* WIP

* Add custom deleter to PyFunctions created by THPFunction

* Address some comments; pick new value

* Address some more comments

* Add more complicated test; special case the windows depth constant
2018-04-27 15:49:58 -04:00
c730792d51 Add big warning about averaging to KLDivLoss documentation #6622 (#7006)
* Add big warning about averaging to KLDivLoss documentation #6622

Also: An (independent) change in diagonal docstring tensor
formatting.

* Improve note with example

Thank you Richard Zou!

* use log_softmax
2018-04-27 15:45:26 -04:00
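A hedged sketch of the pitfall the warning covers: the default averaging divides by the number of elements rather than the batch size, so a common correction at the time was to sum and normalize manually (size_average was the era's flag, later replaced by reduction).

```python
import torch
import torch.nn.functional as F

log_probs = F.log_softmax(torch.randn(8, 5), dim=1)
target = F.softmax(torch.randn(8, 5), dim=1)
# Sum over all elements, then divide by the batch size only:
loss = F.kl_div(log_probs, target, size_average=False) / log_probs.size(0)
```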
ae35e0e924 Support non-contiguous tensors for unary ops (#6119) 2018-04-27 21:31:34 +02:00
a6bfa16c17 torch.arange: add numpy-style type inference. (#7016)
* torch.arange: add numpy-style type inference.

This is a backwards-compatibility breaking change.

* Fix flake8.

* Use at::optional.

* Remove unneeded header files.

* Use reference wrapper.

* Update arange for test.

* Address review comments.
2018-04-27 15:11:45 -04:00
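A quick sketch of the BC-breaking inference:

```python
import torch

torch.arange(5)        # integer arguments now infer an integer dtype (numpy-style)
torch.arange(0., 5.)   # floating-point arguments infer the default float dtype
```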
bdd27ea956 [auto] Update onnx to 8b7a925 - a few more shape inference functions (#772)
8b7a9252c9
2018-04-27 19:06:58 +00:00
f6083b343b [auto] Update onnx to 9718f42 - Make the coefficient non optional for LinearClassifier (#836)
9718f42976
2018-04-27 18:06:39 +00:00
39c6101ab4 [auto] Update onnx to ef083d0 - Add save_tensor and load_tensor functions for Protos (#770)
ef083d0338
2018-04-27 17:13:59 +00:00
1b0ad8678b import *Sampler to utils.data (Better fix than #6982) (#7007) 2018-04-27 10:18:29 +02:00
3d4d39ce30 Also check compiler ABI compatibility when JIT compiling (#7015) 2018-04-27 08:19:17 +01:00
9db779f331 [auto] Update onnx to 45ceb55 - Check if CMAKE_BUILD_TYPE set before project(). (#812)
45ceb5523a
2018-04-27 04:51:00 +00:00
76d3c30783 Enable resetting of batchnorm running moments and cumulative ("simple") moving average (#6445) 2018-04-26 19:27:24 -07:00
eaab6ce459 [caffe2][nomnigraph] Move nomnigraph<->caffe2 converter logic to caffe2/opt (#7018) 2018-04-26 18:28:13 -07:00
18ed2160b0 Use Index rather than Long for IntList parsing (#6674)
* Use Index rather than Long for IntList, so floating-point types convertible to ints fail the parsing.

Basically, our unpackLong code works with floating-point types that are convertible to ints, but this isn't often what you want (because of truncation).
What you actually want is to convert to an index, which will usually find such issues.

I made this the minimal change I could because:
1) I didn't want to change unpackLong because the existing code calls checkLong before unpackLong, so this should be a non-issue most of the time.  And fixing this properly requires calling checkLong again, which will slow everything down.
2) An exception above is with IntList, which only checks that 1) it is a tuple or 2) it is a varargs tuple (i.e. torch.ones(1, 2, 3)).

* Fix bug.

* Don't conflict tensor and IntList bindings.

* Change function to be consistent between python 2 and 3.

* Check Index.

* Move IntList overloads in legacy new functions to below Tensor overloads.
2018-04-26 19:13:23 -04:00
902579602b [wip] [Caffe2] Changes to integrated binaries (#6997)
* Changes to integrated binaries

* Changes for cpu version of integrated binary

* Disabling static linking of CUDA for pytorch for integrated builds
2018-04-26 15:43:24 -07:00
19cb5a0436 [auto] Update onnx to 4b3d2b0 - [WIP] reenable shape inference tests (#834)
4b3d2b02e8
2018-04-26 22:17:52 +00:00
d67ec68dbe [auto] Update onnx to 22d17ee - RNN tests: LSTM, GRU, SimpleRNN (#739)
22d17eee2e
2018-04-26 20:57:42 +00:00
a08091a42d Implement matmul_out and dot_out. (#6961)
* Implement matmul_out and dot_out.

* Fix autograd by only calling _out variants if we have an out ourselves.

* Disallow mismatched types in dot_out.

* Make sure out variant doesn't have a method.

* Do proper type conversion.
2018-04-26 16:52:58 -04:00
49493948a8 Fixes some build warnings. (#7004) 2018-04-26 16:44:23 -04:00
9a6c033004 Skip unsupported ONNX backend test cases (#7005) 2018-04-26 13:10:55 -07:00
242f6c3470 Don't print dots after nonfinite numbers in integral float tensors (#6835)
* Don't print dots after nonfinite numbers in integral float tensors

* get around lint

* support python 2

* refactor

* better refactor
2018-04-26 11:18:12 -07:00
2b44c420c8 Enhance diagonal (fixes #6479) (#6718)
* Enhance diagonal

This patch
- adds Tensor.diagonal to complement torch.diagonal
- implements diagonal natively in ATen
- makes diagonal a view
- implements taking arbitrary diagonals
- implements diagonal backward instead of referring
  to the (more limited) diag

* add tests, copy diagonal code to backward for double differentiability

* improve tests and doc comment. Thank you, Adam!

* Mark diagonal as view function in gen_autograd.py, use simple backward.
2018-04-26 11:11:20 -04:00
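A short sketch of the view semantics this adds:

```python
import torch

x = torch.arange(16.).view(4, 4)
d = x.diagonal(offset=1)  # Tensor.diagonal now exists and returns a view
d.zero_()                 # mutating the diagonal writes through to x
assert x[0, 1] == 0
```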
8109b3065e Slight changes to anaconda script (#6994) 2018-04-26 10:04:58 -05:00
b2581c0289 Workaround in onnx to get transposes into init_nets (#6924)
* Workaround in onnx to get transposes into init_nets

This adds a pass to ONNX so that it can speculate Transpose
operators so that ONNX's split pass can put them into an init_net

Also fixes a potential bug in onnx peephole where an optimization
across blocks might move a Value and violate scoping.

* Perform shape propagation when embedding a program into a trace.

This ensures the trace still has type information specific to that trace, which will help onnx export succeed in more cases.
2018-04-26 11:04:17 -04:00
a64b2987b4 [ONNX] export tile op (#6954)
* onnx export aten::repeat to Tile

* move repeats to input

* turn repeats to a long tensor constant

* deal with case that len of repeats bigger than number of dims in input
2018-04-26 11:03:41 -04:00
5dc5a71d74 Improve error message (Sampler location) Fixes #6917 (#6982)
Thank you @ruotianluo for reporting!
2018-04-26 10:58:27 -04:00
984516bdc4 typo corrected: is -> if (#6980) 2018-04-26 09:57:11 -04:00
3964253f94 Allowing for vectorized counts in Binomial Distribution (#6720) 2018-04-26 15:53:01 +02:00
f98b778086 Fix forward and backward for norm/renorm with infty norm (fixes #6817) (#6969) 2018-04-26 12:54:53 +02:00
24d05662ea [caffe2] Open-source DEPTHWISE_3x3 engine (#6601)
DEPTHWISE_3x3 engine provides an optimized implementation of depthwise 3x3 convolution, e.g. for ShuffleNet, MobileNets
Implementations exist for CPU (generic), ARM CPU, and CUDA GPU.

Originally developed by @ajtulloch
2018-04-26 02:30:51 -04:00
eb4154a007 [auto] Update onnx to 485b787 - function proto for composite op. (#802)
485b7875fa
2018-04-26 03:01:03 +00:00
3d907ef78e Consistently check 'out' variants against specified dtype/layout/device parameters. (#6973)
We were previously doing this in the most common cases, but not consistently.
2018-04-25 22:46:42 -04:00
c10da636b5 implement gamma cuda (#6855)
* Refactor standard_gamma and implement CUDA gamma sampling

* Attempt fixes for AT_CUDA_ENABLED changes

* Gamma cuda and cpu forward as ATen native

* implement standard_gamma_grad_cuda

* update native_test.cpp, try to fix windows and various cuda version compiles

* searching a windows fix via CI... use std:: for math

* casting some constants in the calculation, compute at float for half precision

* whitespace fixes

* add acctype to do half->float computation, include HALF in generation, cast locally rather than tensors

* fix cuda8 half compilation

* always use scalar_cast with CUDACC, lock CPU generator, CPU acctype = double\nThank you for your review comments!
2018-04-25 22:22:09 -04:00
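A hedged usage sketch, assuming a CUDA device is available:

```python
import torch
from torch.distributions import Gamma

concentration = torch.ones(3, device='cuda')
rate = torch.ones(3, device='cuda')
# Gamma sampling (and its reparameterized gradient) now runs natively on the GPU:
sample = Gamma(concentration, rate).rsample()
```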
7cbef70372 Fix the onnx symbolic for selu and maxpool3d (#6816) 2018-04-25 22:20:45 -04:00
645ad7ad0c Fixing LP-Pooling stability issues (#6766)
* Added ReLU unit to LP pooling, so the gradient does not become NAN if all inputs are zero.

* Added workaround for odd p. Added a bit of doc.

* Make the linter happy.
2018-04-25 22:13:15 -04:00
bd14d8e8f8 add additional caffe/caffe2 paths to exclude list in pytorch setup.py (#6891) 2018-04-25 22:10:38 -04:00
ab016a2b30 Code Cleanup: removes unused getTextureObject (#6974) 2018-04-25 21:07:48 -04:00
2d6d6a4d10 Removes unused _long functions in THCTensorIndex (#6971) 2018-04-25 21:07:28 -04:00
31c9b4f0d2 Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices" (#6953)
* Changes incorrect "overlappingIndices" call to correct "maybeOverlappingIndices"

THE PROBLEM

The current overlappingIndices() is meant to detect if a tensor defines multiple valid indices for the same data element. There are two significant issues with this function:

(1) The algorithm it attempts to implement cannot do this.

(2) That algorithm is not implemented correctly.

This call is used by pointwiseApply() and scatter(). If a tensor is readable/writable and detected as overlapped these algorithms will create a non-overlapped copy of it to work on. When tensors are improperly identified as overlapped this causes extra work. If tensors are improperly identified as non-overlapped then this would cause the operations to exhibit unexpected behavior.

For example,

ref = torch.arange(0, 32 * 5).view(4, 8, 5).cuda().double()
p = ref[:,:,::2]
p += 1

Results in a call to pointwiseApply1, which detects p as an overlapped tensor (it is not), causing a call to pointwiseApply2 that copies it into a non-overlapped temporary, and then another call to pointwiseApply2 later that copies it back to the original tensor. If, however, the original tensor is given dimensions of (4, 8, 4), instead, it is correctly detected as non-overlapped and only a single pointwiseApply1 call is made.

DISCUSSION + FIX

The algorithm that overlappingIndices() attempts to implement tests for a sufficient but not necessary condition of a tensor to be non-overlapping. That is, if its algorithm were implemented properly then it would be a conservative check that would ensure all overlapped tensors were copied (as desired), but also that some non-overlapped tensors were copied too.

The algorithm can be thought of as trying to test whether the dimensions can be ordered like "nesting dolls," with each dimension fitting within the next one larger than it. If this is true then the tensor is non-overlapping, but if it's false the tensor may or may not be overlapped. For example, a tensor with dims (2, 3) and strides (4, 3) cannot be "nested," but is non-overlapping. (The tensor looks like [[0, 3, 6], [4, 7, 10]].)

The algorithm is currently implemented improperly, as can be seen in the example above. The tensor p has dimensions [4, 8, 3] and strides [40, 5, 2]. This confuses the current implementation, which thinks the innermost dimension needs a stride of 6, which is incorrect. The first row is [0, 2, 4] and the next row begins with 5. The current implementation also improperly implemented its sorting behavior. (qsort comparators require -1, 0, and 1, not true/false return values.)

Fixing the existing algorithm is straightforward (and what this PR does, see below), but it is important to note that the algorithm never performed as intended, so its name and the documentation around it has been updated, too. A natural question is if it's possible to write an efficient overlappingIndices(), and I believe the answer is "no." Disambiguating overlapping from non-overlapping tensors is equivalent to finding a nonzero solution to a linear diophantine equation with restricted coefficients, that is, an equation of the form x_0s_0 + x_1s_1 ... = 0 where s_X is the stride in dimension X and x_X is an integer from [-size_X + 1, size_X - 1].

Another note is that the CPU does not perform this check. For example, if we run:

a = torch.FloatTensor([[0,1], [10, 11]])
b = torch.FloatTensor([[0,0],[0,0]])
b = b.set_(a.storage(), storage_offset=0, size=a.size(), stride=(1,1))
b += 1

Then b is [[1, 3], [3, 11]] because the operation is applied twice to the second element of the original tensor. This causes no warning.

Since the CPU does not perform a similar check, another question is whether the GPU code should remove its check. While it may seem that writing to overlapping tensors is an error state, running test_cuda.py reveals 171 instances of possibly overlapped tensors being copied by pointwiseApply(). (The prior incorrect version has 176 copies.) Allowing writing to overlapped tensors on the GPU may violate assumptions about memory accesses, too. In fairness, these assumptions may be violated on the CPU already.

Leaving the CPU vs GPU behavior question for the future, this fix corrects the current intended GPU behavior. This means that there will be fewer unnecessary copies and no chance of an overlapped tensor sneaking through on the GPU. The CPU behavior remains unchanged. The fix also adds a test to test_cuda.py to ensure that overlapped tensors on the GPU are written to as expected.

* cleanup

* Fixes Python formatting
2018-04-25 21:07:13 -04:00
d48d3ef6bc Make cuda 9 behave as cuda 8 wrt half conversions (#6958)
* Make cuda 9 behave as cuda 8 wrt half conversions

Cuda 9 is too smart about implicit half conversions, this would disable them so that cuda 8 and cuda 9 behave in the same way wrt half.

* try fixing windows build

* one more broken conversion
2018-04-25 17:59:49 -07:00
5209213fa7 [auto] Update onnx to cd58928 - specify defaults for attributes of Affine op (#820)
cd589283a0
2018-04-26 00:26:42 +00:00
f21c5c5cd8 Fix the symbolic of batchnorm to handle special case (#6967) 2018-04-25 17:04:25 -07:00
b038b3d7be Always dumping final meta.yaml for debugging (#6977) 2018-04-25 19:00:24 -05:00
3573f64bb1 [auto] Update onnx to 7ee2cf9 - merge the dummy backend back into the main one (#743)
7ee2cf9854
2018-04-25 23:44:01 +00:00
8028162103 Update the script to avoid the protobuf lib issue and add ZFNet (#6966) 2018-04-25 16:38:43 -07:00
94d2afbe50 Clarify _unsafe_view comment. (#6952)
It was unclear to me whether the "viewed" tensor was the input or the output.
2018-04-25 19:29:49 -04:00
2e32e8df75 Statically linking CUDA for Anaconda builds (#6680)
* Statically linking CUDA for Anaconda builds

* typo

* Adding a summary line

* Comments

* Typo fix

* Fix faulty parameter passing

* Removing problem CUDA modules for now

* Fixing unused debugging function

* Turning off static cuda linking until script changes are in

* Disabling mkl
2018-04-25 18:22:54 -05:00
7599d0c3fe [caffe2] ONNX backend support for control nodes (#6914) 2018-04-25 15:44:00 -07:00
3b009dffe1 Delete unused legacy indexed based streams (#6964)
PyTorch uses THC's THCStream API.
2018-04-25 18:38:47 -04:00
1e134b11ec [caffe2][cmake][opencl] Wrong directories were being included, which might break systems without opencl in the system headers (#6972) 2018-04-25 14:58:16 -07:00
5aed120bc3 [auto] Update onnx to 1c03a5a - [Proposal] ONNX Interface for Framework Integration (previously ONNX Backend API) header and docs (#551)
1c03a5a42e
2018-04-25 21:28:39 +00:00
a7b274bb2a Remove scratch space from THCState (#6956)
THC had a concept of per-device per-stream scratch space that was
persistent in THCState. This was useful before the caching allocator
because it avoided synchronizations in kernels that needed temporary
scratch space. However, it's not thread-safe since multiple threads can
operate on the same stream: In a two-pass reduction the scratch space
may get clobbered in between the two kernels.

This removes the scratch space and just uses THCudaMalloc and THCudaFree
within the reductions.

I've kept THCState_getCurrentDeviceScratchSpaceSize for now since it's
useful to have the temporary buffer be sized based on the number of SMs.
2018-04-25 16:02:17 -04:00
075ca76c26 [auto] Update onnx to 3769a98 - Rename real model test case from VGG-16 to ZFNet (#821)
3769a98362
2018-04-25 19:57:13 +00:00
333e8c9b22 any/all returns LongTensor, make test expect that (#6957) 2018-04-25 14:05:29 -04:00
6ebcb4606f fix typo in the LSTMCell math definition (#6951) 2018-04-25 19:20:46 +02:00
138d69c688 [auto] Update onnx to 403ccfb - Change the return type for the zipmap operator to match the description in the spec. (#818)
403ccfbd01
2018-04-25 15:48:39 +00:00
e767b186ee add missing UNCERTAIN_TH_OMP_OVERHEAD_THRESHOLD to TH_TENSOR_APPLY_REDUCTION_OMP (#6946) 2018-04-25 10:34:31 -04:00
e7babb1890 [aten] only lookup CuDNN if compiling with CUDA (#6905)
ATen can be configured to compile without CUDA support by passing
-DNO_CUDA=1 to cmake.  However, cmake will look for CuDNN independently
of that flag and may eventually find it.  In cases were compilation
without CUDA support was requested on system with CUDA installed, this
will result in linking errors while building some tests that rely only
on CuDNN being found.

Do not look for CuDNN if -DNO_CUDA=1 was provided in the cmake call
since it does not make sense to compile with CuDNN if CUDA support was
disabled.
2018-04-25 09:13:23 -04:00
2dc177ac50 Update checkpoint.py (#6943) 2018-04-25 08:43:58 -04:00
39d4814933 Make any and all on ByteTensor behave like sum/prod. (#4627) 2018-04-25 10:25:38 +02:00
241a1e0f52 [auto] Update onnx to 15289e3 - Tile - align with numpy (#757)
15289e3d77
2018-04-25 08:16:31 +00:00
c820fda180 [auto] Update onnx to 42207c6 - Pass to lift captured values as inputs to control nodes (#804)
42207c60d8
2018-04-25 08:15:37 +00:00
c92b5422f7 Fix typo in set_grad_enabled description (#6931)
After setting set_grad_enabled(False), y.requires_grad returns False. But in the example it is described as True.
2018-04-25 09:23:15 +02:00
e27d66a454 Remove Eigen from math CUDA and update algorithm in ReduceTensor and Moments (#6922) 2018-04-24 23:07:35 -07:00
40301c3be7 [auto] Update onnx to 15289e3 - Tile - align with numpy (#757)
15289e3d77
2018-04-25 06:05:47 +00:00
2f311be90b add default value to ConstantFill doc (#6923) 2018-04-24 20:57:09 -07:00
09f40ae06f silence compiler warnings (#6915) 2018-04-24 23:49:12 -04:00
d9bde84b84 Add threshold for ops using openmp macro (#5584)
* add threshold for ops using omp macro

* modify interface for ops using omp macro

* modify some thresholds

* implement C macros with optional parameters to avoid duplicating definitions for all pointwise operations

* add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing

* modify the comment

* Revert "add a parameter of LAB_IMPLEMENT_BASIC_FUNCTION for vectorizing"
Modify macro LAB_IMPLEMENT_VECTORIZED_FUNCTION to enable optional parameters

This reverts commit 8ef783a0cc67b653c435e64a3beb6866a6b4216d.

Conflicts:
	aten/src/TH/generic/THTensorMath.c

* fix build error on windows

* retrigger the test
2018-04-24 23:41:55 -04:00
aa88ca8ae0 remove quotes from caffe2/contrib/aten/CMakeLists.txt (#6928) 2018-04-24 20:37:14 -07:00
dec5e99e99 [aten] Move submodules to third_party (#6866)
* [aten] Move submodules to third_party

* [aten] Update aten_mirror.sh script for third_party

* [aten] Move ATen submodules def to root and rename

* [aten] Update cpuinfo cmake build

* [aten] Fix cpuinfo cmake build

* Update third_party/cpuinfo to d03d5d296063063c66877fb559cf34469734e3e1

* [aten] Fix JIT test reference to catch
2018-04-24 23:33:46 -04:00
c33d7f565b updated the environment collection script URL to the raw version on Github to download the script instead of the webpage (#6927) 2018-04-24 23:30:32 -04:00
8b70f7d248 [Caffe2] Clean up ideep integration (#6881)
* Clean up ideep integration

* .

* Remove redundant code in convnet benchmark

* MKL ON

* Do not add -mavx2 everywhere

* .

* Comments

* rename

* .
2018-04-24 18:32:35 -07:00
b7487d42a0 Workaround to make PythonOps traced with torch.jit.trace work correctly. (#6738)
The long-term fix is to remove the handle-creating pathways and
remove all the modes from PythonOp, making it into an op that simply
calls a PyObject. Right now ONNX expects PythonOp to hold an
nn.Function, not a generic callable, so completely removing the legacy
pathway will also require changes to how ONNX symbolics are found.
2018-04-24 17:21:00 -07:00
e28508afa5 [auto] Update onnx to 42207c6 - Pass to lift captured values as inputs to control nodes (#804)
42207c60d8
2018-04-24 23:53:27 +00:00
3c80a2b85c [caffe2] Add flag to ONNXWhile to skip scoping (#6910)
* [caffe2] Fix logic error in tensor filling ops in C++ ONNX backend

* [caffe2] Add flag to ONNXWhile to skip scoping
2018-04-24 16:53:22 -07:00
53a8158d6d [auto] Update onnx to 0eaf45f - Add dtype for input in Gather node test case (#815)
0eaf45ff89
2018-04-24 22:23:24 +00:00
0b5910f77e [jit][script] Fix a bug combining sizes/unsized tensors (#6882)
* [jit][script] Fix a bug combining sizes/unsized tensors

This adds an isSubtypeOf method to reflect that sized tensors are a subtype
of the Dynamic tensor type. It updates the typechecking code to reflect this
relationship.

* Add index_select to shape prop
2018-04-24 14:04:18 -07:00
6e60edb799 [caffe2] Fix logic error in tensor filling ops in C++ ONNX backend (#6909) 2018-04-24 13:53:27 -07:00
146e8c8a10 Fix the legacy padding handling on global pool case (#6473) 2018-04-24 13:34:51 -07:00
cfb626b638 [caffe2][tiny][fix] Make the build work with profile observers (#6908) 2018-04-24 12:46:48 -07:00
9dd73aa7eb Fix stable link to always be /stable/ (#6907) 2018-04-24 15:42:46 -04:00
d985cf46f1 Add workaround to fix include warnings in Python 2 builds. (#6716) 2018-04-24 12:30:19 -07:00
90e75c6528 Speed up printing of large tensors. (#6876)
* Speed up printing of large tensors.

Instead of deciding on the format based on all of the elements of the tensor, decide based on the elements that will actually be printed (a minimal sketch follows this list).

* Fix flake8.

* Add else case.
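Roughly the idea, as a hypothetical sketch (summarized_elements and the edgeitems cutoff are illustrative, not the actual printing code):
```
import torch

def summarized_elements(t, edgeitems=3):
    # Only the first and last `edgeitems` entries of a large 1-D tensor
    # are printed, so only they need to drive the format choice.
    if t.numel() > 2 * edgeitems:
        return torch.cat((t[:edgeitems], t[-edgeitems:]))
    return t

big = torch.randn(10000000)
shown = summarized_elements(big)
width = max(len('{:.4f}'.format(v)) for v in shown.tolist())
```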
2018-04-24 14:04:29 -04:00
0430bfe40b [docs] Update broadcasting and cuda semantics notes (#6904)
* [docs] Update broadcasting and cuda semantics notes

* Update multiprocessing.rst

* address comments

* Address comments
2018-04-24 13:41:24 -04:00
6418c49ee9 Make ArrayRef read-only by default. (#6444)
Sebastian Messmer noticed that these iterators were writeable by
default, which seemed dangerous.  Replaced with const iterators.
This doesn't seem to affect any ATen code; seems reasonable enough.

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-24 13:30:43 -04:00
26c53c58a2 Fix ATen .travis.yml setup (#6860)
- ATen repo now has a new top-level, so Travis script has
  to be adjusted to (1) be moved to the top-level and (2)
  cd into the aten directory before doing anything.

- Unfortunately, this makes the import script even slower,
  because I'm banging on the entire index every commit. If
  anyone has better suggestions for how to twiddle the index,
  they are welcome. One possibility is to fold the ATen build into
  the base .travis.yml but only activate it when a file is missing
  (and then filter out that file.)

Signed-off-by: Edward Z. Yang <ezyang@fb.com>
2018-04-24 10:07:33 -04:00
21e0fc8fec [auto] Update onnx to adbfb4a - Fix the ConstantFill spec (#808)
adbfb4ad19
2018-04-24 03:55:51 +00:00
7d32f6fdc3 Adding runtime warning for checkpointing inputs to have requires_grad=True (#6883)
* Adding the warning for the checkpointing inputs to have requires_grad=True

* fix bug
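A minimal sketch of the case the warning targets, assuming the standard torch.utils.checkpoint API:
```
import torch
from torch.utils.checkpoint import checkpoint

def block(x):
    return torch.tanh(x @ x.t())

inp = torch.randn(4, 4)           # requires_grad is False here
out = checkpoint(block, inp)      # the case the new warning targets

inp = torch.randn(4, 4, requires_grad=True)
out = checkpoint(block, inp)      # fine: gradients can flow to inp
out.sum().backward()
```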
2018-04-23 22:43:35 -04:00
9765bb5f1e Revert "Fix performance regression of simple indexing cases (#6793)" (#6886)
This reverts commit 8a016693c0808ec8353370fd4c48f4049a372b74.
2018-04-23 22:22:12 -04:00
b6ed729cdc fix memory leak in median (#6889) 2018-04-23 22:20:03 -04:00
df2817d3b1 Bump benchmark to master (#6878)
* Bump benchmark to master

* add semicolon to BENCHMARK_MAIN
2018-04-23 16:28:08 -07:00
82a33c32aa Update device docs (#6887)
Tell users that one can substitute torch.device with a string
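For instance:
```
import torch

d = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
a = torch.randn(2, device=d)      # explicit torch.device object
b = torch.randn(2, device='cpu')  # a plain string is accepted too
c = a.to('cpu')                   # strings also work in .to()
```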
2018-04-23 19:04:20 -04:00
b5d2d285a8 fix SVD backward on non-square matrices when some=False (#6870) 2018-04-23 19:01:51 -04:00
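A minimal sketch of the now-working case (torch.svd was the API of this era; later releases favor torch.linalg.svd):
```
import torch

a = torch.randn(5, 3, requires_grad=True)  # non-square input
u, s, v = torch.svd(a, some=False)         # some=False: full 5x5 U, 3x3 V
s.sum().backward()                         # backward through singular values
print(a.grad.shape)                        # torch.Size([5, 3])
```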
1ee009599c Add torch.get_default_dtype doc (#6872)
* add torch.get_default_dtype doc

* address comments
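For example:
```
import torch

print(torch.get_default_dtype())        # torch.float32 out of the box
torch.set_default_dtype(torch.float64)
print(torch.get_default_dtype())        # torch.float64
print(torch.tensor([1.5]).dtype)        # new float tensors follow the default
torch.set_default_dtype(torch.float32)  # restore
```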
2018-04-23 18:58:01 -04:00
750a323ca1 Work around protobuf issues by importing onnx first (#6833) 2018-04-23 15:44:04 -07:00
aa56a1211d Update from facebook (#6871)
* Track checkpoint performance in scuba

As title.

* [C2/CUDA]: fix cross entropy sigmoid with logits

when adding log_d_trick, I forgot to add it to the cuda impl; this diff fixes
it.

* Back out "[caffe2] Unregister MKL fallbacks for NCHW conversions"

Original commit changeset: 8918dd40205a
Will land after @jongsoo's diff https://phabricator.intern.facebook.com/D7596315 lands

* [Easy][C2] Don't add blob to external outputs from output_record if it's already external output

As desc.

* On Mobile phones, call GlobalInit with no arguments in predictor in case we need to perform initialization

FACEBOOK:

The QPL logger needs the initialization code. In the past, the initialization code was put in the pipelines calling Caffe2. However, those places become obsolete quickly, as the product teams change where they call Caffe2 from time to time. We also need to track which teams use Caffe2 so that we can put the initialization code there.

With this diff, the initialization code is put in the predictor constructor, only enabled for mobile phones. This way, we can always enable QPL logging.

Once we do this, we can check how many times Caffe2 inference is called in production, and which models are more popular in production. This way, we can prioritize our effort supporting those models.

Will clean up the old code calling the init in the product in a separate diff.

* add padding op for sparse length tensor

to pad a length-based sparse tensor with padding_value

* Add conv_op with cudaconvnet engine

Add conv_op with cudaconvnet engine

* [numa] Fix simple NUMA copy benchmark

Move XavierFill into init_net and also compute BW

* call roundf (device function) instead of round (host function)

* [caffe2_benchmark][observer] Make caffe2_benchmark use its own observer

1. Add ClearGlobalNetObservers()
2. Make caffe2_benchmark use its own observer and observer_reporter

* [detectron] Use roundf instead of round in the detectron module ops

* allow K larger than the number of elements in the top-k op

one use case is to use this op together with PackSegments for sparse tensors, where the number of elements in each slice is not statically defined.

* add ChannelShuffle DNNLOWP op

* fixup math_cpu.cc break
2018-04-23 15:01:56 -07:00
aeb91587e5 [caffe2] Fix observer logic in RNN executor. Remove dynamic casts (#6202)
* Fix observer logic in RNN executor. Remove dynamic casts

* Revert to original design
2018-04-23 15:01:00 -07:00
548f6e34ab [caffe2][nomnigraph][fixup][tiny] Remove accidentally included logging (#6880) 2018-04-23 13:59:55 -07:00
9ed46c615c [Caffe2] Provide option to initialize the TensorRT engine at Operator constructor time (#6809)
* Try to have a lazy conversion of onnx-trt

* .

* Make it work

* comments
2018-04-23 13:09:35 -07:00
a2f2d6b43f Add special case for printing dtype for empty int64 tensor (#6869)
* add special case for printing dtype for empty int64 tensor

* add comment
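The case in question:
```
import torch

print(torch.tensor([], dtype=torch.int64))  # tensor([], dtype=torch.int64)
print(torch.tensor([]))                     # tensor([]): default dtype omitted
```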
2018-04-23 12:07:59 -07:00
a02b7c9776 Move main slice logic for easier reuse (#6822)
Want to reuse this logic for Int8 Slice.
2018-04-23 12:00:56 -07:00
b8ada7380a Tuple literal and cat support (#6691)
* Support list and tuple literals: adds support for [a, b], (a, b) and the trailing-comma form "a,"

* Allow non-tensors to reach emitBuiltinCall; each SugaredValue::call
is now responsible for checking the types of its inputs.

Add support for calling cat with a tuple to emitBuiltinOp (see the sketch below)
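A minimal sketch of the newly supported form:
```
import torch

@torch.jit.script
def fuse(a, b):
    # a tuple literal passed directly to cat
    return torch.cat((a, b), 0)

print(fuse(torch.ones(2), torch.zeros(2)))
```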
2018-04-23 10:58:07 -07:00
90586d925f [DT] [38/n] Rename add_stop_signal to add_stop_condition (#6825)
As titled.
2018-04-23 10:39:37 -07:00
a986b85afd [auto] Update onnx to 3cb4d61 - Extend optimizer passes to recursively descend on GraphProto attributes (#803)
3cb4d61387
2018-04-23 17:05:41 +00:00
46b1737255 [ONNX] Switch ONNX peephole optimizers to recursively descend on sub-blocks (#6828) 2018-04-23 10:01:03 -07:00
3b63be063e quick fix for collect_env (#6861) 2018-04-23 10:33:06 -04:00
4040164097 Relax collect_env.py tests (#6859)
This PR makes the collect_env.py tests ignore the least-significant
number of most version strings. It also bumps the version up to 0.5.0a
to fix the CI.
2018-04-23 10:28:41 -04:00
a4dbd37403 [doc] Minor fixes for Windows docs (#6853) 2018-04-23 13:15:33 +02:00
26ddefbda1 [feature request] [Caffe2] Enable MKLDNN support for inference (#6699)
* Add operators based-on IDEEP interfaces

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Enable IDEEP as a caffe2 device

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add test cases for IDEEP ops

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add IDEEP as a caffe2 submodule

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Skip test cases if no IDEEP support

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct cmake options for IDEEP

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Add dependences on ideep libraries

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix issues in IDEEP conv ops and etc.

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Move ideep from caffe2/ideep to caffe2/contrib/ideep

Signed-off-by: Gu Jinghui <jinghui.gu@intel.com>

* Update IDEEP to fix cmake issue

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Fix cmake issue caused by USE_MKL option

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>

* Correct comments in MKL cmake file

Signed-off-by: Gu, Jinghui <jinghui.gu@intel.com>
2018-04-22 21:58:14 -07:00
a16b85facd [Caffe2] Fix cuda.cmake (#6821)
* Fix cmake

* .
2018-04-22 21:32:18 -07:00
e966f22656 fix typo (#6824) 2018-04-22 21:32:00 -07:00
e8bdbdaa27 Terminate dataloader workers properly when parent process is SIGKILL'ed (#6779)
Reopening #6606 with a fix for the TEST_CUDA import issue on Windows and an improvement to how we wait for manager exit in test_manager_unclean_exit. Loop-tested on the Windows CI multiple times to make sure this actually fixes the CUDA OOM issue. A minimal sketch of the watchdog idea appears after the list below.

* Terminate dataloader workers properly when parent process is SIGKILL'ed

* Wait for worker processes to finish before shutting down manager process

* Add test for checking proper worker exit

* cosmetic change

* Test only if CUDA exists

* Don't call multiprocessing.set_start_method() in Python 2

* import TEST_CUDA only when we are in __main__

* Tune JOIN_TIMEOUT

* handle os.getppid() == 0 case

* Reset to original JOIN_TIMEOUT

* Use WaitForSingleObject() to check parent process status on Windows

* Fix TEST_CUDA import

* clean up

* Check main process only when index_queue.get() times out

* Change index_queues to multiprocessing.Queue

* Move manager checking logic to watchdog class

* Fix bugs in dataloader

* Fix TEST_CUDA import issue

* Don't import TEST_CUDA from common_nn

* Use event to signal manager exit in test

* fix lint

* Add comments
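A hypothetical sketch of the watchdog idea (worker_loop and the timeout are illustrative, not the actual DataLoader code): a worker checks whether its parent is still alive only when its queue read times out, so the common path stays cheap.
```
import os
import queue

def worker_loop(index_queue, timeout=5.0):
    parent = os.getppid()
    while True:
        try:
            task = index_queue.get(timeout=timeout)
        except queue.Empty:
            if os.getppid() != parent:  # re-parented: the parent died
                return                  # exit instead of lingering forever
            continue
        if task is None:                # sentinel for orderly shutdown
            return
        # ... fetch and enqueue the batch for `task` ...
```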
2018-04-22 23:03:54 -04:00
7a3c38ab59 Add environment collection script (#6635)
* Add environment collection script

Fixes #6111. This should make it easier for users to report bugs by giving
them a script to collect system environment information.

Changes include:
- Refactor out the environment collecting code from utils.bottleneck
- Add script (collect_env.py)
- Cleaned up the issues template so that it suggests using the script
  and is more readable.

Testing: added expect tests to go with 4 CI configurations. Whenever one
of these configurations gets updated, the test will fail until the test
also gets updated.

* Expect tests

* Update issue template

* Fix random space

* Minor improvement to issue template; fix expect test

* Skip expect test if BUILD_ENVIRONMENT not found; test fix; split off smoke/expect test
2018-04-22 15:18:14 -04:00
56567fe47d Add documents for Windows (#6653)
* Add Windows doc

* some minor fixes

* Fix typo

* more minor fixes

* Fixes on dataloader
2018-04-22 15:18:02 -04:00
7d5c9bff58 Removes (unused) LinearIndexCalcData. (#6791)
This class, as well as several functions using it, appears to be unused. This is simply code cleanup.

Testing:

All tests in test_cuda.py pass.
2018-04-22 13:58:22 -04:00
1c7b0c1020 Update version string to 0.5. (#6795) 2018-04-22 13:57:48 -04:00
50e92a3085 Static linkage for CUDA (#6807)
* add static linkage option for CUDA libs

* add CuFFT linking via fakelink

* remove warning for 5.0 cuda architecture
2018-04-22 13:57:17 -04:00
a8bdb561b7 Fix reductions on some contiguous tensors where size(dim) == 1 (#6815) 2018-04-22 13:55:55 -04:00
814f791f2b [JIT][script] Improve error reporting for tuple type mismatch (#6819)
Previously we would see errors like:

variable 'states' previously has type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor) but is now being assigned to a value of type (Tensor, Tensor, Tensor, Tensor, Tensor, Tensor):
since the default case in the diagnostic printout was "Tensor". This adds a virtual member function to each Type class that returns a human-readable string for better error reporting.

* Improve error reporting for tuple type mismatch

* Add better Tensor printout
2018-04-22 13:54:52 -04:00
95d0e9aaa2 [docs] Update set_default_(tensor_|d)type docs (#6843)
* update set_default_(tensor_|d)type docs

* make ndarray display nicer
2018-04-22 13:44:20 -04:00
0d0dcde5a8 Fix caffe2 eigen + cuda9 windows build (#6746) 2018-04-22 09:36:09 -07:00
4e8e13d90c [auto] Update onnx to bf00ae6 - Kezhan/update ml op spec (#799)
bf00ae6118
2018-04-21 22:34:34 +00:00
d564ecb4a5 Update docs with new tensor repr (#6454)
* Update docs with new tensor repr

* remove cuda in dtype

* remove changes to gloo submodule

* [docs] document tensor.new_* ctor

* [docs] Add docs for tensor.to(), tensor.float(), etc

* [docs] Moar examples for docs.

* [docs] Warning for tensor ctor copy behavior

* Quick fix

* [docs] Document requires_grad_()

* [docs] Add example for requires_grad_()

* update slogdet and *fft

* update tensor rst

* small fixes

* update some docs

* additional doc changes

* update torch and tensor docs

* finish changing tensor docs

* fix flake8

* slogdet with negative det

* Update functional.py tensor ctors

* Fix nll_loss docs

* reorder to move device up

* torch.LongTensor -> torch.tensor or torch.empty in docs

* update tensor constructors in docs

* change tensor constructors

* change constructors

* change more Tensor() to tensor()

* Show requires_grads_ docs

* Fix set_default_dtype docs

* Link to torch.no_grad, etc, from torch doc

* Add dtype aliases to table

* regen docs again

* Tensor attributes stub page

* link to inplace sampling

* Link torch.dtype, device, and layout

* fix dots after nonfinite floats

* better layout docs
2018-04-21 07:35:37 -04:00
34fa355f27 [caffe2] Add Moments to math (#6798)
* Add gpu check for reduce_max

* Add Moments in math

* Update cpu version to avoid int type to be 0

* Update Moments on CPU to same as GPU
2018-04-21 01:03:44 -07:00
5945f3a7b4 [auto] Update onnx to e3da0f9 - Fix some checks not ideal to onnx-ml (#781)
e3da0f9bab
2018-04-21 03:28:57 +00:00
7b6b7d4575 Mark schema registration helper variables as unused (#6799) 2018-04-20 19:57:42 -07:00
8b28ab4858 Add option cache to speed up cmake build (#6737)
* Add option cache to speed up cmake build

* Also only run autogen_init_py_files once
2018-04-20 19:55:39 -07:00
34edd6f12e fix sparse tensor print (#6829) 2018-04-20 19:39:52 -07:00
8a434d9554 Print integral floating point numbers as X. instead of X.0000. (#6812) 2018-04-20 21:26:21 -04:00
8fc11748fe Fix debug build for Windows (#6758)
* Fix debug build for Windows

* Fix for wrong placement

* Fix variable name
2018-04-20 21:02:18 -04:00
a568b91a5d [docs] Add missing device parameters to factories, refer to dtypes as data types rather than types. (#6803) 2018-04-20 21:01:16 -04:00
516f067641 InputBuffers should AutoGPU for accumulation. (#6826) 2018-04-20 20:15:51 -04:00
6c8f0ef33b fixed error message (#6820) 2018-04-20 20:14:10 -04:00
9b37a4d027 [auto] Update onnx to 4890619 - Remove debug string (#798)
48906190e6
2018-04-20 23:39:11 +00:00
356af0c195 [auto] Update onnx to 2f7c284 - Use ONNX_NAMESPACE::to_string instead of std::to_string (#797)
2f7c284e57
2018-04-20 23:28:21 +00:00
afea133113 [auto] Update onnx to b20fae0 - Add newline at the end (#795)
b20fae0287
2018-04-20 23:24:08 +00:00
db540c9e7b Fix the bug in fb devgpu setup script (#6823)
* Update onnx_c2_setup.sh

* More fix
2018-04-20 15:15:41 -07:00
41bb1d56a7 [auto] Update onnx to f5496b2 - Update the remainig cases (#794)
f5496b2c74
2018-04-20 21:06:51 +00:00
02544f4472 [auto] Update onnx to 7d1e102 - change the inference context api to use TypeProto (#779)
7d1e102e73
2018-04-20 20:05:40 +00:00
1d51dd8665 [distributions] Fix Independent.rsample() and add more tests (#6806) 2018-04-20 21:55:39 +02:00
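For example, assuming the standard torch.distributions API:
```
import torch
from torch.distributions import Independent, Normal

base = Normal(torch.zeros(3), torch.ones(3))  # batch of 3 scalar Normals
diag = Independent(base, 1)                   # treat them as one 3-D event
s = diag.rsample()                            # reparameterized sample
print(diag.log_prob(s))                       # one log-prob for the whole event
```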
12e07ca731 [caffe2][nomnigraph] Add binary split algorithm to Algorithms.h (#6689) 2018-04-20 11:49:17 -07:00
a73b3fd1f0 [caffe2][opencl] Add OpenCL context (#6777) 2018-04-20 11:31:21 -07:00
8a15bc4c9c Fix the ONNX exporter API (#6788) 2018-04-20 09:10:38 -07:00
188b6e9346 [auto] Update onnx to 6953eff - some cleanups to shape inference impls (#771)
6953eff49a
2018-04-20 16:05:40 +00:00
c286efb442 Quick patch for the CI (#6802) 2018-04-20 08:58:38 -07:00
378f742792 [auto] Update onnx to 8dafe88 - Remove incorrect cases (#791)
8dafe88901
2018-04-20 15:36:16 +00:00
3e2891b27a Let Gloo close socket, destroy() not needed for non-NCCL backend (#6787) 2018-04-19 23:52:12 -07:00
ef76e24f60 [JIT][script][ONNX] ScriptModule ONNX export + ONNX export for control flow nodes (#6608)
* ScriptModule ONNX export

* ScriptModule ONNX export

* Export for control flow nodes

* Add pretty-print capability for ONNX export testing

* Update tests and handling of multiple GraphProto names

* Maybe bugfix?

* factor out code from export and pretty print
2018-04-19 23:45:03 -07:00
945cb0fabc [auto] Update onnx to 45be0fe - Fix shadow-compatible-local compiler warning (#789)
45be0fe736
2018-04-20 05:02:50 +00:00
d695624efe More trt tests (#6782) 2018-04-19 21:53:49 -07:00
503be98d61 [auto] Update onnx to d01e4af - update the test cases (#788)
d01e4afc4e
2018-04-20 04:35:38 +00:00
c420297545 [jit][script] Constants python int now turn into Long (#6728)
This matches the behavior of literals.
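For instance:
```
import torch

@torch.jit.script
def add_const(x):
    c = 2          # a Python int constant, now typed as Long (int64)
    return x + c

print(add_const(torch.tensor([1, 2, 3])))
```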
2018-04-19 21:33:29 -07:00
7e1c5ca6d5 Add missing #include for CAFFE2_MODULE macro. (#6790) 2018-04-19 20:46:09 -07:00
4206 changed files with 319828 additions and 146246 deletions

.circleci/config.yml Normal file

@ -0,0 +1,974 @@
docker_config_defaults: &docker_config_defaults
user: jenkins
aws_auth:
# This IAM user only allows read-only access to ECR
aws_access_key_id: AKIAJ2J6FIG5OSZTQ3IA
aws_secret_access_key: ${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
# NOTE: We only perform the merge in build step and not in test step, because
# all source files will be shared from build to test
merge_pull_request_onto_master: &merge_pull_request_onto_master
name: Merge Onto Master
no_output_timeout: "10h"
command: |
if [[ "${CIRCLE_BRANCH}" != "master" ]]; then
git config --global user.email "circleci.ossci@gmail.com"
git config --global user.name "CircleCI"
git config remote.origin.url https://github.com/pytorch/pytorch.git
git config --add remote.origin.fetch +refs/heads/master:refs/remotes/origin/master
git fetch --tags --progress https://github.com/pytorch/pytorch.git +refs/heads/master:refs/remotes/origin/master --depth=50 --quiet
export GIT_MERGE_TARGET=`git log -n 1 --pretty=format:"%H" origin/master`
echo "GIT_MERGE_TARGET: " ${GIT_MERGE_TARGET}
export GIT_COMMIT=${CIRCLE_SHA1}
echo "GIT_COMMIT: " ${GIT_COMMIT}
git checkout -f ${GIT_COMMIT}
git reset --hard ${GIT_COMMIT}
git merge --no-edit --no-ff ${GIT_MERGE_TARGET}
fi
pytorch_linux_cpu_build_test_defaults: &pytorch_linux_cpu_build_test_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
.jenkins/pytorch/build.sh
.jenkins/pytorch/test.sh
pytorch_linux_build_defaults: &pytorch_linux_build_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
if [ -n "${CUDA_VERSION}" ]; then
export TORCH_CUDA_ARCH_LIST=5.2
fi
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
.jenkins/pytorch/build.sh
export PYTORCH_CI_ENV_DIR=/var/lib/jenkins/pytorch-ci-env
mkdir -p ${PYTORCH_CI_ENV_DIR}
cp -r /var/lib/jenkins/workspace ${PYTORCH_CI_ENV_DIR}/build_workspace # This copies all source files from build step to the next step
cp -r /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch ${PYTORCH_CI_ENV_DIR}/torch
cp -r build/bin ${PYTORCH_CI_ENV_DIR}/cpp_test_bin
if [ -d "../cpp-build" ]; then
cp -r ../cpp-build ${PYTORCH_CI_ENV_DIR}/cpp-build
fi
- persist_to_workspace:
root: /var/lib/jenkins/pytorch-ci-env
paths:
- "*"
pytorch_linux_test_defaults: &pytorch_linux_test_defaults
machine:
image: default
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /opt/workspace
sudo chmod -R 777 /opt/workspace
- attach_workspace:
at: /opt/workspace
- run:
name: Build
no_output_timeout: "10h"
command: |
set -x
sudo pip install awscli
if [ -n "${CUDA_VERSION}" ]; then
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
fi
sudo apt-get update
sudo apt-get remove linux-image-generic linux-headers-generic linux-generic
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install linux-image-generic
if [ -n "${CUDA_VERSION}" ]; then
wget 'https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-396.26.run'
sudo /bin/bash ./NVIDIA-Linux-x86_64-396.26.run -s --no-drm
sudo apt-get install -y nvidia-docker2
fi
sudo pkill -SIGHUP dockerd
if [ -n "${CUDA_VERSION}" ]; then
nvidia-smi
fi
# This IAM user only allows read-only access to ECR
export AWS_ACCESS_KEY_ID=AKIAJ2J6FIG5OSZTQ3IA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
docker pull ${DOCKER_IMAGE}
if [ -n "${CUDA_VERSION}" ]; then
id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
else
id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
fi
pwd
cp -r /opt/workspace/build_workspace/. /home/circleci/project # This copies all source files from build step to the current step
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x PYTHON_VERSION=${PYTHON_VERSION}" >> /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}" >> /home/circleci/project/env
mkdir -p /home/circleci/project/build
cp -r /opt/workspace/cpp_test_bin /home/circleci/project/build/bin
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
echo "mkdir -p /opt/conda/lib/python${PYTHON_VERSION}/site-packages" | docker exec -u jenkins -i "$id" bash
docker cp "/opt/workspace/torch" "$id:/opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch"
if [ -d "/opt/workspace/cpp-build" ]; then
docker cp "/opt/workspace/cpp-build" "$id:/var/lib/jenkins/cpp-build"
fi
if [ -n "${MULTI_GPU}" ]; then
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch && cd workspace && .jenkins/pytorch/multigpu-test.sh') | docker exec -u jenkins -i "$id" bash
else
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace /opt/conda/lib/python${PYTHON_VERSION}/site-packages/torch && cd workspace && .jenkins/pytorch/test.sh') | docker exec -u jenkins -i "$id" bash
fi
caffe2_linux_build_defaults: &caffe2_linux_build_defaults
resource_class: large
working_directory: /var/lib/jenkins/workspace
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
export IN_CIRCLECI=1
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
export SCCACHE_MAX_JOBS=`expr $(nproc) - 1`
export MEMORY_LIMIT_MAX_JOBS=8 # the "large" resource class on CircleCI has 32 CPU cores, if we use all of them we'll OOM
export MAX_JOBS=$(( ${SCCACHE_MAX_JOBS} > ${MEMORY_LIMIT_MAX_JOBS} ? ${MEMORY_LIMIT_MAX_JOBS} : ${SCCACHE_MAX_JOBS} ))
set -ex
# Need to checkout fetch PRs for onnxbot tracking PRs
git submodule update --init third_party/onnx || true
cd third_party/onnx && git fetch --tags --progress origin +refs/pull/*:refs/remotes/origin/pr/* && cd -
# Reinitialize submodules
git submodule sync && git submodule update --init --recursive
# Ensure jenkins can write to the ccache root dir.
sudo chown jenkins:jenkins "${HOME}/.ccache"
# Make ccache log to the workspace, so we can archive it after the build
mkdir -p build
ccache -o log_file=$PWD/build/ccache.log
# Configure additional cmake arguments
cmake_args=()
cmake_args+=("$CMAKE_ARGS")
if [[ $BUILD_ENVIRONMENT == *aten* ]]; then
cmake_args+=("-DBUILD_ATEN=ON")
fi
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
sudo chown -R jenkins:jenkins '/opt/conda'
fi
# Build
if test -x ".jenkins/caffe2/build.sh"; then
./.jenkins/caffe2/build.sh ${cmake_args[@]}
else
./.jenkins/build.sh ${cmake_args[@]}
fi
# Show sccache stats if it is running
if pgrep sccache > /dev/null; then
sccache --show-stats
fi
# Copy all necessary binaries to shared workspace
export CAFFE2_CI_ENV_DIR=/var/lib/jenkins/caffe2-ci-env
mkdir -p ${CAFFE2_CI_ENV_DIR}
cp -r /var/lib/jenkins/workspace ${CAFFE2_CI_ENV_DIR}/build_workspace # This copies all source files from build step to the next step
cp -r third_party/onnx ${CAFFE2_CI_ENV_DIR}/onnx
if [ -d "/usr/local/caffe2" ]; then
cp -r /usr/local/caffe2 ${CAFFE2_CI_ENV_DIR}/caffe2
fi
if [ -d "/opt/conda" ]; then
cp -r /opt/conda ${CAFFE2_CI_ENV_DIR}/conda_env
fi
- persist_to_workspace:
root: /var/lib/jenkins/caffe2-ci-env
paths:
- "*"
caffe2_linux_test_defaults: &caffe2_linux_test_defaults
machine:
image: default
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /opt/workspace
sudo chmod -R 777 /opt/workspace
- attach_workspace:
at: /opt/workspace
- run:
name: Build
no_output_timeout: "10h"
command: |
set -x
sudo pip install awscli
if [ -n "${CUDA_VERSION}" ]; then
curl -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
echo "deb https://nvidia.github.io/libnvidia-container/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-container-runtime/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
echo "deb https://nvidia.github.io/nvidia-docker/ubuntu14.04/amd64 /" | sudo tee -a /etc/apt/sources.list.d/nvidia-docker.list
fi
sudo apt-get update
sudo apt-get remove linux-image-generic linux-headers-generic linux-generic
sudo apt-get install linux-headers-$(uname -r)
sudo apt-get install linux-image-generic
if [ -n "${CUDA_VERSION}" ]; then
wget 'https://s3.amazonaws.com/ossci-linux/nvidia_driver/NVIDIA-Linux-x86_64-396.26.run'
sudo /bin/bash ./NVIDIA-Linux-x86_64-396.26.run -s --no-drm
sudo apt-get install -y nvidia-docker2
fi
sudo pkill -SIGHUP dockerd
if [ -n "${CUDA_VERSION}" ]; then
nvidia-smi
fi
# This IAM user only allows read-only access to ECR
export AWS_ACCESS_KEY_ID=AKIAJ2J6FIG5OSZTQ3IA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_ECR_READ_ONLY}
eval $(aws ecr get-login --region us-east-1 --no-include-email)
docker pull ${DOCKER_IMAGE}
if [ -n "${CUDA_VERSION}" ]; then
id=$(docker run --runtime=nvidia -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
else
id=$(docker run -t -d -w /var/lib/jenkins ${DOCKER_IMAGE})
fi
pwd
cp -r /opt/workspace/build_workspace/. /home/circleci/project # This copies all source files from build step to the current step
echo "declare -x IN_CIRCLECI=1" > /home/circleci/project/env
echo "declare -x SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2" >> /home/circleci/project/env
# This IAM user allows write access to S3 bucket for sccache
echo "declare -x AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA" >> /home/circleci/project/env
echo "declare -x AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}" >> /home/circleci/project/env
echo "declare -x BUILD_ENVIRONMENT=${BUILD_ENVIRONMENT}" >> /home/circleci/project/env
# TODO: merge this into Caffe2 build.sh
cat >/home/circleci/project/ci_build_script.sh <<EOL
# =================== The following code will be executed inside Docker container ===================
set -ex
# libdc1394 (dependency of OpenCV) expects /dev/raw1394 to exist...
sudo ln /dev/null /dev/raw1394
# Hotfix, use hypothesis 3.44.6 on Ubuntu 14.04
# See comments on https://github.com/HypothesisWorks/hypothesis-python/commit/eadd62e467d6cee6216e71b391951ec25b4f5830
if [[ "$BUILD_ENVIRONMENT" == *ubuntu14.04* ]]; then
sudo pip uninstall -y hypothesis
# "pip install hypothesis==3.44.6" from official server is unreliable on CircleCI, so we host a copy on S3 instead
sudo pip install attrs -f https://s3.amazonaws.com/ossci-linux/wheels/attrs-18.1.0-py2.py3-none-any.whl
sudo pip install coverage -f https://s3.amazonaws.com/ossci-linux/wheels/coverage-4.5.1-cp36-cp36m-macosx_10_12_x86_64.whl
sudo pip install hypothesis -f https://s3.amazonaws.com/ossci-linux/wheels/hypothesis-3.44.6-py3-none-any.whl
fi
# conda must be added to the path for Anaconda builds (this location must be
# the same as that in install_anaconda.sh used to build the docker image)
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
export PATH=/opt/conda/bin:$PATH
fi
pip install --user -b /tmp/pip_install_onnx "file:///var/lib/jenkins/workspace/third_party/onnx#egg=onnx"
pip install --user future
# Build
if test -x ".jenkins/caffe2/test.sh"; then
./.jenkins/caffe2/test.sh
else
./.jenkins/test.sh
fi
# Remove benign core dumps.
# These are tests for signal handling (including SIGABRT).
rm -f ./crash/core.fatal_signal_as.*
rm -f ./crash/core.logging_test.*
# =================== The above code will be executed inside Docker container ===================
EOL
chmod +x /home/circleci/project/ci_build_script.sh
docker cp /home/circleci/project/. "$id:/var/lib/jenkins/workspace"
if [ -d "/opt/workspace/caffe2" ]; then
echo "mkdir -p /usr/local/caffe2" | docker exec -u jenkins -i "$id" bash
docker cp /opt/workspace/caffe2/. "$id:/usr/local/caffe2"
fi
if [ -d "/opt/workspace/conda_env" ]; then
echo "sudo mkdir -p /opt/conda" | docker exec -u jenkins -i "$id" bash
docker cp /opt/workspace/conda_env/. "$id:/opt/conda"
fi
docker cp /opt/workspace/onnx/. "$id:/var/lib/jenkins/workspace/third_party/onnx"
(echo "source ./workspace/env" && echo 'sudo chown -R jenkins workspace && cd workspace && ./ci_build_script.sh') | docker exec -u jenkins -i "$id" bash
caffe2_macos_build_defaults: &caffe2_macos_build_defaults
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
brew install cmake
# Reinitialize submodules
git submodule sync && git submodule update --init --recursive
# Reinitialize path (see man page for path_helper(8))
eval `/usr/libexec/path_helper -s`
# Use Homebrew Python if configured to do so
if [ "${PYTHON_INSTALLATION}" == "homebrew" ]; then
export PATH=/usr/local/opt/python/libexec/bin:/usr/local/bin:$PATH
fi
pip install numpy
# Install Anaconda if we need to
if [ -n "${CAFFE2_USE_ANACONDA}" ]; then
rm -rf ${TMPDIR}/anaconda
curl -o ${TMPDIR}/anaconda.sh "https://repo.continuum.io/archive/Anaconda${ANACONDA_VERSION}-5.0.1-MacOSX-x86_64.sh"
/bin/bash ${TMPDIR}/anaconda.sh -b -p ${TMPDIR}/anaconda
rm -f ${TMPDIR}/anaconda.sh
export PATH="${TMPDIR}/anaconda/bin:${PATH}"
source ${TMPDIR}/anaconda/bin/activate
fi
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
export SCCACHE_BIN=${PWD}/sccache_bin
mkdir -p ${SCCACHE_BIN}
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${SCCACHE_BIN}/clang++"
chmod a+x "${SCCACHE_BIN}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${SCCACHE_BIN}/clang"
chmod a+x "${SCCACHE_BIN}/clang"
export PATH="${SCCACHE_BIN}:$PATH"
fi
# Build
if [ "${BUILD_IOS:-0}" -eq 1 ]; then
scripts/build_ios.sh
elif [ -n "${CAFFE2_USE_ANACONDA}" ]; then
# All conda build logic should be in scripts/build_anaconda.sh
scripts/build_anaconda.sh
else
scripts/build_local.sh
fi
# Show sccache stats if it is running
if which sccache > /dev/null; then
sccache --show-stats
fi
version: 2
jobs:
pytorch_linux_trusty_py2_7_9_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py2.7.9:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py2_7_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py2.7:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_5_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.5:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc4_8_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc4.8:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc5_4_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc5.4:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_py3_6_gcc7_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-py3.6-gcc7:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_trusty_pynightly_build_test:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-trusty-pynightly:238
<<: *docker_config_defaults
<<: *pytorch_linux_cpu_build_test_defaults
pytorch_linux_xenial_py3_clang5_asan_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_py3_clang5_asan_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-py3-clang5-asan:238"
PYTHON_VERSION: "3.6"
resource_class: large
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda8_cudnn6_py3_multigpu_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda8-cudnn6-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "8"
MULTI_GPU: "1"
resource_class: gpu.large
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_cudnn7_py2_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "2.7"
CUDA_VERSION: "9"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_cudnn7_py2_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py2:238"
PYTHON_VERSION: "2.7"
CUDA_VERSION: "9"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_cudnn7_py3_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_cudnn7_py3_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9-cudnn7-py3:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7:238
<<: *docker_config_defaults
environment:
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9.2"
<<: *pytorch_linux_build_defaults
pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/pytorch/pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7:238"
PYTHON_VERSION: "3.6"
CUDA_VERSION: "9.2"
resource_class: gpu.medium
<<: *pytorch_linux_test_defaults
pytorch_macos_10_13_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
chmod a+x .jenkins/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
# TODO: need to share source files from build to test, when macOS builds are enabled
- persist_to_workspace:
root: /Users/distiller/pytorch-ci-env
paths:
- "*"
pytorch_macos_10_13_py3_test:
macos:
xcode: "9.0"
steps:
- run:
name: Prepare workspace
command: |
sudo mkdir -p /Users/distiller/pytorch-ci-env
sudo chmod -R 777 /Users/distiller/pytorch-ci-env
- attach_workspace:
at: /Users/distiller/pytorch-ci-env
- run:
name: Build
environment:
BUILD_ENVIRONMENT: pytorch-macos-10.13-py3
no_output_timeout: "10h"
command: |
# TODO: need to share source files from build to test, when macOS builds are enabled
set -ex
export IN_CIRCLECI=1
chmod a+x .jenkins/pytorch/macos-test.sh
.jenkins/pytorch/macos-test.sh
pytorch_macos_10_13_cuda9_2_cudnn7_py3_build:
macos:
xcode: "9.0"
steps:
- checkout
- run:
<<: *merge_pull_request_onto_master
- run:
name: Build
environment:
JOB_BASE_NAME: pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
BUILD_ENVIRONMENT: pytorch-macos-10.13-cuda9.2-cudnn7-py3
no_output_timeout: "10h"
command: |
set -ex
export IN_CIRCLECI=1
# Install CUDA 9.2
sudo rm -rf ~/cuda_9.2.64_mac_installer.app || true
curl https://s3.amazonaws.com/ossci-macos/cuda_9.2.64_mac_installer.zip -o ~/cuda_9.2.64_mac_installer.zip
unzip ~/cuda_9.2.64_mac_installer.zip -d ~/
sudo ~/cuda_9.2.64_mac_installer.app/Contents/MacOS/CUDAMacOSXInstaller --accept-eula --no-window
sudo cp /usr/local/cuda/lib/libcuda.dylib /Developer/NVIDIA/CUDA-9.2/lib/libcuda.dylib
sudo rm -rf /usr/local/cuda || true
# Install cuDNN 7.1 for CUDA 9.2
curl https://s3.amazonaws.com/ossci-macos/cudnn-9.2-osx-x64-v7.1.tgz -o ~/cudnn-9.2-osx-x64-v7.1.tgz
rm -rf ~/cudnn-9.2-osx-x64-v7.1 && mkdir ~/cudnn-9.2-osx-x64-v7.1
tar -xzvf ~/cudnn-9.2-osx-x64-v7.1.tgz -C ~/cudnn-9.2-osx-x64-v7.1
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/include/
sudo cp ~/cudnn-9.2-osx-x64-v7.1/cuda/lib/libcudnn* /Developer/NVIDIA/CUDA-9.2/lib/
sudo chmod a+r /Developer/NVIDIA/CUDA-9.2/include/cudnn.h /Developer/NVIDIA/CUDA-9.2/lib/libcudnn*
# Install sccache
sudo curl https://s3.amazonaws.com/ossci-macos/sccache --output /usr/local/bin/sccache
sudo chmod +x /usr/local/bin/sccache
export SCCACHE_BUCKET=ossci-compiler-cache-circleci-v2
# This IAM user allows write access to S3 bucket for sccache
export AWS_ACCESS_KEY_ID=AKIAJJZUW4G2ASX5W7KA
export AWS_SECRET_ACCESS_KEY=${CIRCLECI_AWS_SECRET_KEY_FOR_SCCACHE_S3_BUCKET}
git submodule sync && git submodule update --init
chmod a+x .jenkins/pytorch/macos-build.sh
.jenkins/pytorch/macos-build.sh
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn6-ubuntu16.04:190"
CUDA_VERSION: "8"
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn6-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-aten-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9"
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-aten-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.1-cudnn7-ubuntu16.04:190"
CUDA_VERSION: "9.1"
BUILD_ENVIRONMENT: "py2-cuda9.1-cudnn7-ubuntu16.04"
resource_class: gpu.medium
<<: *caffe2_linux_test_defaults
caffe2_py2_mkl_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_mkl_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-mkl-ubuntu16.04:190"
BUILD_ENVIRONMENT: "py2-mkl-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_py2_gcc4_8_ubuntu14_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc4_8_ubuntu14_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.8-ubuntu14.04:190"
BUILD_ENVIRONMENT: "py2-gcc4.8-ubuntu14.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_onnx_py2_gcc5_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_onnx_py2_gcc5_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc5-ubuntu16.04:190"
BUILD_ENVIRONMENT: "onnx-py2-gcc5-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_conda2_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda2-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "conda2-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_conda2_ubuntu16_04_test:
environment:
DOCKER_IMAGE: "308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda2-ubuntu16.04:190"
BUILD_ENVIRONMENT: "conda2-ubuntu16.04"
resource_class: large
<<: *caffe2_linux_test_defaults
caffe2_py2_cuda8_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc4_9_ubuntu14_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc4.9-ubuntu14.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc4.9-ubuntu14.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_clang3_8_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.8-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-clang3.8-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_clang3_9_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-clang3.9-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-clang3.9-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc6_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc6-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc6-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_gcc7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-gcc7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-gcc7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda8_0_cudnn7_aten_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda8.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda8.0-cudnn7-aten-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_android_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-android-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-android-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_conda3_cuda9_0_cudnn7_ubuntu16_04_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/conda3-cuda9.0-cudnn7-ubuntu16.04:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "conda3-cuda9.0-cudnn7-ubuntu16.04"
<<: *caffe2_linux_build_defaults
caffe2_py2_cuda9_0_cudnn7_centos7_build:
docker:
- image: 308535385114.dkr.ecr.us-east-1.amazonaws.com/caffe2/py2-cuda9.0-cudnn7-centos7:190
<<: *docker_config_defaults
environment:
BUILD_ENVIRONMENT: "py2-cuda9.0-cudnn7-centos7"
<<: *caffe2_linux_build_defaults
caffe2_py2_ios_macos10_13_build:
environment:
BUILD_IOS: "1"
PYTHON_INSTALLATION: "system"
PYTHON_VERSION: "2"
<<: *caffe2_macos_build_defaults
caffe2_py2_system_macos10_13_build:
environment:
PYTHON_INSTALLATION: "system"
PYTHON_VERSION: "2"
<<: *caffe2_macos_build_defaults
workflows:
version: 2
build:
jobs:
# - pytorch_linux_trusty_py2_7_9_build_test
# - pytorch_linux_trusty_py2_7_build_test
# - pytorch_linux_trusty_py3_5_build_test
# - pytorch_linux_trusty_py3_6_gcc4_8_build_test
# - pytorch_linux_trusty_py3_6_gcc5_4_build_test
# - pytorch_linux_trusty_py3_6_gcc7_build_test
# - pytorch_linux_trusty_pynightly_build_test
# - pytorch_linux_xenial_py3_clang5_asan_build
# - pytorch_linux_xenial_py3_clang5_asan_test:
# requires:
# - pytorch_linux_xenial_py3_clang5_asan_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_test:
# requires:
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda8_cudnn6_py3_multigpu_test:
# requires:
# - pytorch_linux_xenial_cuda8_cudnn6_py3_build
# - pytorch_linux_xenial_cuda9_cudnn7_py2_build
# - pytorch_linux_xenial_cuda9_cudnn7_py2_test:
# requires:
# - pytorch_linux_xenial_cuda9_cudnn7_py2_build
# - pytorch_linux_xenial_cuda9_cudnn7_py3_build
# - pytorch_linux_xenial_cuda9_cudnn7_py3_test:
# requires:
# - pytorch_linux_xenial_cuda9_cudnn7_py3_build
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_test:
# requires:
# - pytorch_linux_xenial_cuda9_2_cudnn7_py3_gcc7_build
# - pytorch_macos_10_13_py3_build
# - pytorch_macos_10_13_py3_test:
# requires:
# - pytorch_macos_10_13_py3_build
# - pytorch_macos_10_13_cuda9_2_cudnn7_py3_build
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda8_0_cudnn6_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_mkl_ubuntu16_04_build
# - caffe2_py2_mkl_ubuntu16_04_test:
# requires:
# - caffe2_py2_mkl_ubuntu16_04_build
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_test:
# requires:
# - caffe2_py2_cuda9_1_cudnn7_ubuntu16_04_build
# - caffe2_py2_gcc4_8_ubuntu14_04_build
# - caffe2_py2_gcc4_8_ubuntu14_04_test:
# requires:
# - caffe2_py2_gcc4_8_ubuntu14_04_build
# - caffe2_onnx_py2_gcc5_ubuntu16_04_build
# - caffe2_onnx_py2_gcc5_ubuntu16_04_test:
# requires:
# - caffe2_onnx_py2_gcc5_ubuntu16_04_build
# - caffe2_conda2_ubuntu16_04_build
# - caffe2_conda2_ubuntu16_04_test:
# requires:
# - caffe2_conda2_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_gcc4_9_ubuntu14_04_build
# - caffe2_py2_clang3_8_ubuntu16_04_build
# - caffe2_py2_clang3_9_ubuntu16_04_build
# - caffe2_py2_gcc6_ubuntu16_04_build
# - caffe2_py2_gcc7_ubuntu16_04_build
# - caffe2_py2_cuda8_0_cudnn7_aten_ubuntu16_04_build
# - caffe2_py2_android_ubuntu16_04_build
# - caffe2_conda3_cuda9_0_cudnn7_ubuntu16_04_build
# - caffe2_py2_cuda9_0_cudnn7_centos7_build
# - caffe2_py2_ios_macos10_13_build
# - caffe2_py2_system_macos10_13_build

.clang-format

@ -37,7 +37,7 @@ BreakAfterJavaFieldAnnotations: false
BreakStringLiterals: false
ColumnLimit: 80
CommentPragmas: '^ IWYU pragma:'
-CompactNamespaces: true
+CompactNamespaces: false
ConstructorInitializerAllOnOneLineOrOnePerLine: true
ConstructorInitializerIndentWidth: 4
ContinuationIndentWidth: 4
@ -68,7 +68,7 @@ PenaltyBreakComment: 300
PenaltyBreakFirstLessLess: 120
PenaltyBreakString: 1000
PenaltyExcessCharacter: 1000000
-PenaltyReturnTypeOnItsOwnLine: 200
+PenaltyReturnTypeOnItsOwnLine: 2000000
PointerAlignment: Left
ReflowComments: true
SortIncludes: true

.clang-tidy Normal file

@ -0,0 +1,51 @@
---
# NOTE: there must be no spaces before the '-', so put the comma first.
Checks: '
*
,clang-analyzer-*
,modernize-*
,-cert-dcl21-cpp
,-cert-err58-cpp
,-cert-err60-cpp
,-clang-diagnostic-*
,-cppcoreguidelines-owning-memory
,-cppcoreguidelines-pro-bounds-array-to-pointer-decay
,-cppcoreguidelines-pro-bounds-constant-array-index
,-cppcoreguidelines-pro-type-member-init
,-cppcoreguidelines-pro-type-static-cast-downcast
,-cppcoreguidelines-pro-type-union-access
,-cppcoreguidelines-pro-type-vararg
,-cppcoreguidelines-special-member-functions
,-fuchsia-*
,-google-build-using-namespace
,-google-default-arguments
,-google-explicit-constructor
,-google-readability-braces-around-statements
,-google-readability-namespace-comments
,-google-readability-todo
,-google-runtime-references
,-google-runtime-references
,-hicpp-braces-around-statements
,-hicpp-explicit-conversions
,-hicpp-member-init
,-hicpp-no-array-decay
,-hicpp-signed-bitwise
,-hicpp-special-member-functions
,-hicpp-vararg
,-llvm-header-guard
,-llvm-include-order
,-llvm-namespace-comment
,-misc-unused-parameters
,-modernize-make-unique
,-modernize-use-default-member-init
,-performance-unnecessary-value-param
,-readability-braces-around-statements
,-readability-else-after-return
,-readability-implicit-bool-conversion
,-readability-named-parameter
'
WarningsAsErrors: ''
HeaderFilterRegex: 'torch/csrc/'
AnalyzeTemporaryDtors: false
CheckOptions:
...
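A .clang-tidy config like the one above is picked up automatically from the nearest parent directory of each file being checked. A minimal invocation might look like this (a sketch; the build directory and source file are assumptions, not part of this diff):
```
# Runs the checks above; build/ must contain compile_commands.json, and
# warnings in headers are limited by HeaderFilterRegex ('torch/csrc/').
clang-tidy -p build torch/csrc/jit/ir.cpp
```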

.gitattributes vendored Normal file (1 line)

@ -0,0 +1 @@
*.bat text eol=crlf
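The single rule above forces CRLF line endings for batch files on checkout, regardless of core.autocrlf. One way to confirm the rule applies (the path is hypothetical):
```
# Prints 'text: set' and 'eol: crlf' for any path matching *.bat
git check-attr text eol -- scripts/build_windows.bat
```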


@ -1,24 +1,38 @@
PyTorch GitHub Issues Guidelines
--------------------------------
We like to limit our issues to bug reports and feature requests. If you have a question or would like help and support, please visit our forums: https://discuss.pytorch.org/
If you have a question or would like help and support, please ask at our
[forums](https://discuss.pytorch.org/).
If you are submitting a feature request, please preface the title with [feature request].
If you are submitting a bug report, please fill in the following details.
## Issue description
Provide a short description.
## Code example
Please try to provide a minimal example that reproduces the bug.
Error messages and stack traces are also helpful.
## System Info
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
When submitting a bug report, please include the following information (where relevant):
- PyTorch or Caffe2:
- How you installed PyTorch (conda, pip, source):
- Build command you used (if compiling from source):
- OS:
- PyTorch version:
- How you installed PyTorch (conda, pip, source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- GCC version (if compiling from source):
- CMake version:
- Build command you used (if compiling from source):
- Versions of any other relevant libraries:
In addition, the following information will also be very helpful for diagnosing the problem:
- A script to reproduce the bug. Please try to provide as minimal of a test case as possible.
- Error messages and/or stack traces of the bug
- Context around what you are trying to do

.github/ISSUE_TEMPLATE/bug-report.md vendored Normal file (49 lines)

@ -0,0 +1,49 @@
---
name: "\U0001F41B Bug Report"
about: Submit a bug report to help us improve PyTorch
---
## 🐛 Bug
<!-- A clear and concise description of what the bug is. -->
## To Reproduce
Steps to reproduce the behavior:
1.
1.
1.
<!-- If you have a code sample, error messages, stack traces, please provide it here as well -->
## Expected behavior
<!-- A clear and concise description of what you expected to happen. -->
## Environment
Please copy and paste the output from our
[environment collection script](https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py)
(or fill out the checklist below manually).
You can get the script and run it with:
```
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
# For security purposes, please check the contents of collect_env.py before running it.
python collect_env.py
```
- PyTorch Version (e.g., 1.0):
- OS (e.g., Linux):
- How you installed PyTorch (`conda`, `pip`, source):
- Build command you used (if compiling from source):
- Python version:
- CUDA/cuDNN version:
- GPU models and configuration:
- Any other relevant information:
## Additional context
<!-- Add any other context about the problem here. -->


@ -0,0 +1,9 @@
---
name: "\U0001F4DA Documentation"
about: Report an issue related to https://pytorch.org/docs
---
## 📚 Documentation
<!-- A clear and concise description of what content in https://pytorch.org/docs is an issue. If this has to do with the general https://pytorch.org website, please file an issue at https://github.com/pytorch/pytorch.github.io/issues/new/choose instead. If this has to do with https://pytorch.org/tutorials, please file an issue at https://github.com/pytorch/tutorials/issues/new -->


@ -0,0 +1,24 @@
---
name: "\U0001F680Feature Request"
about: Submit a proposal/request for a new PyTorch feature
---
## 🚀 Feature
<!-- A clear and concise description of the feature proposal -->
## Motivation
<!-- Please outline the motivation for the proposal. Is your feature request related to a problem? e.g., I'm always frustrated when [...]. If this is related to another GitHub issue, please link here too -->
## Pitch
<!-- A clear and concise description of what you want to happen. -->
## Alternatives
<!-- A clear and concise description of any alternative solutions or features you've considered, if any. -->
## Additional context
<!-- Add any other context or screenshots about the feature request here. -->


@ -0,0 +1,13 @@
---
name: "❓Questions/Help/Support"
about: Do you need support? We have resources.
---
## ❓ Questions and Help
### Please note that this issue tracker is not a help form and this issue will be closed.
We have a set of [listed resources available on the website](https://pytorch.org/resources). Our primary means of support is our discussion forum:
- [Discussion Forum](https://discuss.pytorch.org/)

.gitignore vendored (109 lines changed)

@ -8,53 +8,67 @@
## PyTorch
build/
dist/
torch.egg-info/
*/**/__pycache__
aten/build/
aten/src/ATen/Config.h
third_party/build/
torch/version.py
torch/csrc/generic/TensorMethods.cpp
torch/lib/*.so*
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.lib
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/build
torch/lib/tmp_install
torch/lib/include
torch/lib/torch_shm_manager
torch/csrc/jit/generated/*
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/nn/THNN.cwrap
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.h
torch/csrc/generated
docs/src/**/*
test/data/legacy_modules.t7
test/data/gpu_tensors.pt
test/htmlcov
test/.coverage
.mypy_cache
*/*.pyc
*/*.so*
*/**/__pycache__
*/**/*.dylib*
*/**/*.pyc
*/**/*.pyd
*/**/*.so*
*/**/**/*.pyc
*/**/**/**/*.pyc
*/**/**/**/**/*.pyc
*/*.so*
*/**/*.so*
*/**/*.dylib*
*/**/*.pyd
aten/build/
aten/src/ATen/Config.h
aten/src/ATen/cuda/CUDAConfig.h
build/
dist/
docs/src/**/*
docs/cpp/build
docs/cpp/source/api
test/.coverage
test/cpp/api/mnist
test/custom_operator/model.pt
test/data/gpu_tensors.pt
test/data/legacy_modules.t7
test/data/legacy_serialized.pt
test/data/linear.pt
.mypy_cache
test/htmlcov
test/cpp_extensions/install/
third_party/build/
tools/shared/_utils_internal.py
torch.egg-info/
torch/csrc/autograd/generated/*
torch/csrc/cudnn/cuDNN.cpp
torch/csrc/generated
torch/csrc/generic/TensorMethods.cpp
torch/csrc/jit/generated/*
torch/csrc/jit/fusers/Config.h
torch/csrc/nn/THCUNN.cpp
torch/csrc/nn/THCUNN.cwrap
torch/csrc/nn/THNN_generic.cpp
torch/csrc/nn/THNN_generic.cwrap
torch/csrc/nn/THNN_generic.h
torch/csrc/nn/THNN.cpp
torch/csrc/nn/THNN.cwrap
torch/lib/*.a*
torch/lib/*.dll*
torch/lib/*.exe*
torch/lib/*.dylib*
torch/lib/*.h
torch/lib/*.lib
torch/lib/*.so*
torch/lib/build
torch/lib/cmake
torch/lib/include
torch/lib/pkgconfig
torch/lib/protoc
torch/lib/tmp_install
torch/lib/torch_shm_manager
torch/lib/python*
torch/share/
torch/version.py
# IPython notebook checkpoints
.ipynb_checkpoints
@ -63,6 +77,7 @@ test/data/linear.pt
*.swn
*.swo
*.swp
*.swm
*~
# macOS dir files
@ -133,10 +148,6 @@ docs/source/scripts/activation_images/
# PyCharm files
.idea
# Visual Studio Code files
.vscode
.vs
# OSX dir files
.DS_Store
@ -147,7 +158,7 @@ build
build_host_protoc
build_android
build_ios
build_*
/build_*
.build_debug/*
.build_release/*
distribute/*
@ -187,3 +198,11 @@ caffe2.egg-info
# Atom/Watchman required file
.watchmanconfig
# BEGIN NOT-CLEAN-FILES (setup.py handles this marker. Do not change.)
#
# Below files are not deleted by "setup.py clean".
# Visual Studio Code files
.vscode
.vs
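With an ignore list this long it is easy to lose track of which pattern suppresses a given artifact; git can report the exact file, line, and pattern. A sketch with a hypothetical path:
```
# -v prints the source file, line number, and pattern that matched
git check-ignore -v torch/lib/libshm.so
```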

.gitmodules vendored (31 lines changed)

@ -1,28 +1,15 @@
[submodule "aten/src/ATen/cpu/cpuinfo"]
path = aten/src/ATen/cpu/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo
[submodule "aten/src/ATen/cpu/tbb/tbb_remote"]
path = aten/src/ATen/cpu/tbb/tbb_remote
url = https://github.com/01org/tbb
branch = tbb_2018
[submodule "aten/src/ATen/utils/catch"]
path = aten/src/ATen/utils/catch
[submodule "third_party/catch"]
path = third_party/catch
url = https://github.com/catchorg/Catch2.git
[submodule "third_party/nanopb"]
path = third_party/nanopb
url = https://github.com/nanopb/nanopb.git
[submodule "third_party/pybind11"]
path = third_party/pybind11
url = https://github.com/pybind/pybind11.git
[submodule "third_party/nccl"]
path = third_party/nccl
url = https://github.com/nvidia/nccl.git
[submodule "third_party/cub"]
path = third_party/cub
url = https://github.com/NVlabs/cub.git
[submodule "third_party/eigen"]
path = third_party/eigen
url = https://github.com/RLovelett/eigen.git
url = https://github.com/eigenteam/eigen-git-mirror.git
[submodule "third_party/googletest"]
path = third_party/googletest
url = https://github.com/google/googletest.git
@ -77,3 +64,15 @@
[submodule "third_party/onnx"]
path = third_party/onnx
url = https://github.com/onnx/onnx.git
[submodule "third_party/cereal"]
path = third_party/cereal
url = https://github.com/USCiLab/cereal
[submodule "third_party/onnx-tensorrt"]
path = third_party/onnx-tensorrt
url = https://github.com/onnx/onnx-tensorrt
[submodule "third_party/sleef"]
path = third_party/sleef
url = https://github.com/shibatch/sleef
[submodule "third_party/ideep"]
path = third_party/ideep
url = https://github.com/intel/ideep
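Since the eigen submodule URL changed here (from the RLovelett fork to the eigenteam mirror), existing clones need their submodule remotes re-synced before updating; roughly:
```
# Copy the new URLs from .gitmodules into .git/config, then update
git submodule sync --recursive
git submodule update --init --recursive
```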


@ -2,30 +2,51 @@
set -ex
pip install --user --no-cache-dir hypothesis==3.59.0
# The INSTALL_PREFIX here must match up with test.sh
INSTALL_PREFIX="/usr/local/caffe2"
LOCAL_DIR=$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)
ROOT_DIR=$(cd "$LOCAL_DIR"/../.. && pwd)
CMAKE_ARGS=()
SCCACHE="$(which sccache)"
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
if [ "$(which gcc)" != "/root/sccache/gcc" ]; then
# Setup SCCACHE
###############################################################################
# Setup sccache if SCCACHE_BUCKET is set
if [ -n "${SCCACHE_BUCKET}" ]; then
mkdir -p ./sccache
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
SCCACHE="$(which sccache)"
if [ -z "${SCCACHE}" ]; then
echo "Unable to find sccache..."
exit 1
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which nvcc) \"\$@\""
) > "./sccache/nvcc"
chmod +x "./sccache/nvcc"
fi
export CACHE_WRAPPER_DIR="$PWD/sccache"
# CMake must find these wrapper scripts
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
# Setup wrapper scripts
for compiler in cc c++ gcc g++ x86_64-linux-gnu-gcc; do
(
echo "#!/bin/sh"
echo "exec $SCCACHE $(which $compiler) \"\$@\""
) > "./sccache/$compiler"
chmod +x "./sccache/$compiler"
done
# CMake must find these wrapper scripts
export PATH="$PWD/sccache:$PATH"
fi
# Setup ccache if configured to use it (and not sccache)
@ -36,30 +57,55 @@ if [ -z "${SCCACHE}" ] && which ccache > /dev/null; then
ln -sf "$(which ccache)" ./ccache/gcc
ln -sf "$(which ccache)" ./ccache/g++
ln -sf "$(which ccache)" ./ccache/x86_64-linux-gnu-gcc
export CCACHE_WRAPPER_DIR="$PWD/ccache"
export PATH="$CCACHE_WRAPPER_DIR:$PATH"
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]]; then
ln -sf "$(which ccache)" ./ccache/nvcc
fi
export CACHE_WRAPPER_DIR="$PWD/ccache"
export PATH="$CACHE_WRAPPER_DIR:$PATH"
fi
CMAKE_ARGS=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
# sccache will fail for CUDA builds if all cores are used for compiling
if [ -z "$MAX_JOBS" ]; then
if [[ "${BUILD_ENVIRONMENT}" == *-cuda* ]] && [ -n "${SCCACHE}" ]; then
MAX_JOBS=`expr $(nproc) - 1`
else
MAX_JOBS=$(nproc)
fi
fi
# Run build script from scripts if applicable
report_compile_cache_stats() {
if [[ -n "${SCCACHE}" ]]; then
"$SCCACHE" --show-stats
elif which ccache > /dev/null; then
ccache -s
fi
}
###############################################################################
# Explicitly set Python executable.
###############################################################################
# On Ubuntu 16.04 the default Python is still 2.7.
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON=$(which "python${BASH_REMATCH[1]}")
CMAKE_ARGS+=("-DPYTHON_EXECUTABLE=${PYTHON}")
fi
###############################################################################
# Use special scripts for Android, conda, and setup builds
###############################################################################
if [[ "${BUILD_ENVIRONMENT}" == *-android* ]]; then
export ANDROID_NDK=/opt/ndk
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
"${ROOT_DIR}/scripts/build_android.sh" ${CMAKE_ARGS[*]} "$@"
exit 0
fi
if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
# click (required by onnx) wants these set
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# SKIP_CONDA_TESTS refers to only the 'test' section of the meta.yaml
export SKIP_CONDA_TESTS=1
export CONDA_INSTALL_LOCALLY=1
"${ROOT_DIR}/scripts/build_anaconda.sh" "$@"
elif [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
"${ROOT_DIR}/scripts/build_anaconda.sh" --skip-tests --install-locally "$@"
report_compile_cache_stats
# This build will be tested against the onnx tests, which need onnx installed.
# At this point the visible protobuf installation will be in conda, since one
@ -67,49 +113,63 @@ if [[ "${BUILD_ENVIRONMENT}" == conda* ]]; then
# headers are those in conda as well
# This path comes from install_anaconda.sh which installs Anaconda into the
# docker image
PROTOBUF_INCDIR=/opt/conda/include pip install "${ROOT_DIR}/third_party/onnx"
PROTOBUF_INCDIR=/opt/conda/include pip install -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
exit 0
fi
# Run cmake from ./build directory
mkdir -p ./build
cd ./build
INSTALL_PREFIX="/usr/local/caffe2"
###############################################################################
# Set cmake args
###############################################################################
CMAKE_ARGS+=("-DBUILD_BINARY=ON")
CMAKE_ARGS+=("-DBUILD_TEST=ON")
CMAKE_ARGS+=("-DINSTALL_TEST=ON")
CMAKE_ARGS+=("-DUSE_OBSERVERS=ON")
CMAKE_ARGS+=("-DUSE_ZSTD=ON")
CMAKE_ARGS+=("-DCMAKE_INSTALL_PREFIX=${INSTALL_PREFIX}")
# Explicitly set Python executable.
# On Ubuntu 16.04 the default Python is still 2.7.
PYTHON="$(which python)"
if [[ "${BUILD_ENVIRONMENT}" == py3* ]]; then
PYTHON=/usr/bin/python3
CMAKE_ARGS+=("-DPYTHON_EXECUTABLE=${PYTHON}")
if [[ $BUILD_ENVIRONMENT == *mkl* ]]; then
CMAKE_ARGS+=("-DBLAS=MKL")
fi
if [[ $BUILD_ENVIRONMENT == *cuda* ]]; then
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
CMAKE_ARGS+=("-DUSE_NNPACK=OFF")
case "${BUILD_ENVIRONMENT}" in
*-mkl*)
CMAKE_ARGS+=("-DBLAS=MKL")
;;
*-cuda*)
CMAKE_ARGS+=("-DUSE_CUDA=ON")
CMAKE_ARGS+=("-DCUDA_ARCH_NAME=Maxwell")
CMAKE_ARGS+=("-DUSE_NNPACK=OFF")
# Explicitly set path to NVCC such that the symlink to ccache or sccache is used
CMAKE_ARGS+=("-DCUDA_NVCC_EXECUTABLE=${CACHE_WRAPPER_DIR}/nvcc")
# Add ccache symlink for nvcc
ln -sf "$(which ccache)" "${CCACHE_WRAPPER_DIR}/nvcc"
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
# See /usr/share/cmake-3.5/Modules/FindCUDA.cmake, block at line 589.
export CUDA_PATH="/usr/local/cuda"
# Explicitly set path to NVCC such that the symlink to ccache is used
CMAKE_ARGS+=("-DCUDA_NVCC_EXECUTABLE=${CCACHE_WRAPPER_DIR}/nvcc")
# Ensure the ccache symlink can still find the real nvcc binary.
export PATH="/usr/local/cuda/bin:$PATH"
fi
if [[ $BUILD_ENVIRONMENT == *rocm* ]]; then
# TODO: This is patching the official FindHIP to properly handle
# cmake generator expressions. A PR has been opened in the upstream repo here:
# https://github.com/ROCm-Developer-Tools/HIP/pull/516
# Remove this hack once it's merged.
if [[ -f /opt/rocm/hip/cmake/FindHIP.cmake ]]; then
sudo sed -i 's/\ -I${dir}/\ $<$<BOOL:${dir}>:-I${dir}>/' /opt/rocm/hip/cmake/FindHIP.cmake
fi
# Ensure FindCUDA.cmake can infer the right path to the CUDA toolkit.
# Setting PATH to resolve to the right nvcc alone isn't enough.
# See /usr/share/cmake-3.5/Modules/FindCUDA.cmake, block at line 589.
export CUDA_PATH="/usr/local/cuda"
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
export HCC_AMDGPU_TARGET=gfx900
# Ensure the ccache symlink can still find the real nvcc binary.
export PATH="/usr/local/cuda/bin:$PATH"
;;
esac
# The link time of libcaffe2_hip.so takes 40 minutes; according to
# https://github.com/RadeonOpenCompute/hcc#thinlto-phase-1---implemented
# using ThinLTO could significantly improve link-time performance.
export KMTHINLTO=1
########## HIPIFY Caffe2 operators
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_pytorch_amd.py"
${PYTHON} "${ROOT_DIR}/tools/amd_build/build_caffe2_amd.py"
fi
# Try to include Redis support for Linux builds
if [ "$(uname)" == "Linux" ]; then
@ -123,6 +183,9 @@ if [ "$(uname)" == "Darwin" ]; then
CMAKE_ARGS+=("-DBUILD_CUSTOM_PROTOBUF=ON")
fi
# Use a specialized ONNX namespace in CI to catch hardcoded ONNX namespaces
CMAKE_ARGS+=("-DONNX_NAMESPACE=ONNX_NAMESPACE_FOR_C2_CI")
# Test for the presence of cmake3 (needed on platforms like CentOS and
# Ubuntu 14.04) and use it if available.
if [[ -x "$(command -v cmake3)" ]]; then
@ -131,20 +194,57 @@ else
CMAKE_BINARY=cmake
fi
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
###############################################################################
# Configure and make
###############################################################################
if [[ -z "$INTEGRATED" ]]; then
# Run cmake from ./build_caffe2 directory so it doesn't conflict with
# standard PyTorch build directory. Eventually these won't need to
# be separate.
rm -rf build_caffe2
mkdir build_caffe2
cd ./build_caffe2
# Configure
${CMAKE_BINARY} "${ROOT_DIR}" ${CMAKE_ARGS[*]} "$@"
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j${MAX_JOBS}" install
else
echo "Don't know how to build on $(uname)"
exit 1
fi
# Build
if [ "$(uname)" == "Linux" ]; then
make "-j$(nproc)" install
else
echo "Don't know how to build on $(uname)"
exit 1
# sccache will get stuck if all cores are used for compiling
# see https://github.com/pytorch/pytorch/pull/7361
if [[ -n "${SCCACHE}" ]]; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
USE_LEVELDB=1 USE_LMDB=1 USE_OPENCV=1 BUILD_BINARY=1 python setup.py install --user
# This is to save test binaries for testing
cp -r torch/lib/tmp_install $INSTALL_PREFIX
ls $INSTALL_PREFIX
report_compile_cache_stats
fi
###############################################################################
# Install ONNX
###############################################################################
# Install ONNX into a local directory
ONNX_INSTALL_PATH="/usr/local/onnx"
pip install "${ROOT_DIR}/third_party/onnx" -t "${ONNX_INSTALL_PATH}"
pip install --user -b /tmp/pip_install_onnx "file://${ROOT_DIR}/third_party/onnx#egg=onnx"
report_compile_cache_stats
# Symlink the caffe2 base python path into the system python path,
# so that we can import caffe2 without having to change $PYTHONPATH.
@ -153,30 +253,30 @@ pip install "${ROOT_DIR}/third_party/onnx" -t "${ONNX_INSTALL_PATH}"
# This is only done when running on Jenkins! We don't want to pollute
# the user environment with Python symlinks and ld.so.conf.d hacks.
#
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
if [[ -z "$INTEGRATED" ]]; then
if [ -n "${JENKINS_URL}" ]; then
(
source /etc/os-release
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
function python_version() {
"$PYTHON" -c 'import sys; print("python%d.%d" % sys.version_info[0:2])'
}
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
sudo ln -sf "${ONNX_INSTALL_PATH}/onnx" "${python_path}"
fi
# Debian/Ubuntu
if [[ "$ID_LIKE" == *debian* ]]; then
python_path="/usr/local/lib/$(python_version)/dist-packages"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
sudo ln -sf "${ONNX_INSTALL_PATH}/onnx" "${python_path}"
fi
# RHEL/CentOS
if [[ "$ID_LIKE" == *rhel* ]]; then
python_path="/usr/lib64/$(python_version)/site-packages/"
sudo ln -sf "${INSTALL_PREFIX}/caffe2" "${python_path}"
fi
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
# /etc/ld.so.conf.d is used on both Debian and RHEL
echo "${INSTALL_PREFIX}/lib" | sudo tee /etc/ld.so.conf.d/caffe2.conf
sudo ldconfig
)
fi
fi
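The sccache and ccache sections above share one pattern: put a shim named after the compiler at the front of PATH so the build system picks up the cache transparently. A standalone sketch of such a shim (the compiler path is an assumption):
```
#!/bin/sh
# Saved as ./sccache/gcc and marked executable; prepend its directory to
# PATH before invoking cmake so it shadows the real gcc.
exec sccache /usr/bin/gcc "$@"
```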


@ -3,4 +3,5 @@ set -ex
upstream="$1"
pr="$2"
git diff --name-only "$upstream" "$pr"
git diff --name-only "$upstream" "$pr" | grep -Eq '^(CMakeLists.txt|Makefile|.gitmodules|.jenkins/caffe2|binaries|caffe|caffe2|cmake|conda|docker|docs/caffe2|modules|scripts|third_party)'
# For safety, unconditionally trigger for any changes.
#git diff --name-only "$upstream" "$pr" | grep -Eq '^(CMakeLists.txt|Makefile|.gitmodules|.jenkins/caffe2|binaries|caffe|caffe2|cmake|conda|docker|docs/caffe2|modules|scripts|third_party)'


@ -8,12 +8,8 @@ TEST_DIR=$ROOT_DIR/caffe2_tests
# Figure out which Python to use
PYTHON="python"
if [ -n "$BUILD_ENVIRONMENT" ]; then
if [[ "$BUILD_ENVIRONMENT" == py2* ]]; then
PYTHON="python2"
elif [[ "$BUILD_ENVIRONMENT" == py3* ]]; then
PYTHON="python3"
fi
if [[ "${BUILD_ENVIRONMENT}" =~ py((2|3)\.?[0-9]?\.?[0-9]?) ]]; then
PYTHON="python${BASH_REMATCH[1]}"
fi
# The prefix must mirror the setting from build.sh
@ -44,8 +40,6 @@ if [[ "$BUILD_ENVIRONMENT" != conda* ]]; then
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH}:${INSTALL_PREFIX}/lib"
fi
exit_code=0
cd "$ROOT_DIR"
if [ -d $TEST_DIR ]; then
@ -55,30 +49,55 @@ fi
mkdir -p $TEST_DIR/{cpp,python}
cd ${INSTALL_PREFIX}
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# Commands below may exit with non-zero status
set +e
# Pin individual runs to a specific GPU so that we can schedule
# multiple jobs on machines that have multiple GPUs.
NUM_AMD_GPUS=$(/opt/rocm/bin/rocminfo | grep 'Device Type.*GPU' | wc -l)
if (( $NUM_AMD_GPUS == 0 )); then
echo >&2 "No AMD GPU detected!"
exit 1
fi
export HIP_VISIBLE_DEVICES=$(($BUILD_NUMBER % $NUM_AMD_GPUS))
fi
cd "${WORKSPACE}"
# C++ tests
echo "Running C++ tests.."
for test in $INSTALL_PREFIX/test/*; do
# Skip tests we know are hanging or bad
case "$(basename "$test")" in
mkl_utils_test)
gtest_reports_dir="${TEST_DIR}/cpp"
junit_reports_dir="${TEST_DIR}/junit_reports"
mkdir -p "$gtest_reports_dir" "$junit_reports_dir"
for test in $(find "${INSTALL_PREFIX}/test" -executable -type f); do
case "$test" in
# skip tests we know are hanging or bad
*/mkl_utils_test|*/aten/integer_divider_test)
continue
;;
# TODO investigate conv_op_test failures when using MKL
conv_op_test)
continue
*/scalar_tensor_test|*/basic|*/native_test)
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
continue
else
"$test"
fi
;;
*)
# Currently, we use a mixture of gtest (caffe2) and Catch2 (ATen). While we
# plan to migrate to gtest as the common PyTorch C++ test suite, we
# currently do NOT use the XML test reporter, because Catch doesn't
# support multiple reporters
# c.f. https://github.com/catchorg/Catch2/blob/master/docs/release-notes.md#223
# which means that enabling XML output means you lose useful stdout
# output for Jenkins. It's more important to have useful console
# output than it is to have XML output for Jenkins.
# Note: in the future, if we want to use xml test reporter once we switch
# to all gtest, one can simply do:
# "$test" --gtest_output=xml:"$gtest_reports_dir/$(basename $test).xml"
"$test"
;;
esac
"$test" --gtest_output=xml:"$TEST_DIR"/cpp/$(basename "$test").xml
tmp_exit_code="$?"
if [ "$exit_code" -eq 0 ]; then
exit_code="$tmp_exit_code"
fi
done
# Get the relative path to where the caffe2 python module was installed
@ -92,9 +111,29 @@ if [[ "$BUILD_ENVIRONMENT" == *-cuda* ]]; then
EXTRA_TESTS+=("$CAFFE2_PYPATH/contrib/nccl")
fi
# TODO find out why this breaks for conda builds
conda_ignore_test=()
if [[ $BUILD_ENVIRONMENT == conda* ]]; then
conda_ignore_test="--ignore $CAFFE2_PYPATH/python/tt_core_test.py"
# These tests both assume Caffe2 was built with leveldb, which is not the case
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/dataio_test.py")
conda_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/checkpoint_test.py")
fi
rocm_ignore_test=()
if [[ $BUILD_ENVIRONMENT == *-rocm* ]]; then
# Currently these tests are failing on ROCM platform:
# Unknown reasons, need to debug
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/arg_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/piecewise_linear_transform_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/softmax_ops_test.py")
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/unique_ops_test.py")
# Need to go through roi ops to replace max(...) with fmaxf(...)
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/roi_align_rotated_op_test.py")
# Our cuda top_k op has some asm code; the hipified version doesn't
# compile yet, so we don't have a top_k operator for now
rocm_ignore_test+=("--ignore $CAFFE2_PYPATH/python/operator_test/top_k_test.py")
fi
# Python tests
@ -108,14 +147,14 @@ echo "Running Python tests.."
--ignore "$CAFFE2_PYPATH/python/operator_test/matmul_op_test.py" \
--ignore "$CAFFE2_PYPATH/python/operator_test/pack_ops_test.py" \
--ignore "$CAFFE2_PYPATH/python/mkl/mkl_sbn_speed_test.py" \
$conda_ignore_test \
${conda_ignore_test[@]} \
${rocm_ignore_test[@]} \
"$CAFFE2_PYPATH/python" \
"${EXTRA_TESTS[@]}"
tmp_exit_code="$?"
if [ "$exit_code" -eq 0 ]; then
exit_code="$tmp_exit_code"
fi
cd ${INSTALL_PREFIX}
# Exit with the first non-zero status we got
exit "$exit_code"
if [[ -n "$INTEGRATED" ]]; then
pip install --user torchvision
"$ROOT_DIR/scripts/onnx/test.sh"
fi
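The HIP_VISIBLE_DEVICES logic above pins each CI run to one GPU, round-robin by build number. A comparable sketch for CUDA machines (the nvidia-smi output format is the only assumption):
```
# --list-gpus prints one line per GPU; pin this job to one of them
NUM_GPUS=$(nvidia-smi --list-gpus | wc -l)
export CUDA_VISIBLE_DEVICES=$((BUILD_NUMBER % NUM_GPUS))
```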


@ -16,7 +16,6 @@ export ASAN_OPTIONS=detect_leaks=0:symbolize=1
# TODO: Make the ASAN flags a more unified env var
CC="clang" CXX="clang++" LDSHARED="clang --shared" \
LDFLAGS="-stdlib=libstdc++" \
CFLAGS="-fsanitize=address -shared-libasan" \
CFLAGS="-fsanitize=address -fsanitize=undefined -fno-sanitize-recover=all -shared-libasan" \
NO_CUDA=1 \
python setup.py install
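The flag change above layers UBSAN on top of ASAN and makes undefined-behavior reports fatal. The same combination can be exercised on a single translation unit outside of setup.py (a sketch; demo.c is hypothetical):
```
# -fno-sanitize-recover=all turns UBSAN diagnostics into hard failures
clang -fsanitize=address,undefined -fno-sanitize-recover=all \
      -shared-libasan -o demo demo.c
ASAN_OPTIONS=detect_leaks=0:symbolize=1 ./demo
```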


@ -1,12 +1,28 @@
#!/bin/bash
# For distributed, four environmental configs:
# (1) build with only NCCL
# (2) build with NCCL and MPI
# (3) build with only MPI
# (4) build with neither
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]] || [[ "$BUILD_ENVIRONMENT" == *-trusty-py2.7.9* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
if [[ "$BUILD_ENVIRONMENT" == "pytorch-linux-xenial-py3-clang5-asan" ]]; then
exec "$(dirname "${BASH_SOURCE[0]}")/build-asan.sh" $*
fi
# Add nccl2 for distributed test.
apt-get install libnccl-dev libnccl2
# Required environment variable: $BUILD_ENVIRONMENT
# (This is set by default in the Docker images we build, so you don't
# need to set it yourself.)
@ -20,48 +36,99 @@ python --version
echo "GCC version:"
gcc --version
echo "CMake version:"
cmake --version
# TODO: Don't run this...
pip install -r requirements.txt || true
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
# This is necessary in order to cross-compile (or else the GPU device would be missing).
export HCC_AMDGPU_TARGET=gfx900
# These environment variables are not set on CI when running as the Jenkins user.
# The HIP Utility scripts require these environment variables to be set in order to run without error.
export LANG=C.UTF-8
export LC_ALL=C.UTF-8
# This environment variable enabled HCC Optimizations that speed up the linking stage.
# https://github.com/RadeonOpenCompute/hcc#hcc-with-thinlto-linking
export KMTHINLTO=1
# Need the libc++1 and libc++abi1 libraries to allow torch._C to load at runtime
sudo apt-get install libc++1
sudo apt-get install libc++abi1
python tools/amd_build/build_pytorch_amd.py
python tools/amd_build/build_caffe2_amd.py
USE_ROCM=1 python setup.py install --user
exit 0
fi
# TODO: Don't install this here
if ! which conda; then
pip install mkl mkl-devel
fi
python setup.py install
# Add the ATen test binaries so that they won't be git clean'ed away
git add -f aten/build/src/ATen/test
# Testing ATen install
if [[ "$BUILD_ENVIRONMENT" != *cuda* ]]; then
echo "Testing ATen install"
time tools/test_aten_install.sh
# sccache will fail for CUDA builds if all cores are used for compiling
# gcc 7 with sccache seems to have intermittent OOM issues if all cores are used
if [ -z "$MAX_JOBS" ]; then
if ([[ "$BUILD_ENVIRONMENT" == *cuda* ]] || [[ "$BUILD_ENVIRONMENT" == *gcc7* ]]) && which sccache > /dev/null; then
export MAX_JOBS=`expr $(nproc) - 1`
fi
fi
# Test C FFI plugins
# cffi install doesn't work for Python 3.7
if [[ "$BUILD_ENVIRONMENT" != *pynightly* ]]; then
# TODO: Don't run this here
pip install cffi
git clone https://github.com/pytorch/extension-ffi.git
pushd extension-ffi/script
python build.py
popd
# Target only our CI GPU machine's CUDA arch to speed up the build
export TORCH_CUDA_ARCH_LIST="5.2"
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
export TORCH_CUDA_ARCH_LIST="6.0"
fi
if [[ "$BUILD_ENVIRONMENT" == *trusty-py3.6-gcc5.4* ]]; then
export DEBUG=1
fi
# The ppc64le build fails when WERROR=1, so set WERROR only when building
# other architectures; it is only used for the "python setup.py install" line
if [[ "$BUILD_ENVIRONMENT" != *ppc64le* ]]; then
WERROR=1 python setup.py install
elif [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
python setup.py install
fi
# Add the test binaries so that they won't be git clean'ed away
git add -f build/bin
# Test documentation build
if [[ "$BUILD_ENVIRONMENT" == *xenial-cuda8-cudnn6-py3* ]]; then
pushd docs
# TODO: Don't run this here
pip install -r requirements.txt || true
make html
LC_ALL=C make html
popd
fi
# Test no-Python build
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Building libtorch with NO_PYTHON"
echo "Building libtorch"
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
VERBOSE=1 tools/cpp_build/build_all.sh "$PWD/../cpp-build"
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
mkdir -p ../cpp-build/caffe2
pushd ../cpp-build/caffe2
WERROR=1 VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
# Build custom operator tests.
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
CUSTOM_OP_TEST="$PWD/test/custom_operator"
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
mkdir "$CUSTOM_OP_BUILD"
pushd "$CUSTOM_OP_BUILD"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake "$CUSTOM_OP_TEST"
make VERBOSE=1
popd
fi
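The MAX_JOBS guard recurs across these scripts; the core idea is simply to leave one core free for the sccache server so compiles don't stall. Reduced to its essence (a sketch):
```
# Leave one core free when sccache is present, otherwise use them all
if which sccache > /dev/null; then
  export MAX_JOBS=$(($(nproc) - 1))
else
  export MAX_JOBS=$(nproc)
fi
```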


@ -62,10 +62,19 @@ declare -f -t trap_add
trap_add cleanup EXIT
if which sccache > /dev/null; then
# Save sccache logs to file
sccache --stop-server || true
rm ~/sccache_error.log || true
SCCACHE_ERROR_LOG=~/sccache_error.log RUST_LOG=sccache::server=error sccache --start-server
# Report sccache stats for easier debugging
sccache --show-stats
sccache --zero-stats
function sccache_epilogue() {
sccache --show-stats
echo '=================== sccache compilation log ==================='
python $(dirname "${BASH_SOURCE[0]}")/print_sccache_log.py ~/sccache_error.log
echo '=========== If your build fails, please take a look at the log above for possible reasons ==========='
sccache --show-stats
sccache --stop-server || true
}
trap_add sccache_epilogue EXIT
fi
@ -104,8 +113,28 @@ else
fi
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda9-cudnn7-py3 ]] || \
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7.2 ]]; then
[[ "$BUILD_ENVIRONMENT" == *pytorch-linux-trusty-py3.6-gcc7* ]]; then
BUILD_TEST_LIBTORCH=1
else
BUILD_TEST_LIBTORCH=0
fi
# Use conda cmake in some CI builds. Conda cmake will be newer than our supported
# min version 3.5, so we only do it in the two builds that we know should use conda.
if [[ "$BUILD_ENVIRONMENT" == *pytorch-linux-xenial-cuda* ]]; then
if [[ "$BUILD_ENVIRONMENT" == *cuda8-cudnn6-py2* ]] || \
[[ "$BUILD_ENVIRONMENT" == *cuda9-cudnn7-py3* ]]; then
if ! which conda; then
echo "Expected ${BUILD_ENVIRONMENT} to use conda, but 'which conda' returns empty"
exit 1
else
conda install -q -y cmake
fi
else
if ! cmake --version | grep 'cmake version 3\.5'; then
echo "Expected ${BUILD_ENVIRONMENT} to have cmake version 3.5.* (min support version), but 'cmake --version' returns:"
cmake --version
exit 1
fi
fi
fi
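trap_add (declared near the top of common.sh) chains EXIT handlers instead of overwriting them, which is why sccache_epilogue can be registered alongside the existing cleanup trap. A naive re-implementation of the idea (a sketch; it assumes handlers contain no single quotes):
```
# Append a command to the current EXIT trap rather than replacing it
trap_add() {
  local cur
  cur=$(trap -p EXIT | awk -F"'" '{print $2}')
  trap "${cur:+$cur; }$1" EXIT
}
trap_add 'sccache --show-stats'
trap_add 'sccache --stop-server || true'
```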


@ -3,4 +3,8 @@ set -ex
upstream="$1"
pr="$2"
git diff --name-only "$upstream" "$pr"
git diff --name-only "$upstream" "$pr" | grep -Eq '^(aten/|.jenkins/pytorch|docs/(make.bat|Makefile|requirements.txt|source)|mypy|requirements.txt|setup.py|test/|third_party/|tools/|\.gitmodules|torch/)'
# Now that PyTorch build depends on Caffe2, unconditionally trigger
# for any changes.
# TODO: Replace this with a NEGATIVE regex that allows us to blacklist
# files (letting us skip builds when they are unnecessary)
#git diff --name-only "$upstream" "$pr" | grep -Eq '^(aten/|caffe2/|.jenkins/pytorch|docs/(make.bat|Makefile|requirements.txt|source)|mypy|requirements.txt|setup.py|test/|third_party/|tools/|\.gitmodules|torch/)'


@ -12,6 +12,8 @@ pytorch-linux-xenial-cuda9-cudnn7-py2-build
pytorch-linux-xenial-cuda9-cudnn7-py2-test
pytorch-linux-xenial-cuda9-cudnn7-py3-build
pytorch-linux-xenial-cuda9-cudnn7-py3-test
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-build
pytorch-linux-xenial-cuda9.2-cudnn7-py3-gcc7-test
pytorch-linux-xenial-py3-clang5-asan-build
pytorch-linux-xenial-py3-clang5-asan-test
pytorch-linux-trusty-py2.7.9-build
@ -26,11 +28,21 @@ pytorch-linux-trusty-py3.6-gcc5.4-build
pytorch-linux-trusty-py3.6-gcc5.4-test
pytorch-linux-trusty-py3.6-gcc7.2-build
pytorch-linux-trusty-py3.6-gcc7.2-test
pytorch-linux-trusty-py3.6-gcc7-build
pytorch-linux-trusty-py3.6-gcc7-test
pytorch-linux-trusty-pynightly-build
pytorch-linux-trusty-pynightly-test
pytorch-win-ws2016-cuda9-cudnn7-py3-build
pytorch-win-ws2016-cuda9-cudnn7-py3-test
pytorch-macos-10.13-py3-build-test
pytorch-macos-10.13-py3-build
pytorch-macos-10.13-py3-test
pytorch-macos-10.13-cuda9.2-cudnn7-py3-build
pytorch-docker-build-test
short-perf-test-cpu
short-perf-test-gpu
py2-clang7-rocmdeb-ubuntu16.04-build
py2-clang7-rocmdeb-ubuntu16.04-test
pytorch-ppc64le-cuda9.2-cudnn7-py3-build
pytorch-ppc64le-cuda9.2-cudnn7-py3-test
pytorch-ppc64le-cuda9.1-cudnn7-py3-build
pytorch-ppc64le-cuda9.1-cudnn7-py3-test


@ -1,26 +1,9 @@
#!/bin/bash
COMPACT_JOB_NAME=pytorch-macos-10.13-py3-build-test
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-build* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-build.sh"
fi
# Set up conda environment
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o $PWD/miniconda3.sh
rm -rf $PWD/miniconda3
bash $PWD/miniconda3.sh -b -p $PWD/miniconda3
export PATH="$PWD/miniconda3/bin:$PATH"
source $PWD/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
# Build and test PyTorch
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=$PWD/miniconda3/
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
python setup.py install
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
echo "BUILD PASSED"
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test* ]]; then
source "$(dirname "${BASH_SOURCE[0]}")/macos-test.sh"
fi

.jenkins/pytorch/macos-build.sh Executable file (72 lines)

@ -0,0 +1,72 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-build"
export PATH="/usr/local/bin:$PATH"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Build PyTorch
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
export CUDA_VERSION=9.2
export TORCH_CUDA_ARCH_LIST=5.2
export PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/bin${PATH:+:${PATH}}
export DYLD_LIBRARY_PATH=/Developer/NVIDIA/CUDA-${CUDA_VERSION}/lib${DYLD_LIBRARY_PATH:+:${DYLD_LIBRARY_PATH}}
export CUDA_HOME=/Developer/NVIDIA/CUDA-${CUDA_VERSION}
export NO_CUDA=0
if [ -z "${IN_CIRCLECI}" ]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
fi
else
if [ -z "${IN_CIRCLECI}" ]; then
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
if which sccache > /dev/null; then
printf "#!/bin/sh\nexec sccache $(which clang++) \$*" > "${PYTORCH_ENV_DIR}/clang++"
chmod a+x "${PYTORCH_ENV_DIR}/clang++"
printf "#!/bin/sh\nexec sccache $(which clang) \$*" > "${PYTORCH_ENV_DIR}/clang"
chmod a+x "${PYTORCH_ENV_DIR}/clang"
if [[ "${JOB_BASE_NAME}" == *cuda* ]]; then
printf "#!/bin/sh\nexec sccache $(which nvcc) \$*" > "${PYTORCH_ENV_DIR}/nvcc"
chmod a+x "${PYTORCH_ENV_DIR}/nvcc"
export CUDA_NVCC_EXECUTABLE="${PYTORCH_ENV_DIR}/nvcc"
fi
export PATH="${PYTORCH_ENV_DIR}:$PATH"
fi
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
python setup.py install
# Upload torch binaries when the build job is finished
if [ -z "${IN_CIRCLECI}" ]; then
7z a ${IMAGE_COMMIT_TAG}.7z ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp ${IMAGE_COMMIT_TAG}.7z s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z --acl public-read
fi

.jenkins/pytorch/macos-test.sh Executable file (112 lines)

@ -0,0 +1,112 @@
#!/bin/bash
COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
export PATH="/usr/local/bin:$PATH"
# Set up conda environment
export PYTORCH_ENV_DIR="${HOME}/pytorch-ci-env"
# If a local installation of conda doesn't exist, we download and install conda
if [ ! -d "${PYTORCH_ENV_DIR}/miniconda3" ]; then
mkdir -p ${PYTORCH_ENV_DIR}
curl https://repo.continuum.io/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -o ${PYTORCH_ENV_DIR}/miniconda3.sh
bash ${PYTORCH_ENV_DIR}/miniconda3.sh -b -p ${PYTORCH_ENV_DIR}/miniconda3
fi
export PATH="${PYTORCH_ENV_DIR}/miniconda3/bin:$PATH"
source ${PYTORCH_ENV_DIR}/miniconda3/bin/activate
conda install -y mkl mkl-include numpy pyyaml setuptools cmake cffi ninja future six
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
fi
git submodule update --init --recursive
export CMAKE_PREFIX_PATH=${PYTORCH_ENV_DIR}/miniconda3/
# Test PyTorch
if [ -z "${IN_CIRCLECI}" ]; then
if [[ "${JOB_BASE_NAME}" == *cuda9.2* ]]; then
# Eigen gives "explicit specialization of class must precede its first use" error
# when compiling with Xcode 9.1 toolchain, so we have to use Xcode 8.2 toolchain instead.
export DEVELOPER_DIR=/Library/Developer/CommandLineTools
else
export DEVELOPER_DIR=/Applications/Xcode9.app/Contents/Developer
fi
fi
export MACOSX_DEPLOYMENT_TARGET=10.9
export CXX=clang++
export CC=clang
# If we run too many parallel jobs, we will OOM
export MAX_JOBS=2
export IMAGE_COMMIT_TAG=${BUILD_ENVIRONMENT}-${IMAGE_COMMIT_ID}
# Download torch binaries in the test jobs
if [ -z "${IN_CIRCLECI}" ]; then
rm -rf ${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages/torch*
aws s3 cp s3://ossci-macos-build/pytorch/${IMAGE_COMMIT_TAG}.7z ${IMAGE_COMMIT_TAG}.7z
7z x ${IMAGE_COMMIT_TAG}.7z -o"${PYTORCH_ENV_DIR}/miniconda3/lib/python3.6/site-packages"
fi
test_python_all() {
echo "Ninja version: $(ninja --version)"
python test/run_test.py --verbose
}
test_cpp_api() {
# C++ API
# NB: Install outside of source directory (at the same level as the root
# pytorch folder) so that it doesn't get cleaned away prior to docker push.
# But still clean it before we perform our own build.
#
CPP_BUILD="$PWD/../cpp-build"
rm -rf $CPP_BUILD
mkdir -p $CPP_BUILD/caffe2
BUILD_LIBTORCH_PY=$PWD/tools/build_libtorch.py
pushd $CPP_BUILD/caffe2
VERBOSE=1 DEBUG=1 python $BUILD_LIBTORCH_PY
popd
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
# Unfortunately it seems like the test can't load from miniconda3
# without these paths being set
export DYLD_LIBRARY_PATH="$DYLD_LIBRARY_PATH:$PWD/miniconda3/lib"
export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$PWD/miniconda3/lib"
"$CPP_BUILD"/caffe2/bin/test_api
}
test_custom_script_ops() {
echo "Testing custom script operators"
pushd test/custom_operator
# Build the custom operator library.
rm -rf build && mkdir build
pushd build
SITE_PACKAGES="$(python -c 'from distutils.sysconfig import get_python_lib; print(get_python_lib())')"
CMAKE_PREFIX_PATH="$SITE_PACKAGES/torch" cmake ..
make VERBOSE=1
popd
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_python_all
test_cpp_api
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_all
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_cpp_api
test_custom_script_ops
fi
fi


@ -8,4 +8,21 @@ COMPACT_JOB_NAME="${BUILD_ENVIRONMENT}-multigpu-test"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch (distributed only)"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
time python test/run_test.py --verbose -i distributed


@ -11,7 +11,7 @@ get_runtime_of_command () {
# runtime=$( { time ($@ &> /dev/null); } 2>&1 1>/dev/null)
runtime=$( { time $@; } 2>&1 1>/dev/null)
if [[ $runtime == *"Warning"* ]] || [[ $runtime == *"Error"* ]]; then
if [[ $runtime == *"Error"* ]]; then
exit 1
fi
runtime=${runtime#+++ $@}
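The redirection in get_runtime_of_command is the standard trick for capturing only the timing report: the braces group the command so that `time`'s output (written to stderr) flows into the command substitution while the command's own stdout is discarded. In isolation:
```
# runtime receives only the real/user/sys report from `time`
runtime=$( { time sleep 1; } 2>&1 1>/dev/null )
echo "$runtime"
```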


@ -1,5 +1,6 @@
import sys
import json
import math
import numpy
import argparse
@ -35,14 +36,25 @@ else:
print("population mean: ", mean)
print("population sigma: ", sigma)
# Let the test pass if baseline number is NaN (which happened in
# the past when we didn't have logic for catching NaN numbers)
if math.isnan(mean) or math.isnan(sigma):
mean = sys.maxsize
sigma = 0.001
sample_stats_data = json.loads(args.sample_stats)
sample_mean = sample_stats_data['mean']
sample_sigma = sample_stats_data['sigma']
sample_mean = float(sample_stats_data['mean'])
sample_sigma = float(sample_stats_data['sigma'])
print("sample mean: ", sample_mean)
print("sample sigma: ", sample_sigma)
if math.isnan(sample_mean):
raise Exception('''Error: sample mean is NaN''')
elif math.isnan(sample_sigma):
raise Exception('''Error: sample sigma is NaN''')
z_value = (sample_mean - mean) / sigma
print("z-value: ", z_value)


@ -10,7 +10,11 @@ test_gpu_speed_cudnn_lstm () {
git clone https://github.com/pytorch/benchmark.git
cd benchmark/scripts/
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1


@ -10,7 +10,11 @@ test_gpu_speed_lstm () {
git clone https://github.com/pytorch/benchmark.git
cd benchmark/scripts/
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1


@ -10,7 +10,11 @@ test_gpu_speed_mlstm () {
git clone https://github.com/pytorch/benchmark.git
cd benchmark/scripts/
cd benchmark/
git checkout 43dfb2c0370e70ef37f249dc09aff9f0ccd2ddb0
cd scripts/
SAMPLE_ARRAY=()
NUM_RUNS=$1


@ -20,6 +20,9 @@ test_gpu_speed_mnist () {
SAMPLE_ARRAY=()
NUM_RUNS=$1
# Needs a warm-up run to get accurate numbers
python main.py --epochs 1 --no-log
for (( i=1; i<=$NUM_RUNS; i++ )) do
runtime=$(get_runtime_of_command python main.py --epochs 1 --no-log)
echo $runtime


@ -0,0 +1,11 @@
import sys
log_file_path = sys.argv[1]
with open(log_file_path) as f:
lines = f.readlines()
for line in lines:
# Ignore errors from CPU instruction set testing
if 'src.c' not in line:
print(line)
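The filter script takes the log path as its only argument, so the call from common.sh reduces to:
```
# Dump the sccache error log, skipping lines from the CPU-feature probes
python print_sccache_log.py ~/sccache_error.log
```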


@ -3,7 +3,7 @@
COMPACT_JOB_NAME="short-perf-test-gpu"
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
cd .jenkins/pytorch/perf_test
pushd .jenkins/pytorch/perf_test
echo "Running GPU perf test for PyTorch..."
@ -64,3 +64,5 @@ if [[ "$COMMIT_SOURCE" == master ]]; then
# but the chance of them executing this line at the same time is low.
aws s3 cp new_gpu_runtime.json s3://ossci-perf-test/pytorch/gpu_runtime/${MASTER_COMMIT_ID}.json --acl public-read
fi
popd


@ -9,6 +9,22 @@ source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
echo "Testing pytorch"
if [ -n "${IN_CIRCLECI}" ]; then
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages libnccl-dev=2.2.13-1+cuda9.0 libnccl2=2.2.13-1+cuda9.0
fi
if [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda8-* ]] || [[ "$BUILD_ENVIRONMENT" == *-xenial-cuda9-cudnn7-py2* ]]; then
# TODO: move this to Docker
sudo apt-get update
sudo apt-get install -y --allow-downgrades --allow-change-held-packages openmpi-bin libopenmpi-dev
sudo apt-get install -y --no-install-recommends openssh-client openssh-server
sudo mkdir -p /var/run/sshd
fi
fi
# JIT C++ extensions require ninja.
git clone https://github.com/ninja-build/ninja --quiet
pushd ninja
@ -16,43 +32,147 @@ python ./configure.py --bootstrap
export PATH="$PWD:$PATH"
popd
# DANGER WILL ROBINSON. The LD_PRELOAD here oculd cause you problems
# DANGER WILL ROBINSON. The LD_PRELOAD here could cause you problems
# if you're not careful. Check this if you made some changes and the
# ASAN test is not working
if [[ "$BUILD_ENVIRONMENT" == *asan* ]]; then
export ASAN_OPTIONS=detect_leaks=0:symbolize=1
export ASAN_OPTIONS=detect_leaks=0:symbolize=1:strict_init_order=true
# We suppress the vptr violation, since we have separate copies of
# libprotobuf in both libtorch.so and libcaffe2.so, and it causes
# the following problem:
# test_cse (__main__.TestJit) ... torch/csrc/jit/export.cpp:622:38:
# runtime error: member call on address ... which does not point
# to an object of type 'google::protobuf::MessageLite'
# ...: note: object is of type 'onnx_torch::ModelProto'
#
# This problem should be solved when libtorch.so and libcaffe2.so are
# merged.
export UBSAN_OPTIONS=print_stacktrace=1:suppressions=$PWD/ubsan.supp
export PYTORCH_TEST_WITH_ASAN=1
export PYTORCH_TEST_WITH_UBSAN=1
# TODO: Figure out how to avoid hard-coding these paths
export ASAN_SYMBOLIZER_PATH=/usr/lib/llvm-5.0/bin/llvm-symbolizer
export LD_PRELOAD=/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so
# Increase stack size, because ASAN red zones use more stack
ulimit -s 81920
function get_exit_code() {
set +e
"$@"
retcode=$?
set -e
return $retcode
}
(cd test && python -c "import torch")
echo "The next three invocations are expected to crash; if they don't that means ASAN/UBSAN is misconfigured"
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_asan(3)")
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_csrc_ubsan(0)")
(cd test && ! get_exit_code python -c "import torch; torch._C._crash_if_aten_asan(3)")
fi
time python test/run_test.py --verbose
# Test ATen
if [[ "$BUILD_ENVIRONMENT" != *asan* ]]; then
echo "Testing ATen"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
ln -s "$TORCH_LIB_PATH"/libATen.so aten/build/src/ATen/libATen.so
aten/tools/run_tests.sh aten/build
if [[ "$BUILD_ENVIRONMENT" == *rocm* ]]; then
export PYTORCH_TEST_WITH_ROCM=1
fi
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad
git clone https://github.com/pytorch/vision --quiet
pushd vision
time python setup.py install
popd
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing libtorch with NO_PYTHON"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/libtorch/bin/test_jit
else
"$CPP_BUILD"/libtorch/bin/test_jit "[cpu]"
fi
if [[ "${JOB_BASE_NAME}" == *-NO_AVX-* ]]; then
export ATEN_CPU_CAPABILITY=default
elif [[ "${JOB_BASE_NAME}" == *-NO_AVX2-* ]]; then
export ATEN_CPU_CAPABILITY=avx
fi
test_python_nn() {
time python test/run_test.py --include nn --verbose
}
test_python_all_except_nn() {
time python test/run_test.py --exclude nn --verbose
}
test_aten() {
# Test ATen
# The following test(s) of ATen have already been skipped by caffe2 in rocm environment:
# scalar_tensor_test, basic, native_test
if ([[ "$BUILD_ENVIRONMENT" != *asan* ]] && [[ "$BUILD_ENVIRONMENT" != *rocm* ]]); then
echo "Running ATen tests with pytorch lib"
TORCH_LIB_PATH=$(python -c "import site; print(site.getsitepackages()[0])")/torch/lib
# NB: the ATen test binaries don't have RPATH set, so it's necessary to
# put the dynamic libraries somewhere where the dynamic linker can find them.
# This is a bit of a hack.
if [[ "$BUILD_ENVIRONMENT" == *ppc64le* ]]; then
SUDO=sudo
fi
${SUDO} ln -s "$TORCH_LIB_PATH"/libc10* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libcaffe2* build/bin
${SUDO} ln -s "$TORCH_LIB_PATH"/libnccl* build/bin
ls build/bin
aten/tools/run_tests.sh build/bin
fi
}
test_torchvision() {
rm -rf ninja
echo "Installing torchvision at branch master"
rm -rf vision
# TODO: This git clone is bad, it means pushes to torchvision can break
# PyTorch CI
git clone https://github.com/pytorch/vision --quiet
pushd vision
# python setup.py install with a tqdm dependency is broken in the
# Travis Python nightly (but not in latest Python nightlies, so
# this should be a transient requirement...)
# See https://github.com/pytorch/pytorch/issues/7525
#time python setup.py install
pip install --user .
popd
}
test_libtorch() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing libtorch"
CPP_BUILD="$PWD/../cpp-build"
if [[ "$BUILD_ENVIRONMENT" == *cuda* ]]; then
"$CPP_BUILD"/caffe2/bin/test_jit
else
"$CPP_BUILD"/caffe2/bin/test_jit "[cpu]"
fi
python tools/download_mnist.py --quiet -d test/cpp/api/mnist
OMP_NUM_THREADS=2 "$CPP_BUILD"/caffe2/bin/test_api
fi
}
test_custom_script_ops() {
if [[ "$BUILD_TEST_LIBTORCH" == "1" ]]; then
echo "Testing custom script operators"
CUSTOM_OP_BUILD="$PWD/../custom-op-build"
pushd test/custom_operator
cp -r "$CUSTOM_OP_BUILD" build
# Run tests Python-side and export a script module.
python test_custom_ops.py -v
python model.py --export-script-module=model.pt
# Run tests C++-side and load the exported script module.
build/test_custom_ops ./model.pt
popd
fi
}
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
test_python_nn
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
test_python_nn
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
test_python_all_except_nn
test_aten
test_torchvision
test_libtorch
test_custom_script_ops
fi
fi


@ -1,5 +1,14 @@
#!/bin/bash
# If you want to rebuild, run this with REBUILD=1
# If you want to build with CUDA, run this with USE_CUDA=1
# If you want to build without CUDA, run this with USE_CUDA=0
if [ ! -f setup.py ]; then
echo "ERROR: Please run this build script from PyTorch root directory."
exit 1
fi
COMPACT_JOB_NAME=pytorch-win-ws2016-cuda9-cudnn7-py3-build
source "$(dirname "${BASH_SOURCE[0]}")/common.sh"
@ -29,35 +38,64 @@ EOL
cat >ci_scripts/build_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install MKL
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z mkl.7z --quiet && 7z x -aoa mkl.7z -omkl
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/mkl_2018.2.185.7z --output mkl.7z
) else (
aws s3 cp s3://ossci-windows/mkl_2018.2.185.7z mkl.7z --quiet
)
7z x -aoa mkl.7z -omkl
)
set CMAKE_INCLUDE_PATH=%cd%\\mkl\\include
set LIB=%cd%\\mkl\\lib;%LIB
:: Install MAGMA
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet && 7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
if "%REBUILD%"=="" (
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z --output magma_cuda90_release_mkl_2018.2.185.7z
) else (
aws s3 cp s3://ossci-windows/magma_cuda90_release_mkl_2018.2.185.7z magma_cuda90_release_mkl_2018.2.185.7z --quiet
)
7z x -aoa magma_cuda90_release_mkl_2018.2.185.7z -omagma
)
set MAGMA_HOME=%cd%\\magma
:: Install clcache
aws s3 cp s3://ossci-windows/clcache.7z clcache.7z --quiet && 7z x -aoa clcache.7z -oclcache
:: Install sccache
mkdir %CD%\\tmp_bin
if "%REBUILD%"=="" (
:check_sccache
%CD%\\tmp_bin\\sccache.exe --show-stats || (
taskkill /im sccache.exe /f /t || ver > nul
del %CD%\\tmp_bin\\sccache.exe
if "%BUILD_ENVIRONMENT%"=="" (
curl -k https://s3.amazonaws.com/ossci-windows/sccache.exe --output %CD%\\tmp_bin\\sccache.exe
) else (
aws s3 cp s3://ossci-windows/sccache.exe %CD%\\tmp_bin\\sccache.exe
)
goto :check_sccache
)
)
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
if "%REBUILD%"=="" (
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
curl -k https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
.\Miniconda3-latest-Windows-x86_64.exe /InstallationType=JustMe /RegisterPython=0 /S /AddToPath=0 /D=C:\\Jenkins\\Miniconda3
)
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy cffi pyyaml boto3
if "%REBUILD%"=="" ( call conda install -y -q numpy cffi pyyaml boto3 )
:: Install ninja
pip install ninja
if "%REBUILD%"=="" ( pip install ninja )
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
git submodule update --init --recursive
set PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set PATH=%CD%\\tmp_bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\bin;C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\libnvvp;%PATH%
set CUDA_PATH=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDA_PATH_V9_0=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set NVTOOLSEXT_PATH=C:\\Program Files\\NVIDIA Corporation\\NvToolsExt
@@ -65,29 +103,53 @@ set CUDNN_LIB_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0\\l
set CUDA_TOOLKIT_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
set CUDNN_ROOT_DIR=C:\\Program Files\\NVIDIA GPU Computing Toolkit\\CUDA\\v9.0
:: Target only our CI GPU machine's CUDA arch to speed up the build
set TORCH_CUDA_ARCH_LIST=5.2
set USE_CLCACHE=1
set CLCACHE_DIR=%cd%\\clcache_tmp
set CC=%cd%\\clcache\\clcache_main.exe
set CXX=%cd%\\clcache\\clcache_main.exe
sccache --stop-server
sccache --start-server
sccache --zero-stats
set CC=sccache cl
set CXX=sccache cl
set DISTUTILS_USE_SDK=1
set CMAKE_GENERATOR=Ninja
set NO_CUDA=1
if not "%USE_CUDA%"=="1" (
if "%REBUILD%"=="" (
set NO_CUDA=1
python setup.py install
)
if errorlevel 1 exit /b 1
if not errorlevel 0 exit /b 1
)
python setup.py install
if not "%USE_CUDA%"=="0" (
if "%REBUILD%"=="" (
sccache --show-stats
sccache --zero-stats
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
copy %CD%\\tmp_bin\\sccache.exe tmp_bin\\nvcc.exe
)
if %errorlevel% neq 0 exit /b %errorlevel%
set CUDA_NVCC_EXECUTABLE=%CD%\\tmp_bin\\nvcc
rd /s /q C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch
if "%REBUILD%"=="" set NO_CUDA=0
set NO_CUDA=
python setup.py install && 7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
python setup.py install && sccache --show-stats && (
if "%BUILD_ENVIRONMENT%"=="" (
echo "NOTE: To run \`import torch\`, please make sure to activate the conda environment by running \`call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3\` in Command Prompt before running Git Bash."
) else (
7z a %IMAGE_COMMIT_TAG%.7z C:\\Jenkins\\Miniconda3\\Lib\\site-packages\\torch && python ci_scripts\\upload_image.py %IMAGE_COMMIT_TAG%.7z
)
)
)
EOL
ci_scripts/build_pytorch.bat && echo "BUILD PASSED"
ci_scripts/build_pytorch.bat
if [ ! -f ${IMAGE_COMMIT_TAG}.7z ] && [ ! "${BUILD_ENVIRONMENT}" == "" ]; then
exit 1
fi
echo "BUILD PASSED"


@@ -34,24 +34,9 @@ except botocore.exceptions.ClientError as e:
EOL
cat >ci_scripts/delete_image.py << EOL
cat >ci_scripts/setup_pytorch_env.bat <<EOL
import os
import boto3
IMAGE_COMMIT_TAG = os.getenv('IMAGE_COMMIT_TAG')
session = boto3.session.Session()
s3 = session.resource('s3')
BUCKET_NAME = 'ossci-windows-build'
KEY = 'pytorch/'+IMAGE_COMMIT_TAG+'.7z'
s3.Object(BUCKET_NAME, KEY).delete()
EOL
cat >ci_scripts/test_pytorch.bat <<EOL
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\curl-7.57.0-win64-mingw\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
set PATH=C:\\Program Files\\CMake\\bin;C:\\Program Files\\7-Zip;C:\\ProgramData\\chocolatey\\bin;C:\\Program Files\\Git\\cmd;C:\\Program Files\\Amazon\\AWSCLI;%PATH%
:: Install Miniconda3
IF EXIST C:\\Jenkins\\Miniconda3 ( rd /s /q C:\\Jenkins\\Miniconda3 )
@@ -60,7 +45,7 @@ curl https://repo.continuum.io/miniconda/Miniconda3-latest-Windows-x86_64.exe -O
call C:\\Jenkins\\Miniconda3\\Scripts\\activate.bat C:\\Jenkins\\Miniconda3
call conda install -y -q numpy mkl cffi pyyaml boto3
pip install ninja
pip install ninja future
call "C:\\Program Files (x86)\\Microsoft Visual Studio\\2017\\Community\\VC\\Auxiliary\\Build\\vcvarsall.bat" x86_amd64
@@ -78,8 +63,31 @@ cd test/
python ..\\ci_scripts\\download_image.py %IMAGE_COMMIT_TAG%.7z
7z x %IMAGE_COMMIT_TAG%.7z
python run_test.py --verbose && python ..\\ci_scripts\\delete_image.py
cd ..
EOL
ci_scripts/test_pytorch.bat && echo "TEST PASSED"
cat >ci_scripts/test_python_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --include nn --verbose && cd ..
EOL
cat >ci_scripts/test_python_all_except_nn.bat <<EOL
call ci_scripts/setup_pytorch_env.bat
cd test/ && python run_test.py --exclude nn --verbose && cd ..
EOL
run_tests() {
if [ -z "${JOB_BASE_NAME}" ] || [[ "${JOB_BASE_NAME}" == *-test ]]; then
ci_scripts/test_python_nn.bat && ci_scripts/test_python_all_except_nn.bat
else
if [[ "${JOB_BASE_NAME}" == *-test1 ]]; then
ci_scripts/test_python_nn.bat
elif [[ "${JOB_BASE_NAME}" == *-test2 ]]; then
ci_scripts/test_python_all_except_nn.bat
fi
fi
}
run_tests && echo "TEST PASSED"


@@ -13,9 +13,10 @@ install:
- travis_retry pip install pyyaml typing
script:
- cd aten
- mkdir build install
- cd build
- cmake .. -DNO_CUDA=1 -DCMAKE_INSTALL_PREFIX=../install
- cmake .. -DUSE_CUDA=OFF -DCMAKE_INSTALL_PREFIX=../install
- make install
- ../tools/run_tests.sh .
- cd ..


@@ -16,7 +16,16 @@ matrix:
python: "2.7"
install: pip install flake8
script: flake8
- env: LINT_CHECK
python: "3.7"
dist: xenial # required for Python 3.7 (travis-ci/travis-ci#9069)
sudo: required # required for Python 3.7 (travis-ci/travis-ci#9069)
install: pip install flake8
script: flake8
- env: MYPY_TYPE_CHECK
python: "3.6"
install: pip install mypy mypy-extensions
script: mypy @mypy-files.txt
- env: CPP_DOC_CHECK
install: sudo apt-get install -y doxygen
script: cd docs/cpp/source && ./check-doxygen.sh


@@ -1,15 +1,14 @@
cmake_minimum_required(VERSION 3.2 FATAL_ERROR)
cmake_minimum_required(VERSION 3.5 FATAL_ERROR)
#cmake_policy(SET CMP0022 NEW)
#cmake_policy(SET CMP0023 NEW)
# ---[ Project and semantic versioning.
project(Caffe2 CXX C)
set(CAFFE2_VERSION_MAJOR 0)
set(CAFFE2_VERSION_MINOR 8)
set(CAFFE2_VERSION_PATCH 2)
set(CAFFE2_VERSION
"${CAFFE2_VERSION_MAJOR}.${CAFFE2_VERSION_MINOR}.${CAFFE2_VERSION_PATCH}")
set(CMAKE_CXX_STANDARD 11)
if (NOT MSVC)
set(CMAKE_C_STANDARD 11)
endif()
# One variable that determines whether the current cmake process is being run
# with the main Caffe2 library. This is useful for building modules - if
@@ -20,14 +19,47 @@ set(CAFFE2_VERSION
# endif()
set(CAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO ON)
if(NOT DEFINED BLAS_SET_BY_USER)
if(DEFINED BLAS)
set(BLAS_SET_BY_USER TRUE)
else()
message(STATUS "Not forcing any particular BLAS to be found")
set(BLAS_SET_BY_USER FALSE)
endif()
set(BLAS_SET_BY_USER ${BLAS_SET_BY_USER} CACHE STRING "Marks whether BLAS was manually set by user or auto-detected")
endif()
# Apple specific
if(APPLE)
# These lines are an attempt to make find_package(cuda) pick up
# libcuda.dylib, and not cuda.framework. It doesn't work all
# the time, but it seems to help for some users.
# TODO: replace this with a more robust fix
set(CMAKE_FIND_FRAMEWORK LAST)
set(CMAKE_FIND_APPBUNDLE LAST)
# Get clang version on macOS
EXECUTE_PROCESS( COMMAND ${CMAKE_CXX_COMPILER} --version OUTPUT_VARIABLE clang_full_version_string )
string(REGEX REPLACE "Apple LLVM version ([0-9]+\\.[0-9]+).*" "\\1" CLANG_VERSION_STRING ${clang_full_version_string})
MESSAGE( STATUS "CLANG_VERSION_STRING: " ${CLANG_VERSION_STRING} )
# RPATH stuff
set(CMAKE_MACOSX_RPATH ON)
endif()
# ---[ Options.
# Note to developers: if you add an option below, make sure you also add it to
# cmake/Summary.cmake so that the summary prints out the option values.
include(CMakeDependentOption)
option(BUILD_BINARY "Build C++ binaries" ON)
option(BUILD_DOCS "Build documentation" OFF)
option(BUILD_TORCH "Build Torch" OFF)
option(ATEN_NO_TEST "Do not build ATen test binaries" OFF)
option(BUILD_ATEN_MOBILE "Build ATen for Android and iOS" OFF)
option(BUILD_BINARY "Build C++ binaries" OFF)
option(BUILD_DOCS "Build Caffe2 documentation" OFF)
option(BUILD_CUSTOM_PROTOBUF "Build and use Caffe2's own protobuf under third_party" ON)
option(BUILD_PYTHON "Build Python binaries" ON)
option(BUILD_CAFFE2_OPS "Build Caffe2 operators" ON)
option(BUILD_SHARED_LIBS "Build libcaffe2.so" ON)
cmake_dependent_option(
CAFFE2_LINK_LOCAL_PROTOBUF "If set, build protobuf inside libcaffe2.so." ON
@@ -35,45 +67,91 @@ cmake_dependent_option(
cmake_dependent_option(
CAFFE2_USE_MSVC_STATIC_RUNTIME "Using MSVC static runtime libraries" ON
"NOT BUILD_SHARED_LIBS" OFF)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" ON)
option(BUILD_TEST "Build C++ test binaries (need gtest and gbenchmark)" OFF)
cmake_dependent_option(
INSTALL_TEST "Install test binaries if BUILD_TEST is on" OFF
"BUILD_TEST" OFF)
option(USE_ACL "Use ARM Compute Library" OFF)
option(USE_ASAN "Use Address Sanitizer" OFF)
option(USE_ATEN "Use ATen" OFF)
option(USE_CUDA "Use Cuda" ON)
option(USE_CUDA "Use CUDA" ON)
option(USE_ROCM "Use ROCm" OFF)
option(CAFFE2_STATIC_LINK_CUDA "Statically link CUDA libraries" OFF)
cmake_dependent_option(
USE_CUDNN "Use cuDNN" ON
"USE_CUDA" OFF)
option(USE_FFMPEG "Use ffmpeg" OFF)
option(USE_GFLAGS "Use GFLAGS" ON)
option(USE_GLOG "Use GLOG" ON)
option(USE_GLOO "Use Gloo" ON)
option(USE_LEVELDB "Use LEVELDB" ON)
option(USE_LITE_PROTO "Use lite protobuf instead of full." OFF)
option(USE_LMDB "Use LMDB" ON)
option(USE_METAL "Use Metal for iOS build" ON)
option(USE_MOBILE_OPENGL "Use OpenGL for mobile code" ON)
option(USE_MPI "Use MPI" ON)
option(USE_NATIVE_ARCH "Use -march=native" OFF)
option(USE_NCCL "Use NCCL" ON)
option(USE_SYSTEM_NCCL "Use system-wide NCCL" OFF)
option(USE_NERVANA_GPU "Use Nervana GPU backend" OFF)
option(USE_NNAPI "Use NNAPI" OFF)
option(USE_NNPACK "Use NNPACK" ON)
option(USE_NUMA "Use NUMA (only available on Linux)" ON)
cmake_dependent_option(
USE_NVRTC "Use NVRTC. Only available if USE_CUDA is on." OFF
"USE_CUDA" OFF)
option(USE_OBSERVERS "Use observers module." OFF)
option(USE_OPENCV "Use openCV" ON)
option(USE_OPENCL "Use OpenCL" OFF)
option(USE_OPENCV "Use OpenCV" ON)
option(USE_OPENMP "Use OpenMP for parallel code" OFF)
option(USE_PROF "Use profiling" OFF)
option(USE_REDIS "Use Redis" OFF)
option(USE_ROCKSDB "Use RocksDB" OFF)
option(USE_SNPE "Use Qualcomm's SNPE library" OFF)
option(USE_SYSTEM_EIGEN_INSTALL
"Use system Eigen instead of the one under third_party" OFF)
option(USE_TENSORRT "Using Nvidia TensorRT library" OFF)
option(USE_ZMQ "Use ZMQ" OFF)
option(USE_ZSTD "Use ZSTD" OFF)
option(USE_MKLDNN "Use MKLDNN" OFF)
option(USE_IDEEP "Use IDEEP interface in MKL BLAS" ON)
option(USE_MKLML "Use MKLML interface in MKL BLAS" ON)
option(USE_DISTRIBUTED "Use distributed" ON)
cmake_dependent_option(
USE_MPI "Use MPI for Caffe2. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO "Use Gloo. Only available if USE_DISTRIBUTED is on." ON
"USE_DISTRIBUTED" OFF)
cmake_dependent_option(
USE_GLOO_IBVERBS "Use Gloo IB verbs for distributed. Only available if USE_GLOO is on." OFF
"USE_GLOO" OFF)
# Used when building Caffe2 through setup.py
option(BUILDING_WITH_TORCH_LIBS "Tell cmake if Caffe2 is being built alongside torch libs" OFF)
SET(ONNX_NAMESPACE "onnx_c2" CACHE STRING "onnx namespace")
if (ANDROID OR IOS)
set(BUILD_ATEN_MOBILE ON)
endif()
# ---[ Utils
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
# ---[ Version numbers for generated libraries
set(TORCH_DEFAULT_VERSION "1.0.0")
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}" CACHE STRING "Torch build version")
if (NOT TORCH_BUILD_VERSION)
# An empty string was specified so force version to the default
set(TORCH_BUILD_VERSION "${TORCH_DEFAULT_VERSION}"
CACHE STRING "Torch build version" FORCE)
endif()
caffe2_parse_version_str(TORCH ${TORCH_BUILD_VERSION})
caffe2_parse_version_str(CAFFE2 ${TORCH_BUILD_VERSION})
# ---[ CMake scripts + modules
list(APPEND CMAKE_MODULE_PATH ${PROJECT_SOURCE_DIR}/cmake/Modules)
if (MSVC AND ${BUILD_SHARED_LIBS})
set(CMAKE_WINDOWS_EXPORT_ALL_SYMBOLS ON)
endif()
# ---[ CMake build directories
set(CMAKE_ARCHIVE_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
set(CMAKE_LIBRARY_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/lib)
@@ -81,17 +159,8 @@ set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
enable_testing()
# ---[ Misc checks to cope with various compiler modes
include(cmake/MiscCheck.cmake)
# ---[ Build variables set within the cmake tree
include(cmake/BuildVariables.cmake)
# External projects
include(ExternalProject)
# TODO: merge the following 3 files into cmake/public/utils.cmake.
include(cmake/Utils.cmake)
include(cmake/public/utils.cmake)
set(CAFFE2_WHITELIST "" CACHE STRING "A whitelist file of files that one should build.")
# Set default build type
@@ -100,6 +169,12 @@ if(NOT CMAKE_BUILD_TYPE)
set(CMAKE_BUILD_TYPE "Release" CACHE STRING "Choose the type of build from: Debug Release RelWithDebInfo MinSizeRel Coverage." FORCE)
endif()
# ---[ Misc checks to cope with various compiler modes
include(cmake/MiscCheck.cmake)
# External projects
include(ExternalProject)
# ---[ Dependencies
include(cmake/Dependencies.cmake)
@@ -120,7 +195,52 @@ if(NOT MSVC)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-narrowing")
# Eigen fails to build with some versions, so convert this to a warning
# Details at http://eigen.tuxfamily.org/bz/show_bug.cgi?id=1459
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-invalid-partial-specialization")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wall")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wextra")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-field-initializers")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-type-limits")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-array-bounds")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-pragmas")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-sign-compare")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-parameter")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-variable")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-function")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-result")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-overflow")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-strict-aliasing")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=deprecated-declarations")
if (CMAKE_COMPILER_IS_GNUCXX AND NOT (CMAKE_CXX_COMPILER_VERSION VERSION_LESS 7.0.0))
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=pedantic")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=redundant-decls")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-error=old-style-cast")
# These flags are not available in GCC-4.8.5. Set only when using clang.
# Compared against https://gcc.gnu.org/onlinedocs/gcc-4.8.5/gcc/Option-Summary.html
if ("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-invalid-partial-specialization")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-typedef-redefinition")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unknown-warning-option")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-private-field")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-inconsistent-missing-override")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-aligned-allocation-unavailable")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Qunused-arguments")
endif()
if ((APPLE AND (NOT ("${CLANG_VERSION_STRING}" VERSION_LESS "9.0")))
OR (CMAKE_COMPILER_IS_GNUCXX
AND (CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0 AND NOT APPLE)))
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -faligned-new")
endif()
if ($ENV{WERROR})
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Werror")
endif($ENV{WERROR})
if (NOT APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-but-set-variable")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-maybe-uninitialized")
endif()
else()
foreach(flag_var
CMAKE_CXX_FLAGS CMAKE_CXX_FLAGS_DEBUG CMAKE_CXX_FLAGS_RELEASE
@@ -134,10 +254,30 @@ else()
string(REGEX REPLACE "/MT" "/MD" ${flag_var} "${${flag_var}}")
endif()
endif()
# /bigobj increases number of sections in .obj file, which is needed to link
# against libraries in Python 2.7 under Windows
set(${flag_var} "${${flag_var}} /MP /bigobj")
endforeach(flag_var)
endif()
set (CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fno-omit-frame-pointer -O0")
if (USE_ASAN)
set (CMAKE_CXX_FLAGS_DEBUG "${CMAKE_CXX_FLAGS_DEBUG} -fsanitize=address")
set (CMAKE_LINKER_FLAGS_DEBUG "${CMAKE_STATIC_LINKER_FLAGS_DEBUG} -fsanitize=address")
endif()
if (APPLE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-unused-private-field")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-missing-braces")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-c++14-extensions")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-constexpr-not-const")
endif()
if(CMAKE_COMPILER_IS_GNUCXX AND CMAKE_CXX_COMPILER_VERSION VERSION_GREATER 7.0.0)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-stringop-overflow")
endif()
if(ANDROID)
if(CMAKE_COMPILER_IS_GNUCXX)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -s")
@@ -161,13 +301,13 @@ include_directories(BEFORE ${PROJECT_SOURCE_DIR})
# in PROJECT_SOURCE_DIR.
include_directories(BEFORE ${PROJECT_BINARY_DIR})
# ---[ Old caffe protobuf.
add_subdirectory(caffe/proto)
include_directories(BEFORE ${PROJECT_SOURCE_DIR}/aten/src/)
# ---[ Main build
add_subdirectory(c10)
add_subdirectory(caffe2)
# Documentation Option
# --[ Documentation
if(BUILD_DOCS)
# check if Doxygen is installed
find_package(Doxygen)
@@ -181,7 +321,7 @@ if(BUILD_DOCS)
if(EXISTS ${CMAKE_CURRENT_BINARY_DIR}/docs)
file(REMOVE_RECURSE ${CMAKE_CURRENT_BINARY_DIR}/docs)
endif (EXISTS ${CMAKE_CURRENT_BINARY_DIR}/docs)
endif()
file(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/docs)
configure_file(${DOXYGEN_C_IN} ${DOXYGEN_C_OUT} @ONLY)
@@ -198,10 +338,10 @@ if(BUILD_DOCS)
WORKING_DIRECTORY ${CMAKE_CURRENT_SOURCE_DIR}
COMMENT "Generating Python API documentation with Doxygen"
VERBATIM)
else (DOXYGEN_FOUND)
else()
message(FATAL_ERROR "Doxygen needs to be installed to generate the documentation")
endif (DOXYGEN_FOUND)
endif (BUILD_DOCS)
endif()
endif()
# ---[ CMake related files
# Uninstall option.
@@ -226,7 +366,7 @@ if ((NOT USE_GLOG) OR (NOT USE_GFLAGS) OR BUILD_CUSTOM_PROTOBUF)
"generate files that are not well tested.")
endif()
if (USE_CUDA)
if (USE_CUDA OR USE_ROCM)
# TODO: check if we should include other cuda dependency libraries
# to the interface as well.
@@ -256,11 +396,16 @@ if (BUILD_SHARED_LIBS)
${PROJECT_SOURCE_DIR}/cmake/public/cuda.cmake
${PROJECT_SOURCE_DIR}/cmake/public/glog.cmake
${PROJECT_SOURCE_DIR}/cmake/public/gflags.cmake
${PROJECT_SOURCE_DIR}/cmake/public/mkl.cmake
${PROJECT_SOURCE_DIR}/cmake/public/protobuf.cmake
${PROJECT_SOURCE_DIR}/cmake/public/threads.cmake
${PROJECT_SOURCE_DIR}/cmake/public/utils.cmake
DESTINATION share/cmake/Caffe2/public
COMPONENT dev)
install(DIRECTORY
${PROJECT_SOURCE_DIR}/cmake/Modules_CUDA_fix
DESTINATION share/cmake/Caffe2/
COMPONENT dev)
install(EXPORT Caffe2Targets DESTINATION share/cmake/Caffe2
FILE Caffe2Targets.cmake
COMPONENT dev)
@@ -278,7 +423,6 @@ add_subdirectory(modules)
# are built. For the binaries, they will be linked to the Caffe2 main
# libraries, as well as all the modules that are built with Caffe2 (the ones
# built in the previous Modules section above).
if (BUILD_BINARY)
add_subdirectory(binaries)
endif()


@@ -2,11 +2,24 @@
# Each line is a file pattern followed by one or more owners.
/aten/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer
/aten/src/ATen/core/
/torch/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/docs/source @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/docs/source @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ssnl @zou3519
/docs/cpp @goldsborough @ebetica @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/test @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/tools @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/README.md @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/setup.py @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/requirements.txt @apaszke @soumith @colesbury @gchanan @zdevito @ezyang
/torch/csrc/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/test/cpp/api/ @apaszke @soumith @colesbury @gchanan @zdevito @ezyang @ebetica @goldsborough
/torch/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/csrc/jit/passes/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/test/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/scripts/onnx/ @anderspapitto @bddppq @dzhulgakov @ezyang @houseroad @jamesr66a @smessmer @Yangqing
/torch/lib/c10d/ @apaszke @pietern @teng-li
/torch/csrc/distributed/ @apaszke @pietern @teng-li
/torch/distributed/ @apaszke @pietern @teng-li
/test/test_c10d.py @apaszke @pietern @teng-li
/torch/utils/cpp_extension.py @goldsborough @fmassa @apaszke @soumith @ezyang


@@ -19,18 +19,18 @@ If you are not familiar with creating a Pull Request, here are some guides:
- https://help.github.com/articles/creating-a-pull-request/
## Developing locally with PyTorch
## Developing PyTorch
To locally develop with PyTorch, here are some tips:
To develop PyTorch on your machine, here are some tips:
1. Uninstall all existing pytorch installs
1. Uninstall all existing PyTorch installs:
```
conda uninstall pytorch
pip uninstall torch
pip uninstall torch # run this command twice
```
2. Locally clone a copy of PyTorch from source:
2. Clone a copy of PyTorch from source:
```
git clone https://github.com/pytorch/pytorch
@@ -72,6 +72,9 @@ For example:
You do not need to repeatedly install after modifying python files.
In case you want to reinstall, make sure that you uninstall pytorch first by running `pip uninstall torch`
and `python setup.py clean`. Then you can install in `build develop` mode again.
## Unit testing
PyTorch's testing is located under `test/`. Run the entire test suite with
@@ -101,6 +104,18 @@ PyTorch uses [Google style](http://sphinxcontrib-napoleon.readthedocs.io/en/late
for formatting docstrings. Lines inside a docstring block must be limited to 80 characters so that they
fit into Jupyter documentation popups.
For C++ documentation (https://pytorch.org/cppdocs), we use
[Doxygen](http://www.doxygen.nl/) and then convert it to
[Sphinx](http://www.sphinx-doc.org/) via
[Breathe](https://github.com/michaeljones/breathe) and
[Exhale](https://github.com/svenevs/exhale). Check the [Doxygen
reference](http://www.stack.nl/~dimitri/doxygen/manual/index.html) for more
information on the documentation syntax. To build the documentation locally,
`cd` into `docs/cpp` and then `make html`.
We run Doxygen in CI (Travis) to verify that you do not use invalid Doxygen
commands. To run this check locally, run `./check-doxygen.sh` from inside
`docs/cpp`.
## Managing multiple build trees
@@ -136,19 +151,20 @@ not very optimized for incremental rebuilds, this will actually be very slow.
Far better is to only request rebuilds of the parts of the project you are
working on:
- Working on `torch/csrc`? Run `python setup.py develop` to rebuild
- Working on the Python bindings? Run `python setup.py develop` to rebuild
(NB: no `build` here!)
- Working on `torch/lib/TH`, did not make any cmake changes, and just want to
see if it compiles? Run `(cd torch/lib/build/TH && make install -j$(getconf _NPROCESSORS_ONLN))`. This
applies for any other subdirectory of `torch/lib`. **Warning: Changes you
make here will not be visible from Python.** See below.
- Working on `torch/csrc` or `aten`? Run `python setup.py rebuild_libtorch` to
rebuild and avoid having to rebuild other dependent libraries we
depend on.
- Working on `torch/lib` and want to run your changes / rerun cmake? Run
`python setup.py build_deps`. Note that this will rerun cmake for
every subdirectory in TH; if you are only working on one project,
consider editing `torch/lib/build_all.sh` and commenting out the
`build` lines of libraries you are not working on.
- Working on one of the other dependent libraries? The other valid
targets are listed in `dep_libs` in `setup.py`. Prepend `build_` to
get a target, and run it as e.g. `python setup.py build_gloo`.
- Working on a test binary? Run `(cd build && ninja bin/test_binary_name)` to
rebuild only that test binary (without rerunning cmake). (Replace `ninja` with
`make` if you don't have ninja installed).
On the initial build, you can also speed things up with the environment
variables `DEBUG` and `NO_CUDA`.
@@ -165,7 +181,7 @@ Make sure you continue to pass these flags on subsequent builds.
### Code completion and IDE support
When using `python setup.py develop`, PyTorch will generate
a `compile_commands.json` file that can be used by many editors
to provide command completion and error highlighting for PyTorch's
C++ code. You need to `pip install ninja` to generate accurate
@@ -176,12 +192,14 @@ information for the code in `torch/csrc`. More information at:
#### Use Ninja
Python `setuptools` is pretty dumb, and always rebuilds every C file in a
project. If you install the ninja build system with `pip install ninja`,
then PyTorch will use it to track dependencies correctly.
If pytorch was already built, you will need to run `python setup.py clean` once
after installing ninja for builds to succeed.
#### Use CCache
Even when dependencies are tracked with file modification,
Even when dependencies are tracked with file modification,
compilation was exactly the same.
@@ -226,15 +244,116 @@ export CUDA_NVCC_EXECUTABLE=~/ccache/cuda/nvcc
If you are working on the CUDA code, here are some useful CUDA debugging tips:
1. `CUDA_DEBUG=1` will enable CUDA debugging symbols (-g -G). This is particularly
helpful in debugging device code. However, it will slow down the build process,
so use wisely.
1. `CUDA_DEVICE_DEBUG=1` will enable CUDA device function debug symbols (`-g -G`).
This will be particularly helpful in debugging device code. However, it will
slow down the build process by about 50% (compared to only `DEBUG=1`), so use wisely.
2. `cuda-gdb` and `cuda-memcheck` are your best CUDA debugging friends. Unlike `gdb`,
`cuda-gdb` can display actual values in a CUDA tensor (rather than all zeros).
Hope this helps, and thanks for considering contributing.
## Windows development tips
Occasionally, you will write a patch which works on Linux, but fails CI on Windows.
There are a few aspects in which MSVC (the Windows compiler toolchain we use) is stricter
than the toolchains on Linux, and they are worth keeping in mind when fixing these problems.
1. Symbols are NOT exported by default on Windows; instead, you have to explicitly
mark a symbol as exported/imported in a header file with `__declspec(dllexport)` /
`__declspec(dllimport)`. We have codified this pattern into a set of macros
which follow the convention `*_API`, e.g., `CAFFE2_API` inside Caffe2 and ATen.
(Every separate shared library needs a unique macro name, because symbol visibility
is on a per shared library basis. See c10/macros/Macros.h for more details.)
The upshot is if you see an "unresolved external" error in your Windows build, this
is probably because you forgot to mark a function with `*_API`. However, there is
one important counterexample to this principle: if you want a *templated* function
to be instantiated at the call site, do NOT mark it with `*_API` (if you do mark it,
you'll have to explicitly instantiate all of the specializations used by the call
sites.) A minimal sketch of this macro pattern appears after this list.
2. If you link against a library, this does not make its dependencies transitively
visible. You must explicitly specify a link dependency against every library whose
symbols you use. (This is different from Linux where in most environments,
transitive dependencies can be used to fulfill unresolved symbols.)
3. If you have a Windows box (we have a few on EC2 which you can request access to) and
you want to run the build, the easiest way is to just run `.jenkins/pytorch/win-build.sh`.
If you need to rebuild, run `REBUILD=1 .jenkins/pytorch/win-build.sh` (this will avoid
blowing away your Conda environment.)
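To make the `*_API` convention concrete, here is a minimal sketch of how such a macro is
typically defined (the names `FOO_API` and `FOO_EXPORTS` are hypothetical, invented for
illustration; the real macros live in headers such as `c10/macros/Macros.h`):
```
// FOO_EXPORTS is defined by the build system only while compiling foo.dll itself.
#ifdef _WIN32
#  ifdef FOO_EXPORTS
#    define FOO_API __declspec(dllexport)
#  else
#    define FOO_API __declspec(dllimport)
#  endif
#else
#  define FOO_API
#endif

// Exported from foo.dll, so code in other DLLs can resolve it at link time.
FOO_API int add(int a, int b);

// Deliberately NOT marked FOO_API: a template we want instantiated at each
// call site rather than exported from one library.
template <typename T>
T twice(T x) { return x + x; }
```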
Even if you don't know anything about MSVC, you can use cmake to build simple programs on
Windows; this can be helpful if you want to learn more about some peculiar linking behavior
by reproducing it on a small example. Here's a simple example cmake file that defines
two dynamic libraries, one linking with the other:
```
project(myproject CXX)
set(CMAKE_CXX_STANDARD 11)
add_library(foo SHARED foo.cpp)
add_library(bar SHARED bar.cpp)
# NB: don't forget to __declspec(dllexport) at least one symbol from foo,
# otherwise foo.lib will not be created.
target_link_libraries(bar PUBLIC foo)
```
You can build it with:
```
mkdir build
cd build
cmake ..
cmake --build .
```
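For reference, a matching pair of source files might look like this (hypothetical
contents; the essential point is that `foo.cpp` exports at least one symbol, since
otherwise MSVC never creates `foo.lib` for `bar` to link against):
```
// foo.cpp
__declspec(dllexport) int foo_value() { return 42; }

// bar.cpp
__declspec(dllimport) int foo_value();  // normally this lives in a shared header
int bar_value() { return foo_value() + 1; }
```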
### Known MSVC (and MSVC with NVCC) bugs
The PyTorch codebase sometimes likes to use exciting C++ features, and
these exciting features lead to exciting bugs in Windows compilers.
To add insult to injury, the error messages will often not tell you
which line of code actually induced the erroring template instantiation.
I've found the most effective way to debug these problems is to
carefully read over diffs, keeping in mind known bugs in MSVC/NVCC.
Here are a few well known pitfalls and workarounds:
* This is not actually a bug per se, but in general, code generated by MSVC
is more sensitive to memory errors; you may have written some code
that performs a use-after-free or overflows the stack; on Linux the code
might work, but on Windows your program will crash. ASAN may not
catch all of these problems: stay vigilant to the possibility that
your crash is due to a real memory problem.
* (NVCC) `at::optional` does not work when used from device code. Don't use
it from kernels. Upstream issue: https://github.com/akrzemi1/Optional/issues/58
and our local issue #10329.
* `constexpr` generally works less well on MSVC.
* The idiom `static_assert(f() == f())` to test if `f` is constexpr
does not work; you'll get "error C2131: expression did not evaluate
to a constant". Don't use these asserts on Windows.
(Example: `aten/src/ATen/core/intrusive_ptr.h`; a sketch of the idiom appears after this list.)
* (NVCC) Code you access inside a `static_assert` will eagerly be
evaluated as if it were device code, and so you might get an error
that the code is "not accessible".
```
class A {
  static A singleton_;
  static constexpr inline A* singleton() {
    return &singleton_;
  }
};
static_assert(std::is_same<A*, decltype(A::singleton())>::value, "hmm");
```
* The compiler will run out of heap if you attempt to compile files that
are too large. Splitting such files into separate files helps.
(Example: `THTensorMath`, `THTensorMoreMath`, `THTensorEvenMoreMath`.)
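For instance, the constexpr-detection idiom mentioned above amounts to the following
(a sketch; `f` stands in for any candidate function):
```
constexpr int f() { return 1; }

// Accepted by GCC and Clang when f is usable in constant expressions, but
// MSVC rejects it with "error C2131: expression did not evaluate to a constant".
static_assert(f() == f(), "f is constexpr");
```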
## Caffe2 notes
In 2018, we merged Caffe2 into the PyTorch source repository. While the

LICENSE

@@ -1,3 +1,45 @@
From PyTorch:
Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
From Caffe2:
Copyright (c) 2016-present, Facebook Inc. All rights reserved.
All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.
All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.
All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.
All contributions from Caffe:
Copyright(c) 2013, 2014, 2015, the respective contributors
All rights reserved.
All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.
Caffe2 uses a copyright model similar to Caffe: each contributor holds
copyright over their contributions to Caffe2. The project versioning records
all such contribution and copyright details. If a contributor wants to further
mark their specific copyright on a particular contribution, they should
indicate their copyright solely in the commit message of the change when it is
committed.
All rights reserved.
Redistribution and use in source and binary forms, with or without

NOTICE

@@ -1,45 +1,3 @@
From PyTorch:
Copyright (c) 2016- Facebook, Inc (Adam Paszke)
Copyright (c) 2014- Facebook, Inc (Soumith Chintala)
Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert)
Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu)
Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu)
Copyright (c) 2011-2013 NYU (Clement Farabet)
Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston)
Copyright (c) 2006 Idiap Research Institute (Samy Bengio)
Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz)
From Caffe2:
Copyright (c) 2016-present, Facebook Inc. All rights reserved.
All contributions by Facebook:
Copyright (c) 2016 Facebook Inc.
All contributions by Google:
Copyright (c) 2015 Google Inc.
All rights reserved.
All contributions by Yangqing Jia:
Copyright (c) 2015 Yangqing Jia
All rights reserved.
All contributions from Caffe:
Copyright(c) 2013, 2014, 2015, the respective contributors
All rights reserved.
All other contributions:
Copyright(c) 2015, 2016 the respective contributors
All rights reserved.
Caffe2 uses a copyright model similar to Caffe: each contributor holds
copyright over their contributions to Caffe2. The project versioning records
all such contribution and copyright details. If a contributor wants to further
mark their specific copyright on a particular contribution, they should
indicate their copyright solely in the commit message of the change when it is
committed.
=======================================================================
Software under third_party
=======================================================================


@@ -1,4 +1,4 @@
<p align="center"><img width="40%" src="docs/source/_static/img/pytorch-logo-dark.png" /></p>
![PyTorch Logo](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/pytorch-logo-dark.png)
--------------------------------------------------------------------------------
@@ -15,6 +15,7 @@ We are in an early-release beta. Expect some adventures and rough edges.
- [Binaries](#binaries)
- [From Source](#from-source)
- [Docker Image](#docker-image)
- [Building the Documentation](#building-the-documentation)
- [Previous Versions](#previous-versions)
- [Getting Started](#getting-started)
- [Communication](#communication)
@@ -27,37 +28,21 @@ We are in an early-release beta. Expect some adventures and rough edges.
| Linux GPU | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-master/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-master/) |
| Windows GPU | <center></center> | [![Build Status](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/badge/icon)](https://ci.pytorch.org/jenkins/job/pytorch-builds/job/pytorch-win-ws2016-cuda9-cudnn7-py3-trigger/)
See also the [ci.pytorch.org HUD](https://ezyang.github.io/pytorch-ci-hud/build/pytorch-master).
## More about PyTorch
At a granular level, PyTorch is a library that consists of the following components:
<table>
<tr>
<td><b> torch </b></td>
<td> a Tensor library like NumPy, with strong GPU support </td>
</tr>
<tr>
<td><b> torch.autograd </b></td>
<td> a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch </td>
</tr>
<tr>
<td><b> torch.nn </b></td>
<td> a neural networks library deeply integrated with autograd designed for maximum flexibility </td>
</tr>
<tr>
<td><b> torch.multiprocessing </b></td>
<td> Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training. </td>
</tr>
<tr>
<td><b> torch.utils </b></td>
<td> DataLoader, Trainer and other utility functions for convenience </td>
</tr>
<tr>
<td><b> torch.legacy(.nn/.optim) </b></td>
<td> legacy code that has been ported over from torch for backward compatibility reasons </td>
</tr>
</table>
| Component | Description |
| ---- | --- |
| **torch** | a Tensor library like NumPy, with strong GPU support |
| **torch.autograd** | a tape-based automatic differentiation library that supports all differentiable Tensor operations in torch |
| **torch.nn** | a neural networks library deeply integrated with autograd designed for maximum flexibility |
| **torch.multiprocessing** | Python multiprocessing, but with magical memory sharing of torch Tensors across processes. Useful for data loading and Hogwild training |
| **torch.utils** | DataLoader, Trainer and other utility functions for convenience |
| **torch.legacy(.nn/.optim)** | legacy code that has been ported over from torch for backward compatibility reasons |
Usually one uses PyTorch either as:
@@ -70,10 +55,10 @@ Elaborating further:
If you use NumPy, then you have used Tensors (a.k.a. ndarray).
<p align=center><img width="30%" src="docs/source/_static/img/tensor_illustration.png" /></p>
![Tensor illustration](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/tensor_illustration.png)
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerate
compute by a huge amount.
PyTorch provides Tensors that can live either on the CPU or the GPU, and accelerates the
computation by a huge amount.
We provide a wide variety of tensor routines to accelerate and fit your scientific computation needs
such as slicing, indexing, math operations, linear algebra, reductions.
@@ -97,7 +82,7 @@ from several research papers on this topic, as well as current and past work suc
While this technique is not unique to PyTorch, it's one of the fastest implementations of it to date.
You get the best of speed and flexibility for your crazy research.
<p align=center><img width="80%" src="docs/source/_static/img/dynamic_graph.gif" /></p>
![Dynamic graph](https://github.com/pytorch/pytorch/blob/master/docs/source/_static/img/dynamic_graph.gif)
### Python First
@@ -121,8 +106,7 @@ We hope you never spend hours debugging your code because of bad stack traces or
PyTorch has minimal framework overhead. We integrate acceleration libraries
such as Intel MKL and NVIDIA (cuDNN, NCCL) to maximize speed.
At the core, its CPU and GPU Tensor and neural network backends
(TH, THC, THNN, THCUNN) are written as independent libraries with a C99 API.
They are mature and have been tested for years.
(TH, THC, THNN, THCUNN) are mature and have been tested for years.
Hence, PyTorch is quite fast whether you run small or large neural networks.
@@ -139,9 +123,8 @@ and with minimal abstractions.
You can write new neural network layers in Python using the torch API
[or your favorite NumPy-based libraries such as SciPy](http://pytorch.org/tutorials/advanced/numpy_extensions_tutorial.html).
If you want to write your layers in C/C++, we provide an extension API based on
[cffi](http://cffi.readthedocs.io/en/latest/) that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/c_extension.html) and [an example here](https://github.com/pytorch/extension-ffi).
If you want to write your layers in C/C++, we provide a convenient extension API that is efficient and with minimal boilerplate.
There is no wrapper code that needs to be written. You can see [a tutorial here](http://pytorch.org/tutorials/advanced/cpp_extension.html) and [an example here](https://github.com/pytorch/extension-cpp).
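For a flavor of the API, a minimal extension can be a single C++ file like the one
below (a sketch; the op and module contents are illustrative, see the linked tutorial
for the real workflow):
```
#include <torch/extension.h>

// A toy op written directly against the tensor API: computes a + alpha * b.
torch::Tensor scaled_add(torch::Tensor a, torch::Tensor b, double alpha) {
  return a + alpha * b;
}

// Bind the op so Python can import and call it with no extra wrapper code.
PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("scaled_add", &scaled_add, "Compute a + alpha * b");
}
```
Such a file is typically built with the helpers in `torch/utils/cpp_extension.py` and then imported like any other Python module.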
## Installation
@@ -165,7 +148,9 @@ If you want to compile with CUDA support, install
If you want to disable CUDA support, export environment variable `NO_CUDA=1`.
Other potentially useful environment variables may be found in `setup.py`.
If you want to build on Windows, Visual Studio 2017 and NVTX are also needed.
If you want to build on Windows, Visual Studio 2017 14.11 toolset and NVTX are also needed.
In particular, for a CUDA 8 build on Windows, there will be an additional requirement for VS 2015 Update 3 and a patch for it.
The details of the patch can be found [here](https://support.microsoft.com/en-gb/help/4020481/fix-link-exe-crashes-with-a-fatal-lnk1000-error-when-you-use-wholearch).
#### Install optional dependencies
@@ -175,9 +160,10 @@ export CMAKE_PREFIX_PATH="$(dirname $(which conda))/../" # [anaconda root direct
# Install basic dependencies
conda install numpy pyyaml mkl mkl-include setuptools cmake cffi typing
conda install -c mingfeima mkldnn
# Add LAPACK support for the GPU
conda install -c pytorch magma-cuda80 # or magma-cuda90 if CUDA 9
conda install -c pytorch magma-cuda92 # or [magma-cuda80 | magma-cuda91] depending on your cuda version
```
On macOS
@@ -212,8 +198,11 @@ On Windows
set "VS150COMNTOOLS=C:\Program Files (x86)\Microsoft Visual Studio\2017\Enterprise\VC\Auxiliary\Build"
set CMAKE_GENERATOR=Visual Studio 15 2017 Win64
set DISTUTILS_USE_SDK=1
REM The following line is needed for Python 2.7, but the support for it is very experimental.
REM The following two lines are needed for Python 2.7, but the support for it is very experimental.
set MSSdk=1
set FORCE_PY27_BUILD=1
REM As for CUDA 8, VS2015 Update 3 is also required to build PyTorch. Use the following line.
set "CUDAHOSTCXX=%VS140COMNTOOLS%\..\..\VC\bin\amd64\cl.exe"
call "%VS150COMNTOOLS%\vcvarsall.bat" x64 -vcvars_ver=14.11
python setup.py install
@@ -221,7 +210,7 @@ python setup.py install
### Docker image
Dockerfile is supplied to build images with cuda support and cudnn v7. Build as usual
Dockerfile is supplied to build images with cuda support and cudnn v7. You can pass the `-e PYTHON_VERSION=x.y` flag to specify which Python version is to be used by Miniconda, or leave it unset to use the default. Build as usual
```
docker build -t pytorch -f docker/pytorch/Dockerfile .
```
@@ -235,23 +224,35 @@ Please note that PyTorch uses shared memory to share data between processes, so
for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you
should increase shared memory size either with `--ipc=host` or `--shm-size` command line options to `nvidia-docker run`.
### Building the Documentation
To build documentation in various formats, you will need Sphinx and the
readthedocs theme.
```
cd docs/
pip install -r requirements.txt
```
You can then build the documentation by running ``make <format>`` from the
``docs/`` folder. Run ``make`` to get a list of all available output formats.
### Previous Versions
Installation instructions and binaries for previous PyTorch versions may be found
on [our website](http://pytorch.org/previous-versions/).
on [our website](http://pytorch.org/previous-versions).
## Getting Started
Three pointers to get you started:
- [Tutorials: get you started with understanding and using PyTorch](http://pytorch.org/tutorials/)
- [Tutorials: get you started with understanding and using PyTorch](https://pytorch.org/tutorials/)
- [Examples: easy to understand pytorch code across all domains](https://github.com/pytorch/examples)
- [The API Reference](http://pytorch.org/docs/)
## Communication
* forums: discuss implementations, research, etc. http://discuss.pytorch.org
* GitHub issues: bug reports, feature requests, install issues, RFCs, thoughts, etc.
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at soumith@pytorch.org
* Slack: general chat, online discussions, collaboration etc. https://pytorch.slack.com/ . Our slack channel is invite-only to promote a healthy balance between power-users and beginners. If you need a slack invite, ping us at slack@pytorch.org
* newsletter: no-noise, one-way email newsletter with important announcements about pytorch. You can sign-up here: http://eepurl.com/cbG0rv
## Releases and Contributing

aten/.gitmodules

@@ -1,10 +0,0 @@
[submodule "src/ATen/cpu/cpuinfo"]
path = src/ATen/cpu/cpuinfo
url = https://github.com/Maratyszcza/cpuinfo
[submodule "src/ATen/cpu/tbb/tbb_remote"]
path = src/ATen/cpu/tbb/tbb_remote
url = https://github.com/01org/tbb
branch = tbb_2018
[submodule "src/ATen/utils/catch"]
path = src/ATen/utils/catch
url = https://github.com/catchorg/Catch2.git


@@ -1,521 +1,105 @@
cmake_minimum_required(VERSION 3.0)
set(CMAKE_MODULE_PATH
${CMAKE_CURRENT_SOURCE_DIR}/cmake
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules_CUDA_fix
if (BUILD_ATEN_MOBILE)
return()
endif()
# Find modules
list(APPEND CMAKE_MODULE_PATH
/usr/lib/x86_64-linux-gnu/
${CMAKE_CURRENT_SOURCE_DIR}/src/TH/cmake
${CMAKE_MODULE_PATH})
SET(CMAKE_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/ ${CMAKE_LIBRARY_PATH})
project(ATen)
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/public
${CMAKE_CURRENT_SOURCE_DIR}/../cmake/Modules_CUDA_fix)
list(APPEND CMAKE_LIBRARY_PATH /usr/lib/x86_64-linux-gnu/)
cmake_policy(SET CMP0012 NEW)
# Polyfill for upstream FindCUDA
include(CMakeInitializeConfigs)
#############################################
# RPATH stuff
# see https://cmake.org/Wiki/CMake_RPATH_handling
if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
endif()
set(CMAKE_SKIP_BUILD_RPATH FALSE)
set(CMAKE_BUILD_WITH_INSTALL_RPATH FALSE)
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
set(CMAKE_POSITION_INDEPENDENT_CODE TRUE)
list(FIND CMAKE_PLATFORM_IMPLICIT_LINK_DIRECTORIES "${CMAKE_INSTALL_PREFIX}/lib" isSystemDir)
if("${isSystemDir}" STREQUAL "-1")
set(CMAKE_INSTALL_RPATH "${CMAKE_INSTALL_PREFIX}/lib")
set(ATen_CPU_SRCS)
set(ATen_CPU_TEST_SRCS)
set(ATen_CPU_INCLUDE)
set(ATen_THIRD_PARTY_INCLUDE)
set(ATen_CUDA_SRCS)
set(ATen_CUDA_TEST_SRCS)
set(ATen_CUDA_INCLUDE)
set(ATen_CPU_DEPENDENCY_LIBS)
set(ATen_CUDA_DEPENDENCY_LIBS)
set(ATen_PUBLIC_CUDA_DEPENDENCY_LIBS)
SET(ATEN_INSTALL_BIN_SUBDIR "bin" CACHE PATH "ATen install binary subdirectory")
SET(ATEN_INSTALL_LIB_SUBDIR "lib" CACHE PATH "ATen install library subdirectory")
SET(ATEN_INSTALL_INCLUDE_SUBDIR "include" CACHE PATH "ATen install include subdirectory")
if(USE_CUDA)
list(APPEND ATen_CUDA_INCLUDE ${CUDA_INCLUDE_DIRS})
endif()
IF(NOT MSVC)
set(CMAKE_CXX_FLAGS "--std=c++11 -Wall -Wextra -Wno-missing-field-initializers -Wno-type-limits -Wno-unused-parameter -Wno-unknown-pragmas -Wno-vla -fexceptions ${CMAKE_CXX_FLAGS}")
set(CMAKE_C_FLAGS "-fexceptions ${CMAKE_C_FLAGS}")
if ($ENV{WERROR})
set(CMAKE_CXX_FLAGS "-Werror ${CMAKE_CXX_FLAGS}")
set(TH_LINK_STYLE STATIC)
add_subdirectory(src/TH)
set(TH_CPU_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/TH
${CMAKE_CURRENT_BINARY_DIR}/src/TH
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_BINARY_DIR}/src
${CMAKE_BINARY_DIR}/aten/src)
list(APPEND ATen_CPU_INCLUDE ${TH_CPU_INCLUDE})
if(USE_CUDA OR USE_ROCM)
set(TH_CUDA_INCLUDE
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/THC)
list(APPEND ATen_CUDA_INCLUDE ${TH_CUDA_INCLUDE})
endif()
add_subdirectory(src/THNN)
# Find the HIP package, set the HIP paths, load the HIP CMake.
IF(USE_ROCM)
include(LoadHIP)
if (NOT PYTORCH_FOUND_HIP)
MESSAGE(FATAL_ERROR
"Could not find HIP installation")
endif()
ENDIF(NOT MSVC)
INCLUDE(CheckCXXSourceCompiles)
# disable some verbose warnings
IF (MSVC)
set(CMAKE_CXX_FLAGS "/wd4267 /wd4251 /wd4522 /wd4522 /wd4838 /wd4305 /wd4244 /wd4190 /wd4101 /wd4996 /wd4275 ${CMAKE_CXX_FLAGS}")
ENDIF(MSVC)
# windef.h will define max/min macros if NOMINMAX is not defined
IF(MSVC)
add_definitions(/DNOMINMAX)
ENDIF(MSVC)
#Check if certain std functions are supported. Sometimes
#_GLIBCXX_USE_C99 macro is not defined and some functions are missing.
CHECK_CXX_SOURCE_COMPILES("
#include <cmath>
#include <string>
int main() {
int a = std::isinf(3.0);
int b = std::isnan(0.0);
std::string s = std::to_string(1);
return 0;
}" SUPPORT_GLIBCXX_USE_C99)
if(NOT SUPPORT_GLIBCXX_USE_C99)
message(FATAL_ERROR
"The C++ compiler does not support required functions. "
"This is very likely due to a known bug in GCC 5 "
"(and maybe other versions) on Ubuntu 17.10 and newer. "
"For more information, see: "
"https://github.com/pytorch/pytorch/issues/5229"
)
endif()
# Top-level build config
############################################
# Flags
# When using MSVC
# Detect CUDA architecture and get best NVCC flags
# finding cuda must be first because other things depend on the result
#
# NB: We MUST NOT run this find_package if NO_CUDA is set, because upstream
# FindCUDA has a bug where it will still attempt to make use of NOTFOUND
# compiler variables to run various probe tests. We could try to fix
# this, but since FindCUDA upstream is subsumed by first-class support
# for CUDA language, it seemed not worth fixing.
IF(NOT CUDA_FOUND AND NOT NO_CUDA)
FIND_PACKAGE(CUDA 5.5)
ENDIF()
IF(MSVC)
# we want to respect the standard, and we are bored of those **** .
ADD_DEFINITIONS(-D_CRT_SECURE_NO_DEPRECATE=1)
LIST(APPEND CUDA_NVCC_FLAGS "-Xcompiler /wd4819 -Xcompiler /wd4503 -Xcompiler /wd4190 -Xcompiler /wd4244 -Xcompiler /wd4251 -Xcompiler /wd4275 -Xcompiler /wd4522")
ADD_DEFINITIONS(-DTH_EXPORTS)
IF (NOT NO_CUDA)
ADD_DEFINITIONS(-DTHC_EXPORTS)
ENDIF()
ENDIF(MSVC)
IF (NOT MSVC)
IF (CMAKE_VERSION VERSION_LESS "3.1")
SET(CMAKE_C_FLAGS "-std=c11 ${CMAKE_C_FLAGS}")
ELSE ()
SET(CMAKE_C_STANDARD 11)
ENDIF ()
ENDIF(NOT MSVC)
if(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
if(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.9")
if(CUDA_VERSION VERSION_LESS "8.0")
MESSAGE(STATUS "Found gcc >=5 and CUDA <= 7.5, adding workaround C++ flags")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -D_FORCE_INLINES -D_MWAITXINTRIN_H_INCLUDED -D__STRICT_ANSI__")
endif(CUDA_VERSION VERSION_LESS "8.0")
endif(CMAKE_CXX_COMPILER_VERSION VERSION_GREATER "4.9")
endif(CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
LIST(APPEND CUDA_NVCC_FLAGS -Wno-deprecated-gpu-targets)
LIST(APPEND CUDA_NVCC_FLAGS --expt-extended-lambda)
if(NOT CMAKE_CXX_COMPILER_ID STREQUAL "GNU")
SET(CMAKE_CXX_STANDARD 11)
endif()
IF(NOT COMMAND CUDA_SELECT_NVCC_ARCH_FLAGS)
INCLUDE(${CMAKE_CURRENT_SOURCE_DIR}/cmake/select_compute_arch.cmake)
ENDIF()
LIST(APPEND CUDA_NVCC_FLAGS $ENV{TORCH_NVCC_FLAGS})
CUDA_SELECT_NVCC_ARCH_FLAGS(NVCC_FLAGS_EXTRA $ENV{TORCH_CUDA_ARCH_LIST})
LIST(APPEND CUDA_NVCC_FLAGS ${NVCC_FLAGS_EXTRA})
IF(CMAKE_POSITION_INDEPENDENT_CODE AND NOT MSVC)
LIST(APPEND CUDA_NVCC_FLAGS "-Xcompiler -fPIC")
ENDIF()
IF(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
MESSAGE(STATUS "Found CUDA with FP16 support, compiling with torch.CudaHalfTensor")
LIST(APPEND CUDA_NVCC_FLAGS "-DCUDA_HAS_FP16=1 -D__CUDA_NO_HALF_OPERATORS__")
SET(CMAKE_C_FLAGS "-DCUDA_HAS_FP16=1 ${CMAKE_C_FLAGS}")
ELSE(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
MESSAGE(STATUS "Could not find CUDA with FP16 support, compiling without torch.CudaHalfTensor")
ENDIF(CUDA_HAS_FP16 OR NOT ${CUDA_VERSION} LESS 7.5)
OPTION(NDEBUG "disable asserts (WARNING: this may result in silent UB e.g. with out-of-bound indices)")
IF(NOT NDEBUG)
MESSAGE(STATUS "Removing -DNDEBUG from compile flags")
STRING(REPLACE "-DNDEBUG" "" CMAKE_C_FLAGS "" ${CMAKE_C_FLAGS})
STRING(REPLACE "-DNDEBUG" "" CMAKE_C_FLAGS_DEBUG "" ${CMAKE_C_FLAGS_DEBUG})
STRING(REPLACE "-DNDEBUG" "" CMAKE_C_FLAGS_RELEASE "" ${CMAKE_C_FLAGS_RELEASE})
STRING(REPLACE "-DNDEBUG" "" CMAKE_CXX_FLAGS "" ${CMAKE_CXX_FLAGS})
STRING(REPLACE "-DNDEBUG" "" CMAKE_CXX_FLAGS_DEBUG "" ${CMAKE_CXX_FLAGS_DEBUG})
STRING(REPLACE "-DNDEBUG" "" CMAKE_CXX_FLAGS_RELEASE "" ${CMAKE_CXX_FLAGS_RELEASE})
ENDIF()
# OpenMP support?
SET(WITH_OPENMP ON CACHE BOOL "OpenMP support if available?")
IF (APPLE AND CMAKE_COMPILER_IS_GNUCC)
EXEC_PROGRAM (uname ARGS -v OUTPUT_VARIABLE DARWIN_VERSION)
STRING (REGEX MATCH "[0-9]+" DARWIN_VERSION ${DARWIN_VERSION})
MESSAGE (STATUS "MAC OS Darwin Version: ${DARWIN_VERSION}")
IF (DARWIN_VERSION GREATER 9)
SET(APPLE_OPENMP_SUCKS 1)
ENDIF (DARWIN_VERSION GREATER 9)
EXECUTE_PROCESS (COMMAND ${CMAKE_C_COMPILER} -dumpversion
OUTPUT_VARIABLE GCC_VERSION)
IF (APPLE_OPENMP_SUCKS AND GCC_VERSION VERSION_LESS 4.6.2)
MESSAGE(STATUS "Warning: Disabling OpenMP (unstable with this version of GCC)")
MESSAGE(STATUS " Install GCC >= 4.6.2 or change your OS to enable OpenMP")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-unknown-pragmas")
SET(WITH_OPENMP OFF CACHE BOOL "OpenMP support if available?" FORCE)
ENDIF ()
ENDIF ()
IF (WITH_OPENMP AND NOT CHECKED_OPENMP)
FIND_PACKAGE(OpenMP)
SET(CHECKED_OPENMP ON CACHE BOOL "already checked for OpenMP")
# OPENMP_FOUND is not cached in FindOpenMP.cmake (all other variables are cached)
# see https://github.com/Kitware/CMake/blob/master/Modules/FindOpenMP.cmake
SET(OPENMP_FOUND ${OPENMP_FOUND} CACHE BOOL "OpenMP Support found")
ENDIF (WITH_OPENMP AND NOT CHECKED_OPENMP)
IF(OPENMP_FOUND)
MESSAGE(STATUS "Compiling with OpenMP support")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} ${OpenMP_C_FLAGS}")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} ${OpenMP_CXX_FLAGS}")
ENDIF(OPENMP_FOUND)
SET(CUDA_ATTACH_VS_BUILD_RULE_TO_CUDA_FILE OFF)
FIND_PACKAGE(MAGMA)
IF(CUDA_FOUND AND MAGMA_FOUND)
INCLUDE_DIRECTORIES("${MAGMA_INCLUDE_DIR}")
SET(CMAKE_REQUIRED_INCLUDES "${MAGMA_INCLUDE_DIR};${CUDA_INCLUDE_DIRS}")
INCLUDE(CheckPrototypeDefinition)
check_prototype_definition(magma_get_sgeqrf_nb
"magma_int_t magma_get_sgeqrf_nb( magma_int_t m, magma_int_t n );"
"0"
"magma.h"
MAGMA_V2)
IF (MAGMA_V2)
add_definitions(-DMAGMA_V2)
ENDIF (MAGMA_V2)
SET(USE_MAGMA 1)
MESSAGE(STATUS "Compiling with MAGMA support")
MESSAGE(STATUS "MAGMA INCLUDE DIRECTORIES: ${MAGMA_INCLUDE_DIR}")
MESSAGE(STATUS "MAGMA LIBRARIES: ${MAGMA_LIBRARIES}")
MESSAGE(STATUS "MAGMA V2 check: ${MAGMA_V2}")
ELSE()
MESSAGE(STATUS "MAGMA not found. Compiling without MAGMA support")
ENDIF()
# ARM specific flags
FIND_PACKAGE(ARM)
IF (ASIMD_FOUND)
MESSAGE(STATUS "asimd/Neon found with compiler flag : -D__NEON__")
SET(CMAKE_C_FLAGS "-D__NEON__ ${CMAKE_C_FLAGS}")
ELSEIF (NEON_FOUND)
MESSAGE(STATUS "Neon found with compiler flag : -mfpu=neon -D__NEON__")
SET(CMAKE_C_FLAGS "-mfpu=neon -D__NEON__ ${CMAKE_C_FLAGS}")
ENDIF (ASIMD_FOUND)
IF (CORTEXA8_FOUND)
MESSAGE(STATUS "Cortex-A8 Found with compiler flag : -mcpu=cortex-a8")
SET(CMAKE_C_FLAGS "-mcpu=cortex-a8 -fprefetch-loop-arrays ${CMAKE_C_FLAGS}")
ENDIF (CORTEXA8_FOUND)
IF (CORTEXA9_FOUND)
MESSAGE(STATUS "Cortex-A9 Found with compiler flag : -mcpu=cortex-a9")
SET(CMAKE_C_FLAGS "-mcpu=cortex-a9 ${CMAKE_C_FLAGS}")
ENDIF (CORTEXA9_FOUND)
IF(UNIX)
# prevent Unknown CMake command "check_function_exists".
INCLUDE(CheckFunctionExists)
ENDIF(UNIX)
INCLUDE (CheckIncludeFile)
INCLUDE (CheckCSourceCompiles)
INCLUDE (CheckCSourceRuns)
# Check that our programs run. This is different from the native CMake compiler
# check, which just tests if the program compiles and links. This is important
# because with ASAN you might need to help the compiled library find some
# dynamic libraries.
CHECK_C_SOURCE_RUNS("
int main() { return 0; }
" COMPILER_WORKS)
IF(NOT COMPILER_WORKS)
# Force cmake to retest next time around
unset(COMPILER_WORKS CACHE)
MESSAGE(FATAL_ERROR
"Could not run a simple program built with your compiler. "
"If you are trying to use -fsanitize=address, make sure "
"libasan is properly installed on your system (you can confirm "
"if the problem is this by attempting to build and run a "
"small program.)")
ENDIF()
CHECK_INCLUDE_FILE(cpuid.h HAVE_CPUID_H)
# Check for a cpuid intrinsic
IF(HAVE_CPUID_H)
CHECK_C_SOURCE_COMPILES("#include <cpuid.h>
int main()
{
unsigned int eax, ebx, ecx, edx;
return __get_cpuid(0, &eax, &ebx, &ecx, &edx);
}" HAVE_GCC_GET_CPUID)
ENDIF()
IF(HAVE_GCC_GET_CPUID)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DHAVE_GCC_GET_CPUID")
ENDIF(HAVE_GCC_GET_CPUID)
CHECK_C_SOURCE_COMPILES("#include <stdint.h>
static inline void cpuid(uint32_t *eax, uint32_t *ebx,
uint32_t *ecx, uint32_t *edx)
{
uint32_t a = *eax, b, c = *ecx, d;
asm volatile ( \"cpuid\" : \"+a\"(a), \"=b\"(b), \"+c\"(c), \"=d\"(d) );
*eax = a; *ebx = b; *ecx = c; *edx = d;
}
int main() {
uint32_t a,b,c,d;
cpuid(&a, &b, &c, &d);
return 0;
}" NO_GCC_EBX_FPIC_BUG)
IF(NOT NO_GCC_EBX_FPIC_BUG)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DUSE_GCC_GET_CPUID")
ENDIF(NOT NO_GCC_EBX_FPIC_BUG)
FIND_PACKAGE(SSE) # checks SSE, AVX and AVX2
IF(C_SSE2_FOUND)
MESSAGE(STATUS "SSE2 Found")
SET(CMAKE_C_FLAGS "${C_SSE2_FLAGS} -DUSE_SSE2 ${CMAKE_C_FLAGS}")
ENDIF(C_SSE2_FOUND)
IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
SET(CMAKE_C_FLAGS "${C_SSE4_1_FLAGS} -DUSE_SSE4_1 ${C_SSE4_2_FLAGS} -DUSE_SSE4_2 ${CMAKE_C_FLAGS}")
ENDIF()
IF(C_SSE3_FOUND)
MESSAGE(STATUS "SSE3 Found")
SET(CMAKE_C_FLAGS "${C_SSE3_FLAGS} -DUSE_SSE3 ${CMAKE_C_FLAGS}")
SET(CMAKE_CXX_FLAGS "${C_SSE3_FLAGS} -DUSE_SSE3 ${CMAKE_CXX_FLAGS}")
ENDIF(C_SSE3_FOUND)
# we don't set -mavx and -mavx2 flags globally, but only for specific files
# however, we want to enable the AVX codepaths, so we still need to
# add USE_AVX and USE_AVX2 macro defines
IF(C_AVX_FOUND)
MESSAGE(STATUS "AVX Found")
SET(CMAKE_C_FLAGS "-DUSE_AVX ${CMAKE_C_FLAGS}")
ENDIF(C_AVX_FOUND)
IF(C_AVX2_FOUND)
MESSAGE(STATUS "AVX2 Found")
SET(CMAKE_C_FLAGS "-DUSE_AVX2 ${CMAKE_C_FLAGS}")
SET(CMAKE_CXX_FLAGS "-DUSE_AVX2 ${CMAKE_CXX_FLAGS}")
ENDIF(C_AVX2_FOUND)
CHECK_C_SOURCE_RUNS("
#include <stdatomic.h>
// ATOMIC_INT_LOCK_FREE is flaky on some older gcc versions
// so if this define is not usable in a preprocessor condition,
// we fail this check and fall back to GCC atomics
#if ATOMIC_INT_LOCK_FREE == 2
#define TH_ATOMIC_IPC_REFCOUNT 1
#endif
int main()
{
int a;
int oa;
atomic_store(&a, 1);
atomic_fetch_add(&a, 1);
oa = atomic_load(&a);
if(!atomic_compare_exchange_strong(&a, &oa, 3))
return -1;
return 0;
}
" HAS_C11_ATOMICS)
IF(NOT HAS_C11_ATOMICS)
CHECK_C_SOURCE_RUNS("
#include <intrin.h>
int main()
{
long a;
_InterlockedExchange(&a, 1);
_InterlockedExchangeAdd(&a, 1);
if(_InterlockedCompareExchange(&a, 3, 2) != 2)
return -1;
return 0;
}
" HAS_MSC_ATOMICS)
CHECK_C_SOURCE_RUNS("
int main()
{
int a;
__sync_lock_test_and_set(&a, 1);
__sync_fetch_and_add(&a, 1);
if(!__sync_bool_compare_and_swap(&a, 2, 3))
return -1;
return 0;
}
" HAS_GCC_ATOMICS)
ENDIF()
IF(HAS_C11_ATOMICS)
ADD_DEFINITIONS(-DUSE_C11_ATOMICS=1)
MESSAGE(STATUS "Atomics: using C11 intrinsics")
ELSEIF(HAS_MSC_ATOMICS)
ADD_DEFINITIONS(-DUSE_MSC_ATOMICS=1)
MESSAGE(STATUS "Atomics: using MSVC intrinsics")
ELSEIF(HAS_GCC_ATOMICS)
ADD_DEFINITIONS(-DUSE_GCC_ATOMICS=1)
MESSAGE(STATUS "Atomics: using GCC intrinsics")
ELSE()
SET(CMAKE_THREAD_PREFER_PTHREAD TRUE)
FIND_PACKAGE(Threads)
IF(THREADS_FOUND)
ADD_DEFINITIONS(-DUSE_PTHREAD_ATOMICS=1)
TARGET_LINK_LIBRARIES(TH ${CMAKE_THREAD_LIBS_INIT})
MESSAGE(STATUS "Atomics: using pthread")
ENDIF()
ENDIF()
IF (WIN32 AND NOT CYGWIN)
SET(BLAS_INSTALL_LIBRARIES "OFF"
CACHE BOOL "Copy the required BLAS DLLs into the TH install dirs")
ENDIF (WIN32 AND NOT CYGWIN)
MACRO(Install_Required_Library ln)
get_filename_component(libpath ${ln} PATH)
get_filename_component(libname ${ln} NAME_WE)
file(GLOB libdlls "${libpath}/${libname}*.dll")
install(PROGRAMS ${libdlls}
DESTINATION "${TH_INSTALL_BIN_SUBDIR}")
ENDMACRO(Install_Required_Library libname)
FIND_PACKAGE(BLAS)
SET(AT_MKL_ENABLED 0)
SET(AT_MKL_MT 0)
IF(BLAS_FOUND)
SET(USE_BLAS 1)
IF(BLAS_INFO STREQUAL "mkl")
ADD_DEFINITIONS(-DTH_BLAS_MKL)
IF(NOT BLAS_INCLUDE_DIR)
MESSAGE(FATAL_ERROR "MKL is used, but MKL header files are not found. \
You can get them by `conda install mkl-include` if using conda (if \
it is missing, run `conda upgrade -n root conda` first), and \
`pip install mkl-devel` if using pip. If build fails with header files \
available in the system, please make sure that CMake will search the \
directory containing them, e.g., by setting CMAKE_INCLUDE_PATH.")
ENDIF()
IF(MSVC AND MKL_LIBRARIES MATCHES ".*libiomp5md\\.lib.*")
ADD_DEFINITIONS(-D_OPENMP_NOFORCE_MANIFEST)
SET(AT_MKL_MT 1)
ENDIF()
INCLUDE_DIRECTORIES(${BLAS_INCLUDE_DIR}) # include MKL headers
SET(AT_MKL_ENABLED 1)
ENDIF()
ENDIF(BLAS_FOUND)
FIND_PACKAGE(LAPACK)
IF(LAPACK_FOUND)
SET(USE_LAPACK 1)
ENDIF(LAPACK_FOUND)
#############################################
set(ATen_CPU_SRCS)
set(ATen_CPU_INCLUDE)
set(ATen_CUDA_SRCS)
set(ATen_CUDA_INCLUDE)
SET(ATEN_INSTALL_BIN_SUBDIR "bin" CACHE PATH "ATen install binary subdirectory")
SET(ATEN_INSTALL_LIB_SUBDIR "lib" CACHE PATH "ATen install library subdirectory")
SET(ATEN_INSTALL_INCLUDE_SUBDIR "include" CACHE PATH "ATen install include subdirectory")
add_definitions(-DTH_INDEX_BASE=0)
set(TH_LINK_STYLE STATIC)
add_subdirectory(src/TH)
include_directories(
# dense
${CMAKE_CURRENT_SOURCE_DIR}/src/TH
${CMAKE_CURRENT_SOURCE_DIR}/src/THC
${CMAKE_CURRENT_BINARY_DIR}/src/TH
${CMAKE_CURRENT_BINARY_DIR}/src/THC
# sparse
${CMAKE_CURRENT_SOURCE_DIR}/src/THS
${CMAKE_CURRENT_SOURCE_DIR}/src/THCS
${CMAKE_CURRENT_BINARY_DIR}/src/THS
${CMAKE_CURRENT_BINARY_DIR}/src/THCS
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_BINARY_DIR}/src)
add_subdirectory(src/THNN)
add_subdirectory(src/THS)
if(NO_CUDA)
message("disabling CUDA because NO_CUDA is set")
SET(CUDA_FLAG -n)
SET(AT_CUDA_ENABLED 0)
else()
if(USE_ROCM)
SET(AT_CUDA_ENABLED 1)
add_subdirectory(src/THC)
add_subdirectory(src/THCUNN)
message("ROCm is enabled.")
elseif(USE_CUDA)
SET(AT_CUDA_ENABLED 1)
find_package(CUDA 5.5 REQUIRED) # find CUDA before using its include dirs
INCLUDE_DIRECTORIES(${CUDA_INCLUDE_DIRS})
add_subdirectory(src/THC)
add_subdirectory(src/THCUNN)
add_subdirectory(src/THCS)
else()
message("disabling CUDA because USE_CUDA is set false")
SET(AT_CUDA_ENABLED 0)
endif()
endif()
find_package(CuDNN)
IF(NOT AT_CUDA_ENABLED OR NOT CUDNN_FOUND)
MESSAGE(STATUS "CuDNN not found. Compiling without CuDNN support")
set(AT_CUDNN_ENABLED 0)
ELSE()
INCLUDE_DIRECTORIES(BEFORE ${CUDNN_INCLUDE_DIRS})
set(AT_CUDNN_ENABLED 1)
ENDIF()
if(NO_MKLDNN)
message("disabling MKLDNN because NO_MKLDNN is set")
set(AT_MKLDNN_ENABLED 0)
else()
find_package(MKLDNN)
if(NOT MKLDNN_FOUND)
message(STATUS "MKLDNN not found. Compiling without MKLDNN support")
set(AT_MKLDNN_ENABLED 0)
else()
INCLUDE_DIRECTORIES(${MKLDNN_INCLUDE_DIRS})
set(AT_MKLDNN_ENABLED 1)
endif()
message("disabling CUDA because USE_CUDA is set false")
SET(AT_CUDA_ENABLED 0)
endif()
set(cwrap_files
${CMAKE_CURRENT_SOURCE_DIR}/src/ATen/Declarations.cwrap
${CMAKE_CURRENT_SOURCE_DIR}/src/THNN/generic/THNN.h
${CMAKE_CURRENT_SOURCE_DIR}/src/THCUNN/generic/THCUNN.h
${CMAKE_CURRENT_SOURCE_DIR}/src/ATen/nn.yaml
${CMAKE_CURRENT_SOURCE_DIR}/src/ATen/native/native_functions.yaml
)
include_directories(
${CMAKE_CURRENT_SOURCE_DIR}/src/THNN
${CMAKE_CURRENT_SOURCE_DIR}/src/THCUNN)
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src/THNN
${CMAKE_CURRENT_SOURCE_DIR}/src/THCUNN)
list(APPEND ATen_CPU_INCLUDE
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR}/../third_party/catch/single_include
${CMAKE_CURRENT_BINARY_DIR}/src/ATen)
add_subdirectory(src/ATen)
include_directories(
${CMAKE_CURRENT_SOURCE_DIR}/src
${CMAKE_CURRENT_SOURCE_DIR}/src/ATen/utils/catch/single_include
${CMAKE_CURRENT_BINARY_DIR}/src/ATen)
if(NOT NO_CUDA)
include_directories(${CUDA_INCLUDE_DIRS})
endif()
add_subdirectory(src/ATen/test)
if(ATEN_NO_CONTRIB)
message("disable contrib because ATEN_NO_CONTRIB is set")
else()
add_subdirectory(contrib/data)
add_subdirectory(contrib/meter)
endif()
# Pass source, includes, and libs to parent
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CORE_TEST_SRCS ${ATen_CORE_TEST_SRCS} PARENT_SCOPE)


@ -2,7 +2,7 @@
ATen is a simple tensor library that exposes the Tensor operations in Torch
and PyTorch directly in C++11. The wrapper respects the semantics of operators
in PyTorch, except minor details due to differences between C++ in Python in
in PyTorch, except minor details due to differences between C++ and Python in
the way default arguments are handled. See the [documentation for tensors](http://pytorch.org/docs/tensors.html) in PyTorch for what these operations do.
ATen's API is auto-generated from the same declarations PyTorch uses so the
two APIs will track each other over time.
@ -12,7 +12,7 @@ does not include templates. That is, there is one `Tensor` type. It can hold a
CPU or CUDA Tensor, and the tensor may have Doubles, Float, Ints, etc. This design
makes it easy to write generic code without templating everything.
See the _generated_ [`Tensor.h` file](doc/Tensor.h) and [`Functions.h` file](doc/Functions.h) for the provided API. Excerpt:
See https://pytorch.org/cppdocs for the provided API. Excerpt:
```c++
Tensor atan2(const Tensor & other) const;
Tensor & atan2_(const Tensor & other);
@ -48,7 +48,8 @@ sudo pip install pyyaml
mkdir build
cd build
cmake .. -DCMAKE_INSTALL_PREFIX=/where/you/want # specify your dest directory
# cmake .. -DNO_CUDA=true # for CPU only machines
# cmake .. -DUSE_NVRTC=ON -DUSE_TENSORRT=OFF -DCMAKE_INSTALL_PREFIX=../install -DCAFFE2_CMAKE_BUILDING_WITH_MAIN_REPO=OFF -DUSE_CUDA=ON # for CUDA
# cmake .. -DUSE_CUDA=OFF # for CPU only machines
make install
```
@ -87,7 +88,7 @@ for(auto i = 0; i < 100000; i++) {
Expressions like `CUDA(kFloat)` are first-class `at::Type` objects that represent
the type of a Tensor and are used to create Tensors when their type cannot be
inferred. See the _generated_ [Type header](doc/Type.h) for its API.
inferred.
See more in [sample files](src/ATen/test).
@ -164,7 +165,7 @@ behave as normal tensors.
### Scalars and zero-dimensional tensors
In addition to the `Tensor` objects, ATen also includes `Scalar`s that represent a single number.
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's [number types](doc/Type.h).
Like a Tensor, Scalars are dynamically typed and can hold any one of ATen's number types.
Scalars can be implicitly constructed from C++ number types. Scalars are needed because some functions like `addmm` take numbers along with Tensors and expect these
numbers to be the same dynamic type as the tensor. They are also used in the API to indicate places where
a function will _always_ return a Scalar value, like `sum`.
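For instance, a minimal sketch of the Tensor/Scalar round-trip, using only calls that appear elsewhere in this codebase (the values are illustrative):
```c++
#include "ATen/ATen.h"
using namespace at;

int main() {
  Tensor t = CPU(kFloat).ones({2, 3});
  Tensor u = add(t, 1.0);            // the C++ double is implicitly wrapped as a Scalar
  double total = sum(u).toCDouble(); // sum always collapses to a single value
  return total == 12.0 ? 0 : 1;
}
```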


@ -1,70 +0,0 @@
# - Try to find cuDNN
#
# The following variables are optionally searched for defaults
# CUDNN_ROOT_DIR: Base directory where all cuDNN components are found
#
# The following are set after configuration is done:
# CUDNN_FOUND
# CUDNN_INCLUDE_DIRS
# CUDNN_LIBRARIES
# CUDNN_LIBRARY_DIRS
#
# Borrowed from https://github.com/caffe2/caffe2/blob/master/cmake/Modules/FindCuDNN.cmake
include(FindPackageHandleStandardArgs)
set(CUDNN_ROOT_DIR "" CACHE PATH "Folder contains NVIDIA cuDNN")
if($ENV{CUDNN_INCLUDE_DIR})
SET(CUDNN_INCLUDE_DIR $ENV{CUDNN_INCLUDE_DIR})
else($ENV{CUDNN_INCLUDE_DIR})
find_path(CUDNN_INCLUDE_DIR cudnn.h
HINTS ${CUDNN_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES cuda/include include)
endif($ENV{CUDNN_INCLUDE_DIR})
IF ($ENV{USE_STATIC_CUDNN})
MESSAGE(STATUS "USE_STATIC_CUDNN detected. Linking against static CUDNN library")
SET(CUDNN_LIBNAME "libcudnn_static.a")
ELSE()
SET(CUDNN_LIBNAME "cudnn")
ENDIF()
if($ENV{CUDNN_LIBRARY})
SET(CUDNN_LIBRARY $ENV{CUDNN_LIBRARY})
else($ENV{CUDNN_LIBRARY})
find_library(CUDNN_LIBRARY ${CUDNN_LIBNAME}
HINTS ${CUDNN_LIB_DIR} ${CUDNN_ROOT_DIR} ${CUDA_TOOLKIT_ROOT_DIR}
PATH_SUFFIXES lib lib64 cuda/lib cuda/lib64 lib/x64)
endif($ENV{CUDNN_LIBRARY})
find_package_handle_standard_args(
CUDNN DEFAULT_MSG CUDNN_INCLUDE_DIR CUDNN_LIBRARY)
if(CUDNN_FOUND)
# get cuDNN version
file(READ ${CUDNN_INCLUDE_DIR}/cudnn.h CUDNN_HEADER_CONTENTS)
string(REGEX MATCH "define CUDNN_MAJOR * +([0-9]+)"
CUDNN_VERSION_MAJOR "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_MAJOR * +([0-9]+)" "\\1"
CUDNN_VERSION_MAJOR "${CUDNN_VERSION_MAJOR}")
string(REGEX MATCH "define CUDNN_MINOR * +([0-9]+)"
CUDNN_VERSION_MINOR "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_MINOR * +([0-9]+)" "\\1"
CUDNN_VERSION_MINOR "${CUDNN_VERSION_MINOR}")
string(REGEX MATCH "define CUDNN_PATCHLEVEL * +([0-9]+)"
CUDNN_VERSION_PATCH "${CUDNN_HEADER_CONTENTS}")
string(REGEX REPLACE "define CUDNN_PATCHLEVEL * +([0-9]+)" "\\1"
CUDNN_VERSION_PATCH "${CUDNN_VERSION_PATCH}")
# Assemble cuDNN version
if(NOT CUDNN_VERSION_MAJOR)
set(CUDNN_VERSION "?")
else()
set(CUDNN_VERSION "${CUDNN_VERSION_MAJOR}.${CUDNN_VERSION_MINOR}.${CUDNN_VERSION_PATCH}")
endif()
set(CUDNN_INCLUDE_DIRS ${CUDNN_INCLUDE_DIR})
set(CUDNN_LIBRARIES ${CUDNN_LIBRARY})
message(STATUS "Found cuDNN: v${CUDNN_VERSION} (include: ${CUDNN_INCLUDE_DIR}, library: ${CUDNN_LIBRARY})")
mark_as_advanced(CUDNN_ROOT_DIR CUDNN_LIBRARY CUDNN_INCLUDE_DIR)
endif()


@ -1,196 +0,0 @@
# Synopsis:
# CUDA_SELECT_NVCC_ARCH_FLAGS(out_variable [target_CUDA_architectures])
# -- Selects GPU arch flags for nvcc based on target_CUDA_architectures
# target_CUDA_architectures : Auto | Common | All | LIST(ARCH_AND_PTX ...)
# - "Auto" detects local machine GPU compute arch at runtime.
# - "Common" and "All" cover common and entire subsets of architectures
# ARCH_AND_PTX : NAME | NUM.NUM | NUM.NUM(NUM.NUM) | NUM.NUM+PTX
# NAME: Fermi Kepler Maxwell Kepler+Tegra Kepler+Tesla Maxwell+Tegra Pascal
# NUM: Any number. Only those pairs are currently accepted by NVCC though:
# 2.0 2.1 3.0 3.2 3.5 3.7 5.0 5.2 5.3 6.0 6.2
# Returns LIST of flags to be added to CUDA_NVCC_FLAGS in ${out_variable}
# Additionally, sets ${out_variable}_readable to the resulting numeric list
# Example:
# CUDA_SELECT_NVCC_ARCH_FLAGS(ARCH_FLAGS 3.0 3.5+PTX 5.2(5.0) Maxwell)
# LIST(APPEND CUDA_NVCC_FLAGS ${ARCH_FLAGS})
#
# More info on CUDA architectures: https://en.wikipedia.org/wiki/CUDA
#
# This list will be used for CUDA_ARCH_NAME = All option
set(CUDA_KNOWN_GPU_ARCHITECTURES "Fermi" "Kepler" "Maxwell")
# This list will be used for CUDA_ARCH_NAME = Common option (enabled by default)
set(CUDA_COMMON_GPU_ARCHITECTURES "3.0" "3.5" "5.0")
if (CUDA_VERSION VERSION_GREATER "6.5")
list(APPEND CUDA_KNOWN_GPU_ARCHITECTURES "Kepler+Tegra" "Kepler+Tesla" "Maxwell+Tegra")
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "5.2")
endif ()
if (CUDA_VERSION VERSION_GREATER "7.5")
list(APPEND CUDA_KNOWN_GPU_ARCHITECTURES "Pascal")
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "6.0" "6.1" "6.1+PTX")
else()
list(APPEND CUDA_COMMON_GPU_ARCHITECTURES "5.2+PTX")
endif ()
################################################################################################
# A function for automatic detection of GPUs installed (if autodetection is enabled)
# Usage:
# CUDA_DETECT_INSTALLED_GPUS(OUT_VARIABLE)
#
function(CUDA_DETECT_INSTALLED_GPUS OUT_VARIABLE)
if(NOT CUDA_GPU_DETECT_OUTPUT)
set(cufile ${PROJECT_BINARY_DIR}/detect_cuda_archs.cu)
file(WRITE ${cufile} ""
"#include <cstdio>\n"
"int main()\n"
"{\n"
" int count = 0;\n"
" if (cudaSuccess != cudaGetDeviceCount(&count)) return -1;\n"
" if (count == 0) return -1;\n"
" for (int device = 0; device < count; ++device)\n"
" {\n"
" cudaDeviceProp prop;\n"
" if (cudaSuccess == cudaGetDeviceProperties(&prop, device))\n"
" std::printf(\"%d.%d \", prop.major, prop.minor);\n"
" }\n"
" return 0;\n"
"}\n")
execute_process(COMMAND "${CUDA_NVCC_EXECUTABLE}" "--run" "${cufile}"
"-ccbin" ${CMAKE_CXX_COMPILER}
WORKING_DIRECTORY "${PROJECT_BINARY_DIR}/CMakeFiles/"
RESULT_VARIABLE nvcc_res OUTPUT_VARIABLE nvcc_out
ERROR_QUIET OUTPUT_STRIP_TRAILING_WHITESPACE)
if(nvcc_res EQUAL 0)
string(REPLACE "2.1" "2.1(2.0)" nvcc_out "${nvcc_out}")
set(CUDA_GPU_DETECT_OUTPUT ${nvcc_out} CACHE INTERNAL "Returned GPU architectures from detect_gpus tool" FORCE)
endif()
endif()
if(NOT CUDA_GPU_DETECT_OUTPUT)
message(STATUS "Automatic GPU detection failed. Building for common architectures.")
set(${OUT_VARIABLE} ${CUDA_COMMON_GPU_ARCHITECTURES} PARENT_SCOPE)
else()
set(${OUT_VARIABLE} ${CUDA_GPU_DETECT_OUTPUT} PARENT_SCOPE)
endif()
endfunction()
################################################################################################
# Function for selecting GPU arch flags for nvcc based on CUDA architectures from parameter list
# Usage:
# SELECT_NVCC_ARCH_FLAGS(out_variable [list of CUDA compute archs])
function(CUDA_SELECT_NVCC_ARCH_FLAGS out_variable)
set(CUDA_ARCH_LIST "${ARGN}")
if("X${CUDA_ARCH_LIST}" STREQUAL "X" )
set(CUDA_ARCH_LIST "Auto")
endif()
set(cuda_arch_bin)
set(cuda_arch_ptx)
if("${CUDA_ARCH_LIST}" STREQUAL "All")
set(CUDA_ARCH_LIST ${CUDA_KNOWN_GPU_ARCHITECTURES})
elseif("${CUDA_ARCH_LIST}" STREQUAL "Common")
set(CUDA_ARCH_LIST ${CUDA_COMMON_GPU_ARCHITECTURES})
elseif("${CUDA_ARCH_LIST}" STREQUAL "Auto")
CUDA_DETECT_INSTALLED_GPUS(CUDA_ARCH_LIST)
message(STATUS "Autodetected CUDA architecture(s): ${CUDA_ARCH_LIST}")
endif()
# Now process the list and look for names
string(REGEX REPLACE "[ \t]+" ";" CUDA_ARCH_LIST "${CUDA_ARCH_LIST}")
list(REMOVE_DUPLICATES CUDA_ARCH_LIST)
foreach(arch_name ${CUDA_ARCH_LIST})
set(arch_bin)
set(add_ptx FALSE)
# Check to see if we are compiling PTX
if(arch_name MATCHES "(.*)\\+PTX$")
set(add_ptx TRUE)
set(arch_name ${CMAKE_MATCH_1})
endif()
if(arch_name MATCHES "(^[0-9]\\.[0-9](\\([0-9]\\.[0-9]\\))?)$")
set(arch_bin ${CMAKE_MATCH_1})
set(arch_ptx ${arch_bin})
else()
# Look for it in our list of known architectures
if(${arch_name} STREQUAL "Fermi")
set(arch_bin "2.0 2.1(2.0)")
elseif(${arch_name} STREQUAL "Kepler+Tegra")
set(arch_bin 3.2)
elseif(${arch_name} STREQUAL "Kepler+Tesla")
set(arch_bin 3.7)
elseif(${arch_name} STREQUAL "Kepler")
set(arch_bin 3.0 3.5)
set(arch_ptx 3.5)
elseif(${arch_name} STREQUAL "Maxwell+Tegra")
set(arch_bin 5.3)
elseif(${arch_name} STREQUAL "Maxwell")
set(arch_bin 5.0 5.2)
set(arch_ptx 5.2)
elseif(${arch_name} STREQUAL "Pascal")
set(arch_bin 6.0 6.1)
set(arch_ptx 6.1)
else()
message(SEND_ERROR "Unknown CUDA Architecture Name ${arch_name} in CUDA_SELECT_NVCC_ARCH_FLAGS")
endif()
endif()
if(NOT arch_bin)
message(SEND_ERROR "arch_bin wasn't set for some reason")
endif()
list(APPEND cuda_arch_bin ${arch_bin})
if(add_ptx)
if (NOT arch_ptx)
set(arch_ptx ${arch_bin})
endif()
list(APPEND cuda_arch_ptx ${arch_ptx})
endif()
endforeach()
# remove dots and convert to lists
string(REGEX REPLACE "\\." "" cuda_arch_bin "${cuda_arch_bin}")
string(REGEX REPLACE "\\." "" cuda_arch_ptx "${cuda_arch_ptx}")
string(REGEX MATCHALL "[0-9()]+" cuda_arch_bin "${cuda_arch_bin}")
string(REGEX MATCHALL "[0-9]+" cuda_arch_ptx "${cuda_arch_ptx}")
if(cuda_arch_bin)
list(REMOVE_DUPLICATES cuda_arch_bin)
endif()
if(cuda_arch_ptx)
list(REMOVE_DUPLICATES cuda_arch_ptx)
endif()
set(nvcc_flags "")
set(nvcc_archs_readable "")
# Tell NVCC to add binaries for the specified GPUs
foreach(arch ${cuda_arch_bin})
if(arch MATCHES "([0-9]+)\\(([0-9]+)\\)")
# User explicitly specified ARCH for the concrete CODE
list(APPEND nvcc_flags -gencode arch=compute_${CMAKE_MATCH_2},code=sm_${CMAKE_MATCH_1})
list(APPEND nvcc_archs_readable sm_${CMAKE_MATCH_1})
else()
# User didn't explicitly specify ARCH for the concrete CODE, we assume ARCH=CODE
list(APPEND nvcc_flags -gencode arch=compute_${arch},code=sm_${arch})
list(APPEND nvcc_archs_readable sm_${arch})
endif()
endforeach()
# Tell NVCC to add PTX intermediate code for the specified architectures
foreach(arch ${cuda_arch_ptx})
list(APPEND nvcc_flags -gencode arch=compute_${arch},code=compute_${arch})
list(APPEND nvcc_archs_readable compute_${arch})
endforeach()
string(REPLACE ";" " " nvcc_archs_readable "${nvcc_archs_readable}")
set(${out_variable} ${nvcc_flags} PARENT_SCOPE)
set(${out_variable}_readable ${nvcc_archs_readable} PARENT_SCOPE)
endfunction()


@ -1,61 +0,0 @@
#include "BatchDataset.h"
#include "Dataset.h"
#include "ATen/ATen.h"
#include <vector>
#include <cassert>
#include <math.h>
using namespace at;
BatchDataset::BatchDataset(Dataset& dataset, uint64_t batchsize) {
BatchDataset(dataset, batchsize, true);
}
BatchDataset::BatchDataset(Dataset& dataset, uint64_t batchsize, bool fullbatches) {
dataset_ = &dataset;
size_ = dataset_->size();
batchsize_ = batchsize;
fullbatches_ = fullbatches;
}
void BatchDataset::getField(uint64_t idx, std::string& fieldkey, at::Tensor& field) {
// assertions:
assert(idx < size());
assert(hasField(fieldkey));
// loop over samples:
Tensor singlefield, buffer;
uint64_t maxsize = std::min(batchsize_, size_ - idx * batchsize_);
for(int n = 0; n < maxsize; n++) {
// get sample:
uint64_t batchidx = idx * batchsize_ + n;
dataset_->getField(batchidx, fieldkey, singlefield);
// allocate memory for batch:
if(n == 0) {
// determine size of batch:
std::vector<int64_t> fieldsize;
fieldsize.push_back(maxsize);
for(uint64_t d = 0; d < singlefield.dim(); ++d) {
fieldsize.push_back(singlefield.size(d));
}
// resize buffer:
field.resize_(fieldsize);
}
// copy sample into batch:
buffer = select(field, 0, n);
buffer.copy_(singlefield);
}
}
uint64_t BatchDataset::size() {
if(fullbatches_)
return size_ / batchsize_; // integer division drops the last partial batch
else
return (size_ + batchsize_ - 1) / batchsize_; // round up to keep it
}


@ -1,21 +0,0 @@
#ifndef AT_BATCH_DATASET_H
#define AT_BATCH_DATASET_H
#include "Dataset.h"
#include "ATen/ATen.h"
class BatchDataset : public Dataset
{
public:
BatchDataset(Dataset& dataset, uint64_t batchsize);
BatchDataset(Dataset& dataset, uint64_t batchsize, bool fullbatches);
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor& field);
virtual uint64_t size();
private:
Dataset* dataset_;
uint64_t batchsize_;
uint64_t size_;
bool fullbatches_;
};
#endif


@ -1,30 +0,0 @@
cmake_minimum_required(VERSION 3.0)
project(ATen)
if(APPLE)
set(CMAKE_MACOSX_RPATH ON)
endif()
if(CMAKE_VERSION VERSION_LESS "3.1")
set(CMAKE_CXX_FLAGS "--std=c++11 ${CMAKE_CXX_FLAGS}")
else()
set(CMAKE_CXX_STANDARD 11)
endif()
set(src
BatchDataset.cc
ConcatDataset.cc
Dataset.cc
MergeDataset.cc
ResampleDataset.cc
ShuffleDataset.cc
TensorDataset.cc
TransformDataset.cc
)
add_library(xtdata ${TH_LINK_STYLE} ${src})
target_link_libraries(xtdata ATen)
include_directories(.)
# add_executable(test-data test/basic.cc)
# target_link_libraries(test-data xtdata)


@ -1,63 +0,0 @@
#include "ConcatDataset.h"
#include "Dataset.h"
#include <vector>
#include <cassert>
using namespace at;
ConcatDataset::ConcatDataset(std::vector<Dataset*>& datasets) {
datasets_ = &datasets;
size_ = 0;
beginindices_ = std::vector<uint64_t>();
endindices_ = std::vector<uint64_t>();
beginindices_.push_back(0);
for(Dataset* dataset : datasets) {
size_ += dataset->size();
uint64_t curidx = endindices_.back();
endindices_.push_back(beginindices_.back() + dataset->size());
if(endindices_.size() > 1)
beginindices_.push_back(curidx + 1);
}
}
void ConcatDataset::getField(uint64_t idx, std::string& fieldkey, Tensor &field) {
// assertions:
assert(idx < size());
assert(hasField(fieldkey));
// get sample from correct dataset:
uint64_t datasetidx = binarySearch(idx);
Dataset* curdataset = (*datasets_)[datasetidx];
curdataset->getField(idx - beginindices_[datasetidx], fieldkey, field);
}
uint64_t ConcatDataset::binarySearch(uint64_t idx) {
assert(idx < size()); // TODO: Add caching to this method.
uint64_t left = 0;
uint64_t right = size_ - 1;
while(left != right) {
uint64_t middle = left + (right - left) / 2;
if(left == middle) {
if(idx > endindices_[left])
left = right;
else
right = left;
}
else {
if(idx > endindices_[middle])
left = middle;
else if(idx < beginindices_[middle])
right = middle;
else {
left = middle;
right = middle;
}
}
}
return left;
}
uint64_t ConcatDataset::size() {
return size_;
}


@ -1,20 +0,0 @@
#ifndef AT_CONCAT_DATASET_H
#define AT_CONCAT_DATASET_H
#include "Dataset.h"
class ConcatDataset : public Dataset
{
public:
ConcatDataset(std::vector<Dataset*>& datasets);
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor &field);
virtual uint64_t size();
private:
uint64_t binarySearch(uint64_t idx);
std::vector<Dataset*>* datasets_;
std::vector<uint64_t> beginindices_;
std::vector<uint64_t> endindices_;
uint64_t size_;
};
#endif


@ -1,28 +0,0 @@
#include "Dataset.h"
#include <cassert>
typedef std::map<std::string, at::Tensor> Fields;
void Dataset::get(int64_t idx, Fields& fields) {
for(auto& field : fields) {
std::string fieldname = field.first;
assert(hasField(fieldname));
getField(idx, fieldname, field.second);
}
}
bool Dataset::hasField(std::string& fieldkey) {
auto search = fieldkeys_.find(fieldkey);
return (search != fieldkeys_.end());
}
std::set<std::string>& Dataset::fieldKeys() {
return fieldkeys_;
}
void Dataset::addFieldKey(std::string& fieldkey) {
fieldkeys_.insert(fieldkey);
}
Dataset::~Dataset() {
}


@ -1,23 +0,0 @@
#ifndef AT_DATASET_H
#define AT_DATASET_H
#include "ATen/ATen.h"
#include <string>
#include <map>
#include <set>
typedef std::map<std::string, at::Tensor> Fields;
class Dataset {
std::set<std::string> fieldkeys_;
public:
virtual uint64_t size() = 0; // pure virtual function
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor& field) = 0;
virtual bool hasField(std::string& fieldkey);
virtual std::set<std::string>& fieldKeys();
virtual void addFieldKey(std::string& fieldkey);
virtual void get(int64_t idx, Fields& fields);
virtual ~Dataset();
};
#endif


@ -1,93 +0,0 @@
#ifndef AT_DATASET_ITERATOR_H
#define AT_DATASET_ITERATOR_H
#include "Dataset.h"
#include <iterator>
class DatasetIterator : public std::iterator<std::forward_iterator_tag, Fields> {
private:
uint64_t idx_ = 0;
Dataset* dataset_;
mutable Fields sample_; // scratch storage backing operator->
public:
DatasetIterator(Dataset& dataset) {
dataset_ = &dataset;
idx_ = 0;
}
DatasetIterator(DatasetIterator& rhs) {
DatasetIterator(*rhs.dataset_);
}
DatasetIterator& operator ++() {
++idx_;
return *this;
}
DatasetIterator operator ++ (int) {
DatasetIterator tmp(*this);
++idx_;
return tmp;
}
friend bool operator == (const DatasetIterator& lhs, const DatasetIterator& rhs);
friend bool operator != (const DatasetIterator& lhs, const DatasetIterator& rhs);
Fields operator* () const {
Fields sample;
dataset_->get(idx_, sample);
return sample;
}
Fields* operator-> () const {
Fields sample;
dataset_->get(idx_, sample);
return &sample;
}
};
inline bool operator == (const DatasetIterator& lhs, const DatasetIterator& rhs) {
return lhs.dataset_ == rhs.dataset_ && lhs.idx_ == rhs.idx_;
}
inline bool operator != (const DatasetIterator& lhs, const DatasetIterator& rhs) {
return !(lhs == rhs);
}
typedef DatasetIterator iterator;
//typedef DatasetIterator<const Fields> const_iterator;
#endif
/**
iterator {
iterator(const iterator&);
~iterator();
iterator& operator=(const iterator&);
iterator& operator++(); //prefix increment
reference operator*() const;
friend void swap(iterator& lhs, iterator& rhs); //C++11 I think
};
input_iterator : public virtual iterator {
iterator operator++(int); //postfix increment
value_type operator*() const;
pointer operator->() const;
friend bool operator==(const iterator&, const iterator&);
friend bool operator!=(const iterator&, const iterator&);
};
//once an input iterator has been dereferenced, it is
//undefined to dereference one before that.
output_iterator : public virtual iterator {
reference operator*() const;
iterator operator++(int); //postfix increment
};
//dereferences may only be on the left side of an assignment
//once an input iterator has been dereferenced, it is
//undefined to dereference one before that.
forward_iterator : input_iterator, output_iterator {
forward_iterator();
};
**/


@ -1,30 +0,0 @@
#include "MergeDataset.h"
#include <cassert>
using namespace at;
MergeDataset::MergeDataset(std::vector<Dataset*>& datasets) {
datasets_ = &datasets;
uint64_t idx = 0;
for(Dataset* dataset : *datasets_) {
for(auto& fieldkey : dataset->fieldKeys()) {
std::string fieldkeyc = fieldkey;
addFieldKey(fieldkeyc);
datasetidx_[fieldkeyc] = idx;
}
++idx; // field keys of the next dataset map to the next slot
}
}
void MergeDataset::getField(uint64_t idx, std::string& fieldkey, Tensor& field) {
assert(idx < size());
assert(hasField(fieldkey));
Dataset* curdataset = (*datasets_)[datasetidx_[fieldkey]];
return curdataset->getField(idx, fieldkey, field);
}
uint64_t MergeDataset::size() {
uint64_t size = 0;
for(Dataset* dataset : *datasets_)
size += dataset->size();
return size;
}


@ -1,19 +0,0 @@
#ifndef AT_MERGE_DATASET_H
#define AT_MERGE_DATASET_H
#include "Dataset.h"
#include <vector>
#include <string>
class MergeDataset : public Dataset
{
public:
MergeDataset(std::vector<Dataset*>& datasets);
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor& field);
virtual uint64_t size();
private:
std::vector<Dataset*>* datasets_;
std::map<std::string, int> datasetidx_;
};
#endif


@ -1,47 +0,0 @@
#include "ResampleDataset.h"
#include "Dataset.h"
#include <vector>
#include <cassert>
using namespace at;
ResampleDataset::ResampleDataset(Dataset& dataset) {
dataset_ = &dataset;
size_ = dataset.size();
perm_ = std::vector<uint64_t>();
perm_.reserve(size_);
for(int n = 0; n < size_; ++n)
perm_[n] = n;
}
ResampleDataset::ResampleDataset(Dataset& dataset, std::vector<uint64_t>& perm) {
dataset_ = &dataset;
size_ = dataset.size();
perm_ = perm;
assert(perm_.size() == size_);
}
ResampleDataset::ResampleDataset(Dataset& dataset, std::function<uint64_t(uint64_t)> perm) {
dataset_ = &dataset;
size_ = dataset.size();
permfunc_ = perm;
resample();
}
void ResampleDataset::getField(uint64_t idx, std::string& fieldkey, at::Tensor& field) {
assert(idx < size());
assert(hasField(fieldkey));
dataset_->getField(perm_[idx], fieldkey, field);
}
void ResampleDataset::resample() {
if(permfunc_) {
perm_.resize(size_); // reserve alone would leave the elements unindexable
for(uint64_t n = 0; n < size_; ++n)
perm_[n] = permfunc_(n);
}
}
uint64_t ResampleDataset::size() {
return size_;
}


@ -1,28 +0,0 @@
#ifndef AT_RESAMPLE_DATASET_H
#define AT_RESAMPLE_DATASET_H
#include <string>
#include <vector>
#include <functional>
#include "ATen/ATen.h"
#include "Dataset.h"
class ResampleDataset : public Dataset
{
public:
ResampleDataset(Dataset& dataset);
ResampleDataset(Dataset& dataset, std::vector<uint64_t>& perm);
ResampleDataset(Dataset& dataset, std::function<uint64_t(uint64_t)> perm);
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor& field);
virtual uint64_t size();
virtual void resample();
protected:
std::vector<uint64_t> perm_;
uint64_t size_;
private:
Dataset* dataset_;
std::function<uint64_t(uint64_t)> permfunc_;
std::vector<at::Tensor> fields_;
};
#endif


@ -1,16 +0,0 @@
#include "ShuffleDataset.h"
#include "Dataset.h"
#include <algorithm>
using namespace at;
ShuffleDataset::ShuffleDataset(Dataset& dataset) : ResampleDataset(dataset) {
resample();
}
void ShuffleDataset::resample() {
perm_.resize(size_);
for(uint64_t n = 0; n < size_; ++n)
perm_[n] = n;
std::random_shuffle(perm_.begin(), perm_.end());
}


@ -1,14 +0,0 @@
#ifndef AT_SHUFFLE_DATASET_H
#define AT_SHUFFLE_DATASET_H
#include "Dataset.h"
#include "ResampleDataset.h"
class ShuffleDataset : public ResampleDataset
{
public:
ShuffleDataset(Dataset& dataset);
virtual void resample();
};
#endif


@ -1,27 +0,0 @@
#include "TensorDataset.h"
#include "ATen/ATen.h"
#include <cassert>
using namespace at;
TensorDataset::TensorDataset(Tensor& t, std::string& fieldkey) {
t_ = t;
fieldkey_ = fieldkey;
addFieldKey(fieldkey);
}
void TensorDataset::getField(uint64_t idx, std::string& fieldkey, Tensor& field) {
// assertions:
assert(idx < size());
assert(fieldkey_.compare(fieldkey) == 0);
// get sample:
Tensor buffer = select(t_, 0, idx);
field.resize_(buffer.sizes()); // make sure the destination matches before copying
field.copy_(buffer);
}
uint64_t TensorDataset::size() {
return t_.size(0);
}


@ -1,19 +0,0 @@
#ifndef AT_TENSOR_DATASET_H
#define AT_TENSOR_DATASET_H
#include "Dataset.h"
#include "ATen/ATen.h"
#include <string>
class TensorDataset : public Dataset
{
public:
TensorDataset(at::Tensor& t, std::string& fieldkey);
virtual void getField(uint64_t idx, std::string& fieldkey, at::Tensor& field);
virtual uint64_t size();
private:
at::Tensor t_;
std::string fieldkey_;
};
#endif


@ -1,25 +0,0 @@
#include "TransformDataset.h"
#include "ATen/ATen.h"
#include "ATen/ATen.h"
#include <cassert>
using namespace at;
TransformDataset::TransformDataset(Dataset& dataset, std::string& fieldkey, std::function<Tensor(Tensor)>& transform) {
assert(hasField(fieldkey));
dataset_ = &dataset;
fieldkey_ = fieldkey;
transform_ = transform;
}
void TransformDataset::getField(uint64_t idx, std::string& fieldkey, Tensor& field) {
dataset_->getField(idx, fieldkey, field);
if(fieldkey.compare(fieldkey_) == 0) {
Tensor transformed = transform_(field);
field.copy_(transformed);
}
}
uint64_t TransformDataset::size() {
return dataset_->size();
}


@ -1,23 +0,0 @@
#ifndef AT_TRANSFORM_DATASET_H
#define AT_TRANSFORM_DATASET_H
#include "Dataset.h"
#include "ATen/ATen.h"
#include <functional>
#include <string>
using namespace at;
class TransformDataset : public Dataset
{
public:
TransformDataset(Dataset& dataset, std::string& fieldkey, std::function<Tensor(Tensor)>& transform);
virtual void getField(uint64_t idx, std::string& fieldkey, Tensor& field);
virtual uint64_t size();
private:
Dataset* dataset_;
std::string fieldkey_;
std::function<Tensor(Tensor)> transform_;
};
#endif


@ -1,22 +0,0 @@
#include "Dataset.h"
#include "DatasetIterator.h"
#include "TensorDataset.h"
#include <iostream>
using namespace at;
int main()
{
std::cout << "hello\n";
Tensor tensor = rand(CPU(kDouble), {256,32});
std::string fieldkey = "data";
TensorDataset dataset(tensor, fieldkey); // the constructor also needs the field key
DatasetIterator datasetiterator(dataset);
uint64_t cnt = 0;
for(uint64_t n = 0; n < dataset.size(); ++n, ++datasetiterator) {
Fields sample = *datasetiterator;
std::cout << "got sample " << cnt << std::endl;
cnt++;
}
return 0;
}


@ -1,77 +0,0 @@
// dependencies:
#include "ThreadPool.h"
#include <vector>
#include <queue>
#include <memory>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <future>
#include <functional>
#include <stdexcept>
#include <stdlib.h>
#include <stdarg.h>
// constructor launches the specified number of workers:
ThreadPool::ThreadPool(uint64_t threads) : stop(false) {
// loop over all threads:
for(uint64_t i = 0; i < threads; ++i) {
workers.emplace_back(
[this] {
for(;;) {
std::function<void()> task;
{
std::unique_lock<std::mutex> lock(this->queue_mutex);
this->condition.wait(lock,
[this]{ return this->stop || !this->tasks.empty(); });
if(this->stop && this->tasks.empty())
return;
task = std::move(this->tasks.front());
this->tasks.pop();
}
task();
}
}
);
}
}
// synchronize all the threads:
void ThreadPool::synchronize() {
for(uint64_t i = 0; i < futures.size(); i++)
futures[i].first.wait();
}
// poll the threads for results:
unsigned int ThreadPool::waitFor() {
// wait until a task is finished:
uint64_t i;
std::future_status status;
do {
for(i = 0; i < futures.size(); i++) {
status = futures[i].first.wait_for(std::chrono::microseconds(0));
if(status == std::future_status::ready) break;
}
std::this_thread::sleep_for(std::chrono::microseconds(1));
} while (status != std::future_status::ready);
// get the result and remove the future:
futures[i].first.get();
unsigned int handle = futures[i].second;
iter_swap(futures.begin() + i, futures.end() - 1);
futures.pop_back();
return handle;
}
// the destructor joins all threads:
ThreadPool::~ThreadPool() {
{
std::unique_lock<std::mutex> lock(queue_mutex);
stop = true;
}
condition.notify_all();
for(std::thread &worker: workers)
worker.join();
}


@ -1,65 +0,0 @@
#ifndef AT_THREADPOOL_H
#define AT_THREADPOOL_H
// dependencies:
#include <vector>
#include <queue>
#include <thread>
#include <mutex>
#include <condition_variable>
#include <future>
// definition of the ThreadPool class:
class ThreadPool {
public:
explicit ThreadPool(uint64_t);
void synchronize();
unsigned int waitFor();
template<class F, class... Args>
unsigned int enqueue(F&& f, Args&&... args);
~ThreadPool();
private:
// all the threads that can perform work:
std::vector< std::thread > workers;
// the list of futures:
unsigned int handle = 0;
std::vector< std::pair< std::future<void>, unsigned int > > futures;
// the task queue:
std::queue< std::function<void()> > tasks;
// synchronization:
std::mutex queue_mutex;
std::condition_variable condition;
bool stop;
};
// enqueue new work item into the pool:
template<class F, class... Args>
unsigned int ThreadPool::enqueue(F&& f, Args&&... args)
{
// create the task:
auto task = std::make_shared< std::packaged_task<void()> >(
std::bind(std::forward<F>(f), std::forward<Args>(args)...)
);
// get future and enqueue the task:
std::future<void> future = task->get_future();
{
std::unique_lock<std::mutex> lock(queue_mutex);
if(stop) throw std::runtime_error("enqueue on stopped ThreadPool");
tasks.emplace([task](){ (*task)(); });
}
condition.notify_one();
// generate handle and store future:
handle++;
futures.push_back(std::make_pair(std::move(future), handle));
return handle;
}
#endif
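A minimal usage sketch for the pool above (the task bodies are illustrative; error handling omitted):
```c++
#include "ThreadPool.h"
#include <cstdio>

int main() {
  ThreadPool pool(4);                 // spin up four workers
  for (int i = 0; i < 8; ++i) {
    // enqueue copies the bound arguments into a packaged_task
    pool.enqueue([](int id) { std::printf("task %d done\n", id); }, i);
  }
  pool.synchronize();                 // block until every outstanding future is done
  return 0;
}
```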


@ -1,89 +0,0 @@
#include "APMeter.h"
#include <math.h>
#include <cassert>
using namespace at;
APMeter::APMeter() {
reset();
}
void APMeter::reset() {
outputs_ = CPU(kFloat).tensor();
targets_ = CPU(kFloat).tensor();
n_ = 0;
}
void APMeter::add(Tensor& output, Tensor& target) {
// assertions and allocations:
assert(output.dim() == 2 && target.dim() == 2);
//assert(isSameSizeAs(output, target));
assert(output.sizes().equals(target.sizes())); // scores and targets must line up
// get current outputs and targets:
Tensor curoutputs = getOutputs();
Tensor curtargets = getTargets();
// make sure underlying storages are sufficiently large:
if(numel(outputs_) < numel(curoutputs) + numel(output)) {
long long newsize = ceil(numel(outputs_) * 1.5);
outputs_.resize_({newsize + numel(output)});
targets_.resize_({newsize + numel(output)});
}
n_ += output.size(0);
// store scores and targets:
uint64_t offset = (numel(curoutputs) > 0) ? curoutputs.size(0) : 0;
Tensor outputbuffer = outputs_.narrow( 0, offset, output.size(0));
Tensor targetbuffer = targets_.narrow( 0, offset, target.size(0));
outputbuffer.copy_(output);
targetbuffer.copy_(target);
}
Tensor APMeter::getOutputs() {
return outputs_.narrow(0, 0, n_);
}
Tensor APMeter::getTargets() {
return targets_.narrow(0, 0, n_);
}
void APMeter::value(Tensor& val) {
// get current outputs and targets:
Tensor curoutputs = getOutputs();
Tensor curtargets = getTargets();
// allocate some memory:
val.resize_({curoutputs.size(1)});
double * val_d = val.data<double>();
Tensor outputbuffer, targetbuffer, sortval, sortidx, sorttgt;
Tensor truepos, precision;
Tensor range = val.type().range(1, curoutputs.size(0)); // ranks 1..n
// loop over all classes:
for(uint64_t k = 0; k < curoutputs.size(1); ++k) {
// sort scores in descending order:
outputbuffer = curoutputs.narrow(1, k, 1).contiguous().squeeze(1);
targetbuffer = curtargets.narrow(1, k, 1).contiguous().toType(CPU(kDouble)).squeeze(1);
std::tie(sortval, sortidx) = sort(outputbuffer, 0, true);
sorttgt = index_select(targetbuffer, 0, sortidx);
double * targetbuffer_d = sorttgt.data<double>();
// compute true positive sums and precision over the sorted targets:
truepos = cumsum(sorttgt, 0);
precision = div(truepos, range);
double * precision_d = precision.data<double>();
// compute average precision:
val_d[k] = .0;
for(uint64_t n = 0; n < precision.size(0); ++n) {
if(targetbuffer_d[n] != 0.)
val_d[k] += precision_d[n];
}
auto norm = sum(targetbuffer).toCDouble();
if(norm > 0)
val_d[k] /= norm;
}
}


@ -1,22 +0,0 @@
#ifndef AT_AP_METER_H
#define AT_AP_METER_H
#include "Meter.h"
#include "ATen/ATen.h"
class APMeter : public Meter
{
public:
APMeter();
virtual void add(Tensor& output, Tensor& target);
virtual void value(Tensor& val);
virtual void reset();
virtual Tensor getOutputs();
virtual Tensor getTargets();
private:
Tensor outputs_;
Tensor targets_;
uint64_t n_;
};
#endif


@ -1,60 +0,0 @@
#include "AUCMeter.h"
#include "APMeter.h"
#include <cassert>
using namespace at;
AUCMeter::AUCMeter() {
reset();
}
void AUCMeter::reset() {
meter_ = APMeter();
}
void AUCMeter::add(Tensor& output, Tensor& target) {
meter_.add(output, target);
}
void AUCMeter::value(Tensor& val) {
// get data from APMeter:
Tensor outputs = meter_.getOutputs();
Tensor targets = meter_.getTargets();
// sort scores:
Tensor sortval, sortidx, sorttgt;
std::tie(sortval, sortidx) = sort(outputs, 0, true);
sorttgt = index_select(targets, 0, sortidx);
int64_t * sortidx_d = sortidx.data<int64_t>();
int16_t * targets_d = sortidx.data<int16_t>();
// construct the ROC curve:
Tensor tpr = zeros(CPU(kDouble), {numel(outputs)});
Tensor fpr = zeros(CPU(kDouble), {numel(outputs)});
double * tpr_d = tpr.data<double>();
double * fpr_d = fpr.data<double>();
for(uint64_t n = 1; n <= numel(outputs); ++n) {
if(targets_d[sortidx_d[n - 1]] == 1) {
tpr_d[n] = tpr_d[n - 1] + 1.;
fpr_d[n] = fpr_d[n - 1];
} else {
tpr_d[n] = tpr_d[n - 1];
fpr_d[n] = fpr_d[n - 1] + 1.;
}
}
tpr.div_(sum(targets));
fpr.div_(sum(at::add(mul(targets, -1.), 1.)));
/**
local auc = torch.cmul(
tpr:narrow(1, 1, tpr:nElement() - 1),
fpr:narrow(1, 2, fpr:nElement() - 1) -
fpr:narrow(1, 1, fpr:nElement() - 1)):sum()
*/
val.resize_({1}).fill_(
sum(mul(tpr.narrow(0, 0, numel(tpr) - 1),
sub(fpr.narrow(0, 1, numel(tpr) - 1),
fpr.narrow(0, 0, numel(tpr) - 1)))));
}


@ -1,19 +0,0 @@
#ifndef AT_AUC_METER_H
#define AT_AUC_METER_H
#include "Meter.h"
#include "APMeter.h"
#include "ATen/ATen.h"
class AUCMeter : public Meter
{
public:
AUCMeter();
virtual void reset();
virtual void add(Tensor& output, Tensor& target);
virtual void value(Tensor& val);
private:
APMeter meter_;
};
#endif


@ -1,25 +0,0 @@
# requires to be compiled along xttensor
include_directories(${CMAKE_CURRENT_SOURCE_DIR})
# C++11
if(CMAKE_VERSION VERSION_LESS "3.1")
set(CMAKE_CXX_FLAGS "--std=c++11 ${CMAKE_CXX_FLAGS}")
else()
set(CMAKE_CXX_STANDARD 11)
endif()
set(src
APMeter.cc
AUCMeter.cc
ClassErrorMeter.cc
MAPMeter.cc
MSEMeter.cc
)
add_library(xtmeter ${TH_LINK_STYLE} ${src})
target_link_libraries(xtmeter ATen)
add_executable(test-meter test/basic.cc ${BACKWARD_ENABLE})
# add_backward(test-meter)
target_link_libraries(test-meter xtmeter)


@ -1,59 +0,0 @@
#include "ClassErrorMeter.h"
#include "ATen/ATen.h"
#include <cassert>
using namespace at;
ClassErrorMeter::ClassErrorMeter()
: ClassErrorMeter(1) {
}
ClassErrorMeter::ClassErrorMeter(const int64_t topk) {
topkval_ = CPU(kShort).tensor();
sumval_ = CPU(kShort).tensor();
topkval_.resize_({topk});
sumval_.resize_({topk});
reset();
}
void ClassErrorMeter::reset() {
range_out(topkval_, 1, numel(topkval_));
sumval_.fill_(0.);
n_ = 0;
}
void ClassErrorMeter::add(Tensor& output, Tensor& target) {
// assertions and allocations:
assert(output.dim() == 2 && target.dim() == 1);
//assert(isSameSizeAs(output, target));
auto sumval_d = sumval_.data<int16_t>();
auto target_long = target.contiguous().toType(CPU(kLong));
auto target_d = target_long.data<int64_t>();
// update counts:
Tensor val, idx;
std::tie(val, idx) = topk(output, numel(topkval_), 1, true, true);
for(uint64_t n = 0; n < output.size(0); ++n) {
bool targetseen = false;
Tensor idx_n = idx.select(0,n);
auto idx_n_d = idx_n.data<int64_t>();
for(uint64_t k = 0; k < numel(topkval_); ++k) {
n_++;
if(targetseen) {
sumval_d[k]++;
} else if(idx_n_d[k] == target_d[n]) {
targetseen = true;
sumval_d[k]++;
}
}
}
}
void ClassErrorMeter::value(Tensor& val) {
val.resize_({numel(topkval_)});
auto val_d = val.data<double>();
auto sumval_d = sumval_.data<int16_t>();
for(uint64_t k = 0; k < numel(topkval_); ++k) {
val_d[k] = 1.0 - (double(sumval_d[k]) / double(n_));
}
}


@ -1,21 +0,0 @@
#ifndef AT_CLASS_ERROR_METER_H
#define AT_CLASS_ERROR_METER_H
#include "Meter.h"
#include "ATen/ATen.h"
class ClassErrorMeter : public Meter
{
public:
ClassErrorMeter();
ClassErrorMeter(const int64_t topk);
virtual void reset();
virtual void add(Tensor& output, Tensor& target);
virtual void value(Tensor& val);
private:
Tensor topkval_;
Tensor sumval_;
uint64_t n_;
};
#endif


@ -1,23 +0,0 @@
#include "MAPMeter.h"
using namespace at;
MAPMeter::MAPMeter() {
reset();
}
void MAPMeter::reset() {
meter_.reset();
}
void MAPMeter::add(Tensor& output, Tensor& target) {
meter_.add(output, target);
}
void MAPMeter::value(Tensor& val) {
//TODO: 0-dim
val.resize_({1});
Tensor allvalues = val.type().tensor();
meter_.value(allvalues);
val.fill_(mean(allvalues));
}


@ -1,19 +0,0 @@
#ifndef AT_MAP_METER_H
#define AT_MAP_METER_H
#include "Meter.h"
#include "APMeter.h"
#include "ATen/ATen.h"
class MAPMeter : public Meter
{
public:
MAPMeter();
virtual void reset();
virtual void add(Tensor& output, Tensor& target);
virtual void value(Tensor& val);
private:
APMeter meter_;
};
#endif


@ -1,31 +0,0 @@
#include "MSEMeter.h"
#include <cassert>
#include <math.h>
using namespace at;
MSEMeter::MSEMeter() {
reset();
}
void MSEMeter::reset() {
n_ = 0;
val_ = .0;
}
void MSEMeter::add(Tensor& output, Tensor& target) {
//assert(isSameSizeAs(output, output
Tensor t = output.sub(target);
Tensor result = t.mul(t).contiguous().toType(CPU(kDouble));
double * data = result.data<double>();
for(uint64_t n = 0; n < numel(result); ++n) {
n_++;
val_ += ( (1. / ((double)n_ - 1.) * val_) +
((1. / (double)n_) * data[n]));
}
}
void MSEMeter::value(Tensor& val) {
//TODO: 0-dim
val.resize_({1}).fill_(val_);
}
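As a standalone sanity check of that incremental-mean update (illustrative numbers, not part of the meter):
```c++
#include <cstdio>

int main() {
  double sqerr[4] = {1.0, 4.0, 9.0, 16.0}; // pretend per-element squared errors
  double mean = 0.0;
  for (int n = 1; n <= 4; ++n)
    mean += (sqerr[n - 1] - mean) / n;     // same update as MSEMeter::add
  std::printf("%g\n", mean);               // prints 7.5 == (1+4+9+16)/4
  return 0;
}
```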


@ -1,19 +0,0 @@
#ifndef AT_MSE_METER_H
#define AT_MSE_METER_H
#include "Meter.h"
#include "ATen/ATen.h"
class MSEMeter : public Meter
{
public:
MSEMeter();
virtual void reset();
virtual void add(Tensor& output, Tensor& target);
virtual void value(Tensor& val);
private:
double val_;
uint64_t n_;
};
#endif


@ -1,17 +0,0 @@
#ifndef AT_METER_H
#define AT_METER_H
#include "ATen/ATen.h"
using namespace at;
class Meter
{
public:
virtual void add(Tensor& output, Tensor& target) = 0;
virtual void value(Tensor& val) = 0;
virtual void reset() = 0;
virtual ~Meter() {};
};
#endif


@ -1,25 +0,0 @@
#include "APMeter.h"
#include <iostream>
using namespace at;
int main()
{
auto && T = CPU(kFloat);
std::cout << "hello\n";
APMeter meter;
Tensor output = at::randn(T, {10, 7});
Tensor target = at::zeros(T, {10, 7});
for(uint64_t n = 0; n < 10; ++n) {
Tensor row = target.select(0,n);
auto row_d = row.data<float>();
row_d[rand() % 7] = 1.;
}
std::cout << output;
std::cout << target;
meter.add(output, target);
Tensor val = CPU(kDouble).tensor();
meter.value(val);
std::cout << "value: " << val << std::endl;
return 0;
}

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,17 +1,26 @@
#pragma once
#include "ATen/ATenGeneral.h"
#include "ATen/CPUGeneral.h"
#include "ATen/core/ATenGeneral.h"
#include "ATen/Allocator.h"
#include "ATen/Scalar.h"
#include "ATen/Type.h"
#include "ATen/Generator.h"
#include "ATen/CPUGeneral.h"
#include "ATen/CUDAGuard.h"
#include "ATen/Context.h"
#include "ATen/Storage.h"
#include "ATen/Device.h"
#include "ATen/DeviceGuard.h"
#include "ATen/DimVector.h"
#include "ATen/Dispatch.h"
#include "ATen/Formatting.h"
#include "ATen/Functions.h"
#include "ATen/core/Generator.h"
#include "ATen/core/Layout.h"
#include "ATen/OptionsGuard.h"
#include "ATen/core/Scalar.h"
#include "ATen/ScalarOps.h"
#include "ATen/core/Storage.h"
#include "ATen/Tensor.h"
#include "ATen/TensorGeometry.h"
#include "ATen/Functions.h"
#include "ATen/Formatting.h"
#include "ATen/core/TensorMethods.h"
#include "ATen/TensorOperators.h"
#include "ATen/TensorMethods.h"
#include "ATen/Dispatch.h"
#include "ATen/core/TensorOptions.h"
#include "ATen/Type.h"
#include "ATen/core/Error.h"


@ -1,11 +0,0 @@
#pragma once
#ifdef _WIN32
# ifdef ATen_EXPORTS
# define AT_API __declspec(dllexport)
# else
# define AT_API __declspec(dllimport)
# endif
#else
# define AT_API
#endif


@ -0,0 +1,43 @@
#pragma once
#include "ATen/Config.h"
#include "ATen/core/Half.h"
// Defines the accumulation type for a scalar type.
// Example:
// using accscalar_t = acc_type<scalar_t, true>;
#ifdef __CUDACC__
#include <cuda.h>
#include <cuda_fp16.h>
#endif
namespace at {
template <typename T, bool is_cuda>
struct AccumulateType { };
#ifdef __CUDACC__
template <> struct AccumulateType<half, true> { using type = float; };
#endif
template <> struct AccumulateType<Half, true> { using type = float; };
template <> struct AccumulateType<float, true> { using type = float; };
template <> struct AccumulateType<double, true> { using type = double; };
template <> struct AccumulateType<int8_t, true> { using type = int64_t; };
template <> struct AccumulateType<uint8_t, true> { using type = int64_t; };
template <> struct AccumulateType<char, true> { using type = int64_t; };
template <> struct AccumulateType<int16_t, true> { using type = int64_t; };
template <> struct AccumulateType<int32_t, true> { using type = int64_t; };
template <> struct AccumulateType<int64_t, true> { using type = int64_t; };
template <> struct AccumulateType<float, false> { using type = double; };
template <> struct AccumulateType<double, false> { using type = double; };
template <> struct AccumulateType<int8_t, false> { using type = int64_t; };
template <> struct AccumulateType<uint8_t, false> { using type = int64_t; };
template <> struct AccumulateType<char, false> { using type = int64_t; };
template <> struct AccumulateType<int16_t, false> { using type = int64_t; };
template <> struct AccumulateType<int32_t, false> { using type = int64_t; };
template <> struct AccumulateType<int64_t, false> { using type = int64_t; };
template<typename T, bool is_cuda>
using acc_type = typename AccumulateType<T, is_cuda>::type;
} // namespace at
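A rough illustration of why the wider CPU accumulator matters (self-contained; the tiny trait below just mirrors the float/CPU specialization above):
```c++
#include <cstdio>

// condensed copy of the trait above, only the float/CPU entry
template <typename T, bool is_cuda> struct AccumulateType;
template <> struct AccumulateType<float, false> { using type = double; };
template <typename T, bool is_cuda>
using acc_type = typename AccumulateType<T, is_cuda>::type;

int main() {
  float naive = 1e8f;
  acc_type<float, false> wide = 1e8f;  // double on CPU
  for (int i = 0; i < 100; ++i) {
    naive += 1.0f;  // 1.0 is below float's resolution at 1e8, so it is lost
    wide += 1.0;    // double keeps it
  }
  std::printf("float: %.1f  acc_type: %.1f\n", naive, (double)wide);
  return 0;
}
```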


@ -1,34 +1,2 @@
#pragma once
#include <memory>
#include <stddef.h>
#include "ATen/Retainable.h"
namespace at {
struct Allocator {
virtual ~Allocator() {}
virtual void* allocate(std::size_t n) const = 0;
virtual void deallocate(void* ptr) const = 0;
};
namespace detail {
struct AllocatorRetainable : public Retainable {
AllocatorRetainable(std::unique_ptr<Allocator> allocator)
: allocator(std::move(allocator)) {}
void* allocate(std::size_t n) {
return allocator->allocate(n);
}
void deallocate(void* ptr) {
return allocator->deallocate(ptr);
}
private:
std::unique_ptr<Allocator> allocator;
};
} // namespace at::detail
} // namespace at
#include <ATen/core/Allocator.h>


@ -1,192 +1,2 @@
//===--- ArrayRef.h - Array Reference Wrapper -------------------*- C++ -*-===//
//
// The LLVM Compiler Infrastructure
//
// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.
//
//===----------------------------------------------------------------------===//
// ATen: modified from llvm::ArrayRef.
// removed llvm-specific functionality
// removed some implicit const -> non-const conversions that rely on
// complicated std::enable_if meta-programming
// removed a bunch of slice variants for simplicity...
#pragma once
#include <ATen/Error.h>
#include <ATen/SmallVector.h>
#include <array>
#include <iterator>
#include <vector>
namespace at {
/// ArrayRef - Represent a constant reference to an array (0 or more elements
/// consecutively in memory), i.e. a start pointer and a length. It allows
/// various APIs to take consecutive elements easily and conveniently.
///
/// This class does not own the underlying data, it is expected to be used in
/// situations where the data resides in some other buffer, whose lifetime
/// extends past that of the ArrayRef. For this reason, it is not in general
/// safe to store an ArrayRef.
///
/// This is intended to be trivially copyable, so it should be passed by
/// value.
template<typename T>
class ArrayRef {
public:
typedef const T *iterator;
typedef const T *const_iterator;
typedef size_t size_type;
typedef std::reverse_iterator<iterator> reverse_iterator;
private:
/// The start of the array, in an external buffer.
const T *Data;
/// The number of elements.
size_type Length;
public:
/// @name Constructors
/// @{
/// Construct an empty ArrayRef.
/*implicit*/ ArrayRef() : Data(nullptr), Length(0) {}
/// Construct an ArrayRef from a single element.
/*implicit*/ ArrayRef(const T &OneElt)
: Data(&OneElt), Length(1) {}
/// Construct an ArrayRef from a pointer and length.
/*implicit*/ ArrayRef(const T *data, size_t length)
: Data(data), Length(length) {}
/// Construct an ArrayRef from a range.
ArrayRef(const T *begin, const T *end)
: Data(begin), Length(end - begin) {}
/// Construct an ArrayRef from a SmallVector. This is templated in order to
/// avoid instantiating SmallVectorTemplateCommon<T> whenever we
/// copy-construct an ArrayRef.
template<typename U>
/*implicit*/ ArrayRef(const SmallVectorTemplateCommon<T, U> &Vec)
: Data(Vec.data()), Length(Vec.size()) {
}
/// Construct an ArrayRef from a std::vector.
template<typename A>
/*implicit*/ ArrayRef(const std::vector<T, A> &Vec)
: Data(Vec.data()), Length(Vec.size()) {}
/// Construct an ArrayRef from a std::array
template <size_t N>
/*implicit*/ constexpr ArrayRef(const std::array<T, N> &Arr)
: Data(Arr.data()), Length(N) {}
/// Construct an ArrayRef from a C array.
template <size_t N>
/*implicit*/ constexpr ArrayRef(const T (&Arr)[N]) : Data(Arr), Length(N) {}
/// Construct an ArrayRef from a std::initializer_list.
/*implicit*/ ArrayRef(const std::initializer_list<T> &Vec)
: Data(Vec.begin() == Vec.end() ? (T*)nullptr : Vec.begin()),
Length(Vec.size()) {}
/// @}
/// @name Simple Operations
/// @{
iterator begin() const { return Data; }
iterator end() const { return Data + Length; }
reverse_iterator rbegin() const { return reverse_iterator(end()); }
reverse_iterator rend() const { return reverse_iterator(begin()); }
/// empty - Check if the array is empty.
bool empty() const { return Length == 0; }
const T *data() const { return Data; }
/// size - Get the array size.
size_t size() const { return Length; }
/// front - Get the first element.
const T &front() const {
AT_ASSERT(!empty(), "Empty list!");
return Data[0];
}
/// back - Get the last element.
const T &back() const {
AT_ASSERT(!empty(), "Empty list!");
return Data[Length-1];
}
/// equals - Check for element-wise equality.
bool equals(ArrayRef RHS) const {
if (Length != RHS.Length)
return false;
return std::equal(begin(), end(), RHS.begin());
}
/// slice(n, m) - Chop off the first N elements of the array, and keep M
/// elements in the array.
ArrayRef<T> slice(size_t N, size_t M) const {
AT_ASSERT(N+M <= size(), "Invalid specifier");
return ArrayRef<T>(data()+N, M);
}
/// slice(n) - Chop off the first N elements of the array.
ArrayRef<T> slice(size_t N) const { return slice(N, size() - N); }
/// @}
/// @name Operator Overloads
/// @{
const T &operator[](size_t Index) const {
return Data[Index];
}
/// Vector compatibility
const T &at(size_t Index) const {
AT_ASSERT(Index < Length, "Invalid index!");
return Data[Index];
}
/// Disallow accidental assignment from a temporary.
///
/// The declaration here is extra complicated so that "arrayRef = {}"
/// continues to select the move assignment operator.
template <typename U>
typename std::enable_if<std::is_same<U, T>::value, ArrayRef<T>>::type &
operator=(U &&Temporary) = delete;
/// Disallow accidental assignment from a temporary.
///
/// The declaration here is extra complicated so that "arrayRef = {}"
/// continues to select the move assignment operator.
template <typename U>
typename std::enable_if<std::is_same<U, T>::value, ArrayRef<T>>::type &
operator=(std::initializer_list<U>) = delete;
/// @}
/// @name Expensive Operations
/// @{
std::vector<T> vec() const {
return std::vector<T>(Data, Data+Length);
}
/// @}
/// @name Conversion operators
/// @{
operator std::vector<T>() const {
return std::vector<T>(Data, Data+Length);
}
/// @}
};
} // end namespace at
#include <ATen/core/ArrayRef.h>

aten/src/ATen/Backend.h Normal file

@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Backend.h>


@ -0,0 +1,2 @@
#pragma once
#include <ATen/core/Backtrace.h>


@ -1,132 +1,12 @@
CMAKE_MINIMUM_REQUIRED(VERSION 2.8)
cmake_minimum_required(VERSION 3.0 FATAL_ERROR)
SET(CMAKE_MODULE_PATH ${CMAKE_CURRENT_SOURCE_DIR}/cmake ${CMAKE_MODULE_PATH})
# avoid some cmake warnings
IF(POLICY CMP0026)
CMAKE_POLICY(SET CMP0026 OLD)
ENDIF()
IF(MSVC AND NOT "${CMAKE_BUILD_TYPE}" MATCHES "Debug")
SET(MSVC_OPT_FLAG "/Ox /fp:strict ")
ELSE()
SET(MSVC_OPT_FLAG "")
ENDIF()
########################
# SET_SOURCE_FILES_PROPERTIES must be in the same CMakeLists.txt file as the target that includes the file
# so we need to set these commands here rather than in src/TH
IF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
IF(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "${MSVC_OPT_FLAG}/fp:fast")
ELSE(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/generic/simd/convolve5x5_sse.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math")
ENDIF(MSVC)
ENDIF(C_SSE4_1_FOUND AND C_SSE4_2_FOUND)
IF(C_AVX_FOUND)
IF(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "${MSVC_OPT_FLAG}/fp:fast ${C_AVX_FLAGS}")
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/vector/AVX.c PROPERTIES COMPILE_FLAGS "${MSVC_OPT_FLAG}/arch:AVX ${C_AVX_FLAGS}")
ELSE(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/generic/simd/convolve5x5_avx.c PROPERTIES COMPILE_FLAGS "-O3 -ffast-math ${C_AVX_FLAGS}")
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/vector/AVX.c PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX_FLAGS}")
ENDIF(MSVC)
ENDIF(C_AVX_FOUND)
IF(C_AVX2_FOUND)
IF(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/vector/AVX2.cpp PROPERTIES COMPILE_FLAGS "${MSVC_OPT_FLAG}/arch:AVX2 ${C_AVX2_FLAGS}")
ELSE(MSVC)
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/vector/AVX2.cpp PROPERTIES COMPILE_FLAGS "-O3 ${C_AVX2_FLAGS}")
ENDIF(MSVC)
ENDIF(C_AVX2_FOUND)
IF(NOT MSVC AND NOT "${CMAKE_C_COMPILER_ID}" MATCHES "Clang")
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/THAtomic.c PROPERTIES COMPILE_FLAGS "-fno-openmp")
SET_SOURCE_FILES_PROPERTIES(${PROJECT_SOURCE_DIR}/src/TH/THAllocator.c PROPERTIES COMPILE_FLAGS "-fno-openmp")
ENDIF()
FILE(GLOB cpu_kernel_cpp_in RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/cpu/*.cpp")
LIST(APPEND CPU_CAPABILITY_NAMES "DEFAULT")
IF(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "${MSVC_OPT_FLAG}")
ELSE(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "-O3")
ENDIF(MSVC)
IF(CXX_AVX_FOUND)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DHAVE_AVX_CPU_DEFINITION")
LIST(APPEND CPU_CAPABILITY_NAMES "AVX")
IF(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "${MSVC_OPT_FLAG}/arch:AVX")
ELSE(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "-O3 -mavx")
ENDIF(MSVC)
ENDIF(CXX_AVX_FOUND)
IF(CXX_AVX2_FOUND)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -DHAVE_AVX2_CPU_DEFINITION")
LIST(APPEND CPU_CAPABILITY_NAMES "AVX2")
IF(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "${MSVC_OPT_FLAG}/arch:AVX2")
ELSE(MSVC)
LIST(APPEND CPU_CAPABILITY_FLAGS "-O3 -mavx2")
ENDIF(MSVC)
ENDIF(CXX_AVX2_FOUND)
list(LENGTH CPU_CAPABILITY_NAMES NUM_CPU_CAPABILITY_NAMES)
math(EXPR NUM_CPU_CAPABILITY_NAMES "${NUM_CPU_CAPABILITY_NAMES}-1")
FOREACH(i RANGE ${NUM_CPU_CAPABILITY_NAMES})
FOREACH(IMPL ${cpu_kernel_cpp_in})
LIST(GET CPU_CAPABILITY_NAMES ${i} CPU_CAPABILITY)
SET(NEW_IMPL ${CMAKE_CURRENT_BINARY_DIR}/${IMPL}.${CPU_CAPABILITY}.cpp)
CONFIGURE_FILE(${IMPL} ${NEW_IMPL} COPYONLY)
SET(cpu_kernel_cpp ${NEW_IMPL} ${cpu_kernel_cpp}) # Create list of copies
LIST(GET CPU_CAPABILITY_FLAGS ${i} FLAGS)
IF(MSVC)
SET(MACRO_FLAG "/DCPU_CAPABILITY=${CPU_CAPABILITY} /DCPU_CAPABILITY_${CPU_CAPABILITY}")
ELSE(MSVC)
SET(MACRO_FLAG "-DCPU_CAPABILITY=${CPU_CAPABILITY} -DCPU_CAPABILITY_${CPU_CAPABILITY}")
ENDIF(MSVC)
SET_SOURCE_FILES_PROPERTIES(${NEW_IMPL} PROPERTIES COMPILE_FLAGS "${FLAGS} ${MACRO_FLAG}")
ENDFOREACH()
ENDFOREACH()
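# Each copied native/cpu/*.cpp above is compiled with -DCPU_CAPABILITY=<NAME>
# and -DCPU_CAPABILITY_<NAME>. A hypothetical kernel file consumes those macros
# roughly as follows (illustrative C++ sketch, not a file in this diff):
#
#   namespace at { namespace native { namespace CPU_CAPABILITY {
#   void add_kernel(float* out, const float* a, const float* b, int64_t n) {
#     for (int64_t i = 0; i < n; ++i)
#       out[i] = a[i] + b[i];  // auto-vectorized per the -mavx / -mavx2 flags
#   }
#   }}} // each capability's copy lands in its own namespace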
################################################################################
# Helper functions
################################################################################
FUNCTION(EXCLUDE_DIR list_name dir_name)
# A helper that excludes all files that contain dir_name in their file path
SET(local_list ${${list_name}})
FOREACH(source ${local_list})
IF(${source} MATCHES ${dir_name})
MESSAGE(STATUS "Excluding " ${source} " from the build")
LIST(REMOVE_ITEM local_list ${source})
ENDIF()
ENDFOREACH()
SET(${list_name} ${local_list} PARENT_SCOPE)
ENDFUNCTION()
function(filter_list output input)
unset(result)
foreach(filename ${${input}})
foreach(pattern ${ARGN})
if("${filename}" MATCHES "${pattern}")
list(APPEND result "${filename}")
endif()
endforeach()
endforeach()
set(${output} ${result} PARENT_SCOPE)
endfunction()
IF ($ENV{TH_BINARY_BUILD})
MESSAGE(STATUS "TH_BINARY_BUILD detected. Statically linking libstdc++")
SET(CMAKE_CXX_FLAGS "-static-libstdc++ ${CMAKE_CXX_FLAGS}")
ENDIF()
IF(NOT MSVC)
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-ignored-qualifiers")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-ignored-qualifiers")
SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -Wno-absolute-value")
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -Wno-absolute-value")
ENDIF(NOT MSVC)
# Can be compiled standalone
IF(NOT AT_INSTALL_BIN_DIR OR NOT AT_INSTALL_LIB_DIR OR NOT AT_INSTALL_INCLUDE_DIR OR NOT AT_INSTALL_SHARE_DIR)
@ -136,131 +16,94 @@ IF(NOT AT_INSTALL_BIN_DIR OR NOT AT_INSTALL_LIB_DIR OR NOT AT_INSTALL_INCLUDE_DI
SET(AT_INSTALL_SHARE_DIR "share" CACHE PATH "AT install include subdirectory")
ENDIF()
# TODO: Maybe put this in the generated files directory
CONFIGURE_FILE(Config.h.in "${CMAKE_CURRENT_SOURCE_DIR}/Config.h")
CONFIGURE_FILE(cuda/CUDAConfig.h.in "${CMAKE_CURRENT_SOURCE_DIR}/cuda/CUDAConfig.h")
FILE(GLOB base_h RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "*.h")
FILE(GLOB base_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "*.cpp")
FILE(GLOB native_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/*.cpp")
FILE(GLOB native_cudnn_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/cudnn/*.cpp")
FILE(GLOB native_cuda_cu RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/cuda/*.cu")
FILE(GLOB native_mkl_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/mkl/*.cpp")
FILE(GLOB native_mkldnn_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "native/mkldnn/*.cpp")
# NB: If you edit these globs, you'll have to update setup.py package_data as well
FILE(GLOB base_h "*.h" "detail/*.h")
FILE(GLOB base_cpp "*.cpp" "detail/*.cpp")
add_subdirectory(core)
FILE(GLOB cuda_h "cuda/*.h" "cuda/detail/*.h" "cuda/*.cuh" "cuda/detail/*.cuh")
FILE(GLOB cuda_cpp "cuda/*.cpp" "cuda/detail/*.cpp")
FILE(GLOB cuda_cu "cuda/*.cu" "cuda/detail/*.cu")
FILE(GLOB cudnn_h "cudnn/*.h" "cudnn/*.cuh")
FILE(GLOB cudnn_cpp "cudnn/*.cpp")
FILE(GLOB miopen_h "miopen/*.h")
FILE(GLOB miopen_cpp "miopen/*.cpp")
FILE(GLOB mkl_cpp "mkl/*.cpp")
FILE(GLOB mkldnn_cpp "mkldnn/*.cpp")
FILE(GLOB_RECURSE cuda_h
RELATIVE "${CMAKE_CURRENT_SOURCE_DIR}"
"cuda/*.cuh" "cuda/*.h" "cudnn/*.cuh" "cudnn/*.h")
FILE(GLOB native_cpp "native/*.cpp")
FILE(GLOB native_sparse_cpp "native/sparse/*.cpp")
FILE(GLOB native_sparse_cuda_cu "native/sparse/cuda/*.cu")
FILE(GLOB native_sparse_cuda_cpp "native/sparse/cuda/*.cpp")
FILE(GLOB native_cudnn_cpp "native/cudnn/*.cpp")
FILE(GLOB native_miopen_cpp "native/miopen/*.cpp")
FILE(GLOB native_cuda_cu "native/cuda/*.cu")
FILE(GLOB native_cuda_cpp "native/cuda/*.cpp")
FILE(GLOB native_mkl_cpp "native/mkl/*.cpp")
FILE(GLOB native_mkldnn_cpp "native/mkldnn/*.cpp")
FILE(GLOB cudnn_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "cudnn/*.cpp")
FILE(GLOB mkl_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "mkl/*.cpp")
FILE(GLOB mkldnn_cpp RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "mkldnn/*.cpp")
FILE(GLOB all_python RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "*.py")
FILE(GLOB_RECURSE aten_cuda_cu RELATIVE ${CMAKE_CURRENT_SOURCE_DIR} "cuda/*.cu")
IF(DEFINED ENV{PYTORCH_PYTHON})
message(STATUS "Using python found in $ENV{PYTORCH_PYTHON}")
SET(PYCMD "$ENV{PYTORCH_PYTHON}")
ELSE()
SET(PYCMD "python")
ENDIF()
SET(GEN_COMMAND
${PYCMD} ${CMAKE_CURRENT_SOURCE_DIR}/gen.py ${CUDA_FLAG}
-s ${CMAKE_CURRENT_SOURCE_DIR}
${cwrap_files}
)
EXECUTE_PROCESS(
COMMAND ${GEN_COMMAND}
--output-dependencies ${CMAKE_CURRENT_BINARY_DIR}/generated_cpp.txt
RESULT_VARIABLE RETURN_VALUE
)
if (NOT RETURN_VALUE EQUAL 0)
message(STATUS ${generated_cpp})
message(FATAL_ERROR "Failed to get generated_cpp list")
set(all_cpu_cpp ${base_cpp} ${ATen_CORE_SRCS} ${native_cpp} ${native_sparse_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
if(AT_MKL_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkl_cpp})
endif()
file(READ ${CMAKE_CURRENT_BINARY_DIR}/generated_cpp.txt generated_cpp)
FILE(GLOB_RECURSE all_templates "templates/*")
FILE(MAKE_DIRECTORY ${CMAKE_CURRENT_BINARY_DIR}/ATen)
ADD_CUSTOM_COMMAND(OUTPUT ${generated_cpp}
COMMAND ${GEN_COMMAND}
DEPENDS ${all_python} ${all_templates} ${cwrap_files})
# Generated headers used from a cuda (.cu) file are
# not tracked correctly in cmake. We make libATen.so depend explicitly
# on building the generated aten files as a workaround.
ADD_CUSTOM_TARGET(aten_files_are_generated
DEPENDS ${generated_cpp}
)
SET(all_cpp ${base_cpp} ${native_cpp} ${native_cudnn_cpp} ${native_mkl_cpp} ${native_mkldnn_cpp} ${generated_cpp} ${ATen_CPU_SRCS} ${cpu_kernel_cpp})
INCLUDE_DIRECTORIES(${ATen_CPU_INCLUDE})
IF(NOT NO_CUDA)
INCLUDE_DIRECTORIES(${ATen_CUDA_INCLUDE})
INCLUDE_DIRECTORIES("${CUDA_SDK_ROOT_DIR}/common/inc")
INCLUDE_DIRECTORIES("${CMAKE_CURRENT_SOURCE_DIR}/cuda")
SET(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${aten_cuda_cu} ${native_cuda_cu})
SET(all_cpp ${all_cpp} ${ATen_CUDA_SRCS})
IF(CUDNN_FOUND)
SET(all_cpp ${all_cpp} ${cudnn_cpp})
ENDIF()
IF(AT_MKL_ENABLED)
SET(all_cpp ${all_cpp} ${mkl_cpp})
ENDIF()
if(AT_MKLDNN_ENABLED)
set(all_cpu_cpp ${all_cpu_cpp} ${mkldnn_cpp})
endif()
IF(AT_MKLDNN_ENABLED)
SET(all_cpp ${all_cpp} ${mkldnn_cpp})
ENDIF()
IF(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/cuda)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} ${cuda_cu} ${native_cuda_cu} ${native_sparse_cuda_cu})
set(all_cuda_cpp ${native_sparse_cuda_cpp} ${cuda_cpp} ${native_cuda_cpp} ${cuda_generated_cpp} ${ATen_CUDA_SRCS})
IF(USE_CUDA)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${all_cuda_cpp})
IF(CUDNN_FOUND)
SET(all_cuda_cpp ${all_cuda_cpp} ${cudnn_cpp})
ENDIF()
ELSEIF(USE_ROCM)
SET(all_cuda_cpp ${native_cudnn_cpp} ${native_miopen_cpp} ${miopen_cpp} ${all_cuda_cpp})
ENDIF()
endif()
filter_list(generated_h generated_cpp "\\.h$")
filter_list(cuda_generated_h cuda_generated_cpp "\\.h$")
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_SOURCE_DIR}/..)
list(APPEND ATen_CPU_INCLUDE ${CMAKE_CURRENT_SOURCE_DIR}/..)
# so the build can find the generated header files
INCLUDE_DIRECTORIES(${CMAKE_CURRENT_BINARY_DIR})
list(APPEND ATen_CPU_INCLUDE ${CMAKE_CURRENT_BINARY_DIR})
IF(NOT AT_LINK_STYLE)
SET(AT_LINK_STYLE SHARED)
ENDIF()
IF(CUDA_FOUND)
CUDA_ADD_LIBRARY(ATen ${AT_LINK_STYLE} ${all_cpp})
ELSE()
ADD_LIBRARY(ATen ${AT_LINK_STYLE} ${all_cpp})
ENDIF()
ADD_DEPENDENCIES(ATen aten_files_are_generated)
set(TBB_ROOT_DIR "${PROJECT_SOURCE_DIR}/src/ATen/cpu/tbb/tbb_remote")
set(TBB_BUILD_STATIC ON CACHE BOOL " " FORCE)
set(TBB_BUILD_SHARED OFF CACHE BOOL " " FORCE)
set(TBB_BUILD_TBBMALLOC OFF CACHE BOOL " " FORCE)
set(TBB_BUILD_TBBMALLOC_PROXY OFF CACHE BOOL " " FORCE)
set(TBB_BUILD_TESTS OFF CACHE BOOL " " FORCE)
add_subdirectory(${PROJECT_SOURCE_DIR}/src/ATen/cpu/tbb)
set_property(TARGET tbb_static tbb_def_files PROPERTY FOLDER "dependencies")
target_include_directories(tbb_static PUBLIC ${TBB_ROOT_DIR}/include)
target_link_libraries(ATen tbb_static)
if(NOT ${CMAKE_VERSION} VERSION_LESS "3.1")
SET_PROPERTY(TARGET ATen PROPERTY CXX_STANDARD 11)
endif(NOT ${CMAKE_VERSION} VERSION_LESS "3.1")
IF(BLAS_FOUND)
IF ($ENV{TH_BINARY_BUILD})
MESSAGE(STATUS "TH_BINARY_BUILD detected. Enabling special linkage.")
TARGET_LINK_LIBRARIES(ATen "${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
list(APPEND ATen_CPU_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
if(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
endif()
ELSE ($ENV{TH_BINARY_BUILD})
TARGET_LINK_LIBRARIES(ATen ${BLAS_LIBRARIES})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${BLAS_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${BLAS_LIBRARIES}")
endif()
ENDIF ($ENV{TH_BINARY_BUILD})
ENDIF(BLAS_FOUND)
IF(LAPACK_FOUND)
TARGET_LINK_LIBRARIES(ATen ${LAPACK_LIBRARIES})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
if(USE_CUDA OR USE_ROCM)
# Although LAPACK provides CPU implementations (and thus one might expect
# that ATen_cuda would not need it at all), some of our libraries (magma in
# particular) use CPU BLAS/LAPACK implementations as a backend, and so it is
# very important we get the *right* implementation, because even if the
# symbols are the same, LAPACK implementations may have different calling
# conventions. This caused https://github.com/pytorch/pytorch/issues/7353
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${LAPACK_LIBRARIES})
endif()
ENDIF(LAPACK_FOUND)
IF (UNIX AND NOT APPLE)
@ -268,7 +111,7 @@ IF (UNIX AND NOT APPLE)
# https://github.com/libgit2/libgit2/issues/2128#issuecomment-35649830
CHECK_LIBRARY_EXISTS(rt clock_gettime "time.h" NEED_LIBRT)
IF(NEED_LIBRT)
TARGET_LINK_LIBRARIES(ATen rt)
list(APPEND ATen_CPU_DEPENDENCY_LIBS rt)
SET(CMAKE_REQUIRED_LIBRARIES ${CMAKE_REQUIRED_LIBRARIES} rt)
ENDIF(NEED_LIBRT)
ENDIF(UNIX AND NOT APPLE)
@ -295,89 +138,258 @@ IF(UNIX)
ENDIF(HAVE_MALLOC_USABLE_SIZE)
ENDIF(UNIX)
IF(NOT MSVC)
TARGET_LINK_LIBRARIES(ATen m)
ELSE(NOT MSVC)
IF(AT_MKL_MT)
set_target_properties(ATen PROPERTIES LINK_FLAGS_RELEASE "/NODEFAULTLIB:vcomp")
set_target_properties(ATen PROPERTIES LINK_FLAGS_DEBUG "/NODEFAULTLIB:vcomp")
set_target_properties(ATen PROPERTIES STATIC_LIBRARY_FLAGS "/NODEFAULTLIB:vcomp")
ENDIF(AT_MKL_MT)
ENDIF(NOT MSVC)
# Is __thread supported?
IF(NOT MSVC)
CHECK_C_SOURCE_COMPILES("static __thread int x = 1; int main() { return x; }" C_HAS_THREAD)
ELSE(NOT MSVC)
CHECK_C_SOURCE_COMPILES("static __declspec( thread ) int x = 1; int main() { return x; }" C_HAS_THREAD)
ENDIF(NOT MSVC)
IF(NOT C_HAS_THREAD)
MESSAGE(STATUS "Warning: __thread is not supported, generating thread-unsafe code")
ELSE(NOT C_HAS_THREAD)
SET(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DTH_HAVE_THREAD")
ENDIF(NOT C_HAS_THREAD)
if(NOT MSVC)
list(APPEND ATen_CPU_DEPENDENCY_LIBS m)
endif()
if(MKLDNN_FOUND)
target_link_libraries(ATen ${MKLDNN_LIBRARIES})
list(APPEND ATen_CPU_DEPENDENCY_LIBS ${MKLDNN_LIBRARIES})
endif(MKLDNN_FOUND)
# ---[ Configure cpuinfo
IF(NOT TARGET cpuinfo)
SET(CPUINFO_BUILD_TOOLS OFF CACHE BOOL "")
SET(CPUINFO_BUILD_UNIT_TESTS OFF CACHE BOOL "")
SET(CPUINFO_BUILD_MOCK_TESTS OFF CACHE BOOL "")
SET(CPUINFO_BUILD_BENCHMARKS OFF CACHE BOOL "")
ADD_SUBDIRECTORY("cpu/cpuinfo")
ENDIF()
TARGET_LINK_LIBRARIES(ATen cpuinfo)
list(APPEND ATen_CPU_DEPENDENCY_LIBS cpuinfo)
IF(CUDA_FOUND)
TARGET_LINK_LIBRARIES(ATen
${CUDA_LIBRARIES}
${CUDA_cusparse_LIBRARY}
${CUDA_curand_LIBRARY})
CUDA_ADD_CUBLAS_TO_TARGET(ATen)
CUDA_ADD_CUFFT_TO_TARGET(ATen)
if(NOT MSVC AND NOT EMSCRIPTEN)
# Preserve values for the main build
set(__aten_sleef_build_shared_libs ${BUILD_SHARED_LIBS})
set(__aten_sleef_build_tests ${BUILD_TESTS})
# Unset our restrictive C++ flags here and reset them later.
# Remove this once we use proper target_compile_options.
set(OLD_CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS})
set(CMAKE_CXX_FLAGS)
# Bump up optimization level for sleef to -O1, since at -O0 the compiler
# excessively spills intermediate vector registers to the stack
# and makes things run impossibly slowly
set(OLD_CMAKE_C_FLAGS_DEBUG ${CMAKE_C_FLAGS_DEBUG})
IF(${CMAKE_C_FLAGS_DEBUG} MATCHES "-O0")
string(REGEX REPLACE "-O0" "-O1" CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
ELSE()
set(CMAKE_C_FLAGS_DEBUG "${CMAKE_C_FLAGS_DEBUG} -O1")
ENDIF()
set(BUILD_SHARED_LIBS OFF CACHE BOOL "Build sleef static" FORCE)
set(BUILD_DFT OFF CACHE BOOL "Don't build sleef DFT lib" FORCE)
set(BUILD_GNUABI_LIBS OFF CACHE BOOL "Don't build sleef gnuabi libs" FORCE)
set(BUILD_TESTS OFF CACHE BOOL "Don't build sleef tests" FORCE)
add_subdirectory("${CMAKE_CURRENT_SOURCE_DIR}/../../../third_party/sleef" ${CMAKE_BINARY_DIR}/sleef)
set_property(TARGET sleef PROPERTY FOLDER "dependencies")
list(APPEND ATen_THIRD_PARTY_INCLUDE ${CMAKE_BINARY_DIR}/include)
link_directories(${CMAKE_BINARY_DIR}/sleef/lib)
list(APPEND ATen_CPU_DEPENDENCY_LIBS sleef)
set(CMAKE_C_FLAGS_DEBUG ${OLD_CMAKE_C_FLAGS_DEBUG})
set(CMAKE_CXX_FLAGS ${OLD_CMAKE_CXX_FLAGS})
# Set these back. TODO: Use SLEEF_ to pass these instead
set(BUILD_SHARED_LIBS ${__aten_sleef_build_shared_libs} CACHE BOOL "Build shared libs" FORCE)
set(BUILD_TESTS ${__aten_sleef_build_tests} CACHE BOOL "Build tests" FORCE)
endif()
IF(USE_CUDA AND NOT USE_ROCM)
IF ($ENV{ATEN_STATIC_CUDA})
# CuFFT has a complicated static story (especially around CUDA < 9) because it has device callback support
# we first have to build a fake lib that links with no device callbacks,
# and then we link against this object file.
# This was recommended by the CuFFT team at NVIDIA
# build fake CuFFT lib in build dir
EXECUTE_PROCESS(COMMAND touch ${CMAKE_CURRENT_BINARY_DIR}/empty_file.cc)
if(${CUDA_VERSION_MAJOR} EQUAL "8")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60)
elseif(${CUDA_VERSION_MAJOR} EQUAL "9")
SET(CUFFT_FAKELINK_OPTIONS
--generate-code arch=compute_35,code=sm_35
--generate-code arch=compute_50,code=sm_50
--generate-code arch=compute_60,code=sm_60
--generate-code arch=compute_70,code=sm_70)
else()
MESSAGE(FATAL_ERROR "Unhandled major cuda version ${CUDA_VERSION_MAJOR}")
endif()
ADD_CUSTOM_COMMAND(
OUTPUT ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a
COMMAND "${CUDA_TOOLKIT_ROOT_DIR}/bin/nvcc" -o ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a -Xcompiler -fPIC
${CUFFT_FAKELINK_OPTIONS}
--device-link ${CMAKE_CURRENT_BINARY_DIR}/empty_file.cc -lcufft_static -lculibos
)
ADD_CUSTOM_TARGET(FAKELINKED_CUFFT_TARGET DEPENDS ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a)
add_library(FAKELINKED_CUFFT STATIC IMPORTED GLOBAL)
add_dependencies(FAKELINKED_CUFFT FAKELINKED_CUFFT_TARGET)
set_target_properties(FAKELINKED_CUFFT PROPERTIES IMPORTED_LOCATION ${CMAKE_CURRENT_BINARY_DIR}/cufft_static_library.a)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
${CUDA_LIBRARIES}
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcusparse_static.a
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcurand_static.a
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcublas_static.a
FAKELINKED_CUFFT
${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcufft_static.a
)
ELSE()
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
${CUDA_LIBRARIES}
${CUDA_cusparse_LIBRARY}
${CUDA_curand_LIBRARY})
ENDIF()
if(CUDNN_FOUND)
target_link_libraries(ATen ${CUDNN_LIBRARIES})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${CUDNN_LIBRARIES})
endif(CUDNN_FOUND)
IF(USE_MAGMA)
TARGET_LINK_LIBRARIES(ATen ${MAGMA_LIBRARIES})
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${MAGMA_LIBRARIES})
IF ($ENV{TH_BINARY_BUILD})
TARGET_LINK_LIBRARIES(ATen "${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
list(APPEND ATen_CUDA_DEPENDENCY_LIBS
"${BLAS_LIBRARIES};${BLAS_LIBRARIES};${BLAS_LIBRARIES}")
ENDIF($ENV{TH_BINARY_BUILD})
ENDIF(USE_MAGMA)
IF ($ENV{ATEN_STATIC_CUDA})
TARGET_LINK_LIBRARIES(ATen "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a")
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libculibos.a")
list(APPEND ATen_CUDA_DEPENDENCY_LIBS "${CUDA_TOOLKIT_ROOT_DIR}/lib64/libcudart_static.a")
ENDIF($ENV{ATEN_STATIC_CUDA})
ENDIF()
INSTALL(TARGETS ATen
RUNTIME DESTINATION "${AT_INSTALL_BIN_DIR}"
LIBRARY DESTINATION "${AT_INSTALL_LIB_DIR}"
ARCHIVE DESTINATION "${AT_INSTALL_LIB_DIR}")
IF(USE_ROCM)
### Link in the ROCm libraries BLAS / RNG.
FIND_LIBRARY(ROCBLAS_LIBRARY rocblas HINTS ${ROCBLAS_PATH}/lib)
FIND_LIBRARY(HIPRAND_LIBRARY hiprand HINTS ${HIPRAND_PATH}/lib)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ${ROCBLAS_LIBRARY} ${HIPRAND_LIBRARY})
ENDIF()
# Include CPU paths for CUDA as well
list(APPEND ATen_CUDA_INCLUDE ${ATen_CPU_INCLUDE})
# We have two libraries: libATen_cpu.so and libATen_cuda.so,
# with libATen_cuda.so depending on libATen_cpu.so. The CPU library
# contains CPU code only. libATen_cpu.so is invariant to the setting
# of USE_CUDA (it always builds the same way); libATen_cuda.so is only
# built when USE_CUDA=1 and CUDA is available.
set(ATen_CPU_SRCS ${all_cpu_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
add_library(ATen_cpu INTERFACE)
list(APPEND ATen_CPU_DEPENDENCY_LIBS ATEN_CPU_FILES_GEN_LIB)
else()
add_library(ATen_cpu ${AT_LINK_STYLE} ${ATen_CPU_SRCS})
if (ATen_THIRD_PARTY_INCLUDE)
target_include_directories(ATen_cpu SYSTEM PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
endif()
target_include_directories(ATen_cpu INTERFACE $<INSTALL_INTERFACE:include>)
target_include_directories(ATen_cpu PRIVATE ${ATen_CPU_INCLUDE})
target_link_libraries(ATen_cpu PUBLIC ${ATen_CPU_DEPENDENCY_LIBS})
target_link_libraries(ATen_cpu PRIVATE ATEN_CPU_FILES_GEN_LIB)
caffe2_interface_library(ATen_cpu ATen_cpu_library)
# Set standard properties on the target
aten_set_target_props(ATen_cpu)
# Make sure these don't get built by parent
set(ATen_CPU_SRCS)
endif()
if(USE_CUDA OR USE_ROCM)
set(ATen_CUDA_SRCS ${all_cuda_cpp})
if(AT_LINK_STYLE STREQUAL "INTERFACE")
# Source code can't be added to an interface library, so it is
# passed back to be compiled into the containing library
add_library(ATen_cuda INTERFACE)
list(APPEND ATen_CUDA_DEPENDENCY_LIBS ATEN_CUDA_FILES_GEN_LIB)
else()
# A hack to deal with cuda library dependencies and modern CMake: the
# CUDA_ADD_LIBRARY includes a target_link_libraries, and as a result,
# one cannot use PUBLIC/PRIVATE/INTERFACE for the target anymore. This
# hack adds the PRIVATE keywords to CUDA_LIBRARIES so we can deal with
# it. We will then manually add the cudart library as interface libs.
set(__tmp ${CUDA_LIBRARIES})
set(CUDA_LIBRARIES PRIVATE ${CUDA_LIBRARIES})
torch_cuda_based_add_library(ATen_cuda ${AT_LINK_STYLE} ${ATen_CUDA_SRCS})
set(CUDA_LIBRARIES ${__tmp})
target_link_libraries(ATen_cuda INTERFACE caffe2::cudart)
target_include_directories(
ATen_cuda INTERFACE $<INSTALL_INTERFACE:include>)
target_include_directories(
ATen_cuda PRIVATE ${ATen_THIRD_PARTY_INCLUDE})
target_include_directories(
ATen_cuda PRIVATE ${ATen_CUDA_INCLUDE})
target_link_libraries(
ATen_cuda PRIVATE ${ATen_CUDA_DEPENDENCY_LIBS} ATEN_CUDA_FILES_GEN_LIB)
# These public dependencies must go after the previous dependencies, as the
# order of the libraries in the linker call matters here when statically
# linking; libculibos and cublas must be last.
target_link_libraries(
ATen_cuda PUBLIC ATen_cpu ${ATen_PUBLIC_CUDA_DEPENDENCY_LIBS})
# Set standard properties on the target
aten_set_target_props(ATen_cuda)
caffe2_interface_library(ATen_cuda ATen_cuda_library)
# Make sure these don't get built by parent
set(ATen_CUDA_SRCS)
endif()
endif()
if(NOT AT_LINK_STYLE STREQUAL "INTERFACE")
if(USE_CUDA)
if (NOT $ENV{ATEN_STATIC_CUDA})
cuda_add_cublas_to_target(ATen_cuda)
cuda_add_cufft_to_target(ATen_cuda)
endif()
endif()
if(NOT MSVC)
aten_compile_options(ATen_cpu)
if(USE_CUDA OR USE_ROCM)
aten_compile_options(ATen_cuda)
endif()
endif()
if(NOT ${CMAKE_VERSION} VERSION_LESS "3.1")
set_property(TARGET ATen_cpu PROPERTY CXX_STANDARD 11)
if(USE_CUDA OR USE_ROCM)
set_property(TARGET ATen_cuda PROPERTY CXX_STANDARD 11)
endif()
endif()
endif()
GET_TARGET_PROPERTY(ATEN_OUTPUT_NAME ATen LOCATION)
GET_FILENAME_COMPONENT(ATEN_OUTPUT_NAME ${ATEN_OUTPUT_NAME} NAME)
SET(ATEN_LIBRARIES "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_LIB_DIR}/${ATEN_OUTPUT_NAME}")
SET(ATEN_INCLUDE_DIR "${CMAKE_INSTALL_PREFIX}/${AT_INSTALL_INCLUDE_DIR}")
CONFIGURE_FILE(ATenConfig.cmake.in "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake")
INSTALL(FILES "${CMAKE_CURRENT_BINARY_DIR}/cmake-exports/ATenConfig.cmake"
DESTINATION "${AT_INSTALL_SHARE_DIR}/cmake/ATen")
FOREACH(HEADER ${base_h})
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen)
ENDFOREACH()
FOREACH(HEADER ${generated_h})
INSTALL(FILES ${CMAKE_CURRENT_BINARY_DIR}/${HEADER}
DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen)
ENDFOREACH()
FOREACH(HEADER ${cuda_h})
# https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake
GET_FILENAME_COMPONENT(DIR ${HEADER} DIRECTORY)
# https://stackoverflow.com/questions/11096471/how-can-i-install-a-hierarchy-of-files-using-cmake
FOREACH(HEADER ${base_h} ${ATen_CORE_HEADERS} ${cuda_h} ${cudnn_h})
string(REPLACE "${CMAKE_CURRENT_SOURCE_DIR}/" "" HEADER_SUB ${HEADER})
GET_FILENAME_COMPONENT(DIR ${HEADER_SUB} DIRECTORY)
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen/${DIR})
ENDFOREACH()
INSTALL(FILES ${CMAKE_CURRENT_BINARY_DIR}/ATen/Declarations.yaml
FOREACH(HEADER ${generated_h} ${cuda_generated_h})
# NB: Assumed to be flat
INSTALL(FILES ${HEADER} DESTINATION ${AT_INSTALL_INCLUDE_DIR}/ATen)
ENDFOREACH()
INSTALL(FILES ${CMAKE_BINARY_DIR}/aten/src/ATen/Declarations.yaml
DESTINATION ${AT_INSTALL_SHARE_DIR}/ATen)
if(ATEN_NO_TEST)
message("disable test because ATEN_NO_TEST is set")
else()
add_subdirectory(test)
endif()
# Pass source, includes, and libs to parent
set(ATen_CORE_SRCS ${ATen_CORE_SRCS} PARENT_SCOPE)
set(ATen_CPU_SRCS ${ATen_CPU_SRCS} PARENT_SCOPE)
set(ATen_CUDA_SRCS ${ATen_CUDA_SRCS} PARENT_SCOPE)
set(ATen_CPU_TEST_SRCS ${ATen_CPU_TEST_SRCS} PARENT_SCOPE)
set(ATen_CUDA_TEST_SRCS ${ATen_CUDA_TEST_SRCS} PARENT_SCOPE)
set(ATen_CPU_INCLUDE ${ATen_CPU_INCLUDE} PARENT_SCOPE)
set(ATen_THIRD_PARTY_INCLUDE ${ATen_THIRD_PARTY_INCLUDE} PARENT_SCOPE)
set(ATen_CUDA_INCLUDE ${ATen_CUDA_INCLUDE} PARENT_SCOPE)
set(ATen_CPU_DEPENDENCY_LIBS ${ATen_CPU_DEPENDENCY_LIBS} PARENT_SCOPE)
set(ATen_CUDA_DEPENDENCY_LIBS ${ATen_CUDA_DEPENDENCY_LIBS} PARENT_SCOPE)

@ -1,18 +1,109 @@
#pragma once
#include <sstream>
#include "ATen/Parallel.h"
#include "ATen/TensorUtils.h"
#include <limits>
#include <utility>
#include <cstring>
namespace at {
/*
[collapse dims] Updates sizes and strides to reflect a "collapse" of
the info, possibly excluding the optional excludeDim. A "collapsed" version
of the info is the fewest dims that order the tensor's elements in the same
way as the original info. If excludeDim is specified, the collapse is the
fewest dims that order the tensor's elements as the original and preserve the
excluded dimension, unless the tensor collapses to a point.
This function returns a pair of values.
1) The (new) index of the preserved dimension if excludeDim is
specified. 0 if the tensor is collapsed to a point. -1
otherwise.
2) The new number of dimensions.
*/
template <typename T>
inline std::pair<int64_t, int64_t> collapse_dims(
T* sizes,
T* strides,
int64_t dims,
const int excludeDim = -1) {
AT_CHECK(
excludeDim >= -1 && excludeDim < dims,
"expected excluded dim between -1 and dims - 1");
int64_t stopDim = (excludeDim == -1) ? dims : excludeDim;
int64_t newIndex = -1;
int64_t oldIndex = 0;
int64_t remappedExcludedDim = -1;
while (oldIndex < dims) {
// Finds a dimension to collapse into
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
++oldIndex;
break;
}
// Collapses dims
for (; oldIndex < stopDim; ++oldIndex) {
if (sizes[oldIndex] == 1) {
continue;
}
if (strides[newIndex] == sizes[oldIndex] * strides[oldIndex]) {
sizes[newIndex] *= sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
} else {
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
}
}
// Handles excludeDim being set (oldIndex == excludeDim)
if (oldIndex != dims) {
// Preserves excluded dimension
++newIndex;
sizes[newIndex] = sizes[oldIndex];
strides[newIndex] = strides[oldIndex];
remappedExcludedDim = newIndex;
// Restarts iteration after excludeDim
++oldIndex;
stopDim = dims;
}
}
// Handles special case of all dims size 1
if (newIndex == -1 || (newIndex == 0 && sizes[0] == 1)) {
dims = 1;
sizes[0] = 1;
strides[0] = 1;
return std::pair<int64_t, int64_t>(0, 1);
}
dims = newIndex + 1;
return std::pair<int64_t, int64_t>(remappedExcludedDim, dims);
}
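// Illustrative only (not part of the original header): a contiguous 2x3x4
// tensor collapses to a single dimension, because at every step
// strides[i] == sizes[i + 1] * strides[i + 1].
inline void collapse_dims_example() {
  int64_t sizes[] = {2, 3, 4};
  int64_t strides[] = {12, 4, 1};  // row-major, contiguous
  auto collapsed = collapse_dims(sizes, strides, 3);
  AT_CHECK(collapsed.first == -1 && collapsed.second == 1, "expected one dim");
  AT_CHECK(sizes[0] == 24 && strides[0] == 1, "expected a single flat dim");
}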
/*
* The basic strategy for apply is as follows:
*
* 1. Starting with the outermost index, loop until we reach a dimension where the
* data is no longer contiguous, i.e. the stride at that dimension is not equal to
* the size of the tensor defined by the outer dimensions. Let's call this outer
* (contiguous) tensor A. Note that if the Tensor is contiguous, then A is equal
* to the entire Tensor. Let's call the inner tensor B.
* 1. Starting with the outermost index, loop until we reach a dimension where
* the data is no longer contiguous, i.e. the stride at that dimension is not
* equal to the size of the tensor defined by the outer dimensions. Let's call
* this outer (contiguous) tensor A. Note that if the Tensor is contiguous, then
* A is equal to the entire Tensor. Let's call the inner tensor B.
*
* 2. We loop through the indices in B, starting at its outermost dimension. For
* example, if B is a 2x2 matrix, then we do:
@ -22,289 +113,455 @@ namespace at {
* B[1][0]
* B[1][1]
*
* We set the offset into the underlying storage as (storageOffset + stride_B * index_B),
* i.e. basically we compute the offset into the storage as we would normally for a
* Tensor. But because we are guaranteed the subsequent data is contiguous in memory, we
* can simply loop for sizeof(A) iterations and perform the operation, without having to
* follow the order described by the strides of A.
* We set the offset into the underlying storage as (storageOffset + stride_B *
* index_B), i.e. basically we compute the offset into the storage as we would
* normally for a Tensor. But because we are guaranteed the subsequent data is
* contiguous in memory, we can simply loop for sizeof(A) iterations and perform
* the operation, without having to follow the order described by the strides of
* A.
*
* 3. As an optimization, we merge dimensions of A that are contiguous in memory. For
* example, if A is a 3x3x3x3 tensor narrowed from a 3x3x4x3 tensor, then the first two
* dimensions can be merged for the purposes of APPLY, reducing the number of nested
* loops.
* 3. As an optimization, we merge dimensions of A that are contiguous in
* memory. For example, if A is a 3x3x3x3 tensor narrowed from a 3x3x4x3 tensor,
* then the first two dimensions can be merged for the purposes of APPLY,
* reducing the number of nested loops.
*/
// TODO: turn this macro into a proper template
#define __ATH_TENSOR_APPLYX_PREAMBLE(TYPE, ATENSOR, DIM, ALLOW_CONTIGUOUS) \
TYPE *ATENSOR##_data = NULL; \
int64_t *ATENSOR##_counter = NULL, *ATENSOR##_sizes = NULL, *ATENSOR##_strides = NULL, *ATENSOR##_dimOffset = NULL; \
int64_t ATENSOR##_stride = 0, ATENSOR##_size = 0, ATENSOR##_dim = 0, ATENSOR##_i; \
int ATENSOR##_contiguous = ALLOW_CONTIGUOUS && DIM < 0; \
\
if(ATENSOR.sizes().equals({0})) \
TH_TENSOR_APPLY_hasFinished = true; \
else \
{ \
ATENSOR##_data = ATENSOR.data<TYPE>(); \
ATENSOR##_size = 1; \
ATENSOR##_stride = 1; \
for(ATENSOR##_i = ATENSOR.dim() - 1; ATENSOR##_i >= 0; ATENSOR##_i--) { \
if(ATENSOR.sizes()[ATENSOR##_i] != 1) { \
if(ATENSOR.strides()[ATENSOR##_i] == ATENSOR##_size && ATENSOR##_i != DIM) \
ATENSOR##_size *= ATENSOR.sizes()[ATENSOR##_i]; \
else{ \
ATENSOR##_contiguous = 0; \
break; \
} \
} \
} \
if (!ATENSOR##_contiguous) { \
/* Find the dimension of contiguous sections */ \
ATENSOR##_dim = 1; \
for(ATENSOR##_i = ATENSOR.dim() - 2; ATENSOR##_i >= 0; ATENSOR##_i--) \
{ \
if(ATENSOR.strides()[ATENSOR##_i] != ATENSOR.strides()[ATENSOR##_i+1] * ATENSOR.sizes()[ATENSOR##_i+1] || ATENSOR##_i == DIM || ATENSOR##_i+1 == DIM) \
ATENSOR##_dim++; \
} \
/* Allocate an array of 3*dim elements, where dim is the number of contiguous sections */ \
ATENSOR##_counter = new int64_t[3*ATENSOR##_dim]; \
ATENSOR##_sizes = ATENSOR##_counter + ATENSOR##_dim; \
ATENSOR##_strides = ATENSOR##_counter + 2*ATENSOR##_dim; \
TH_TENSOR_dim_index = ATENSOR##_dim-1; \
ATENSOR##_dimOffset = (DIM == ATENSOR.dim()-1) ? &ATENSOR##_i : &ATENSOR##_counter[DIM]; \
ATENSOR##_sizes[TH_TENSOR_dim_index] = ATENSOR.sizes()[ATENSOR.dim()-1]; \
ATENSOR##_strides[TH_TENSOR_dim_index] = ATENSOR.strides()[ATENSOR.dim()-1]; \
/* ATENSOR##_counter tracks where we are in the storage. The offset into the */ \
/* storage is given by storage_offset + (i * j), where i is the stride */ \
/* vector and j is tensor_counter vector. This sets the starting position for the loop. */ \
for(ATENSOR##_i = ATENSOR##_dim-1; ATENSOR##_i >= 0; --ATENSOR##_i) { \
ATENSOR##_counter[ATENSOR##_i] = 0; \
} \
for(ATENSOR##_i = ATENSOR.dim()-2; ATENSOR##_i >= 0; --ATENSOR##_i) { \
if (ATENSOR.strides()[ATENSOR##_i] == ATENSOR.strides()[ATENSOR##_i+1] * ATENSOR.sizes()[ATENSOR##_i+1] && ATENSOR##_i != DIM && ATENSOR##_i+1 != DIM) { \
ATENSOR##_sizes[TH_TENSOR_dim_index] = ATENSOR.sizes()[ATENSOR##_i] * ATENSOR##_sizes[TH_TENSOR_dim_index]; \
if (DIM != ATENSOR.dim()-1 && ATENSOR##_i < DIM) \
ATENSOR##_dimOffset--; \
} else { \
--TH_TENSOR_dim_index; \
ATENSOR##_sizes[TH_TENSOR_dim_index] = ATENSOR.sizes()[ATENSOR##_i]; \
ATENSOR##_strides[TH_TENSOR_dim_index] = ATENSOR.strides()[ATENSOR##_i]; \
} \
} \
/* Size of the inner most section */ \
ATENSOR##_size = ATENSOR##_sizes[ATENSOR##_dim-1]; \
/* Stride of the inner most section */ \
ATENSOR##_stride = ATENSOR##_strides[ATENSOR##_dim-1]; \
} \
} \
ATENSOR##_i = 0;
// TODO: turn this macro into a proper template
#define __ATH_TENSOR_APPLYX_UPDATE_COUNTERS(ATENSOR, ALWAYS_UPDATE) \
if(ATENSOR##_i == ATENSOR##_size || ALWAYS_UPDATE) \
{ \
if(ATENSOR##_contiguous) \
break; \
\
if(ATENSOR##_dim == 1) \
break; \
\
/* Reset pointer to beginning of loop */ \
ATENSOR##_data -= ATENSOR##_size*ATENSOR##_stride; \
for(ATENSOR##_i = ATENSOR##_dim-2; ATENSOR##_i >= 0; ATENSOR##_i--) \
{ \
ATENSOR##_counter[ATENSOR##_i]++; \
/* Jump ahead by the stride of this dimension */ \
ATENSOR##_data += ATENSOR##_strides[ATENSOR##_i]; \
\
if(ATENSOR##_counter[ATENSOR##_i] == ATENSOR##_sizes[ATENSOR##_i]) \
{ \
if(ATENSOR##_i == 0) \
{ \
TH_TENSOR_APPLY_hasFinished = true; \
break; \
} \
else \
{ \
/* Reset the pointer to the beginning of the chunk defined by this dimension */ \
ATENSOR##_data -= ATENSOR##_counter[ATENSOR##_i]*ATENSOR##_strides[ATENSOR##_i]; \
ATENSOR##_counter[ATENSOR##_i] = 0; \
} \
} \
else \
break; \
} \
ATENSOR##_i = 0; \
inline Tensor sort_strides(Tensor& tensor_) {
IntList strides = tensor_.strides();
std::vector<int64_t> indices;
indices.reserve(tensor_.ndimension());
for (int64_t i = 0; i < tensor_.ndimension(); i++) {
indices.push_back(i);
}
std::sort(indices.begin(), indices.end(), [&strides](int64_t i1, int64_t i2) {
return strides[i1] > strides[i2];
});
Tensor tensor = tensor_.permute(indices);
return tensor;
}
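// Illustration (not in the original header): sorting by descending stride
// restores a contiguous innermost dimension, e.g.
//
//   Tensor t = at::rand({4, 5}).t();   // sizes {5, 4}, strides {1, 5}
//   Tensor s = sort_strides(t);        // sizes {4, 5}, strides {5, 1}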
template <typename T, int N>
struct strided_tensor_iter_fixed {
public:
T* data_ = NULL;
int64_t dim_ = 0;
int64_t counter_[N] = {0};
int64_t sizes_[N] = {0};
int64_t strides_[N] = {0};
strided_tensor_iter_fixed(strided_tensor_iter_fixed const&) = delete;
void operator=(strided_tensor_iter_fixed const& x) = delete;
strided_tensor_iter_fixed(strided_tensor_iter_fixed&&) = default;
strided_tensor_iter_fixed(Tensor& tensor, bool sort_strides = false)
: data_(tensor.data<T>()) {
std::memset(counter_, 0, sizeof(int64_t) * N);
std::memcpy(
sizes_, tensor.sizes().data(), tensor.ndimension() * sizeof(int64_t));
std::memcpy(
strides_,
tensor.strides().data(),
tensor.ndimension() * sizeof(int64_t));
dim_ = std::get<1>(collapse_dims(sizes_, strides_, tensor.ndimension()));
}
};
template <typename T>
struct strided_tensor_iter {
private:
public:
T* data_ = NULL;
int64_t dim_;
std::vector<int64_t> counter_;
std::vector<int64_t> sizes_;
std::vector<int64_t> strides_;
strided_tensor_iter(strided_tensor_iter const&) = delete;
void operator=(strided_tensor_iter const& x) = delete;
strided_tensor_iter(strided_tensor_iter&&) = default;
strided_tensor_iter(Tensor& tensor)
: data_(tensor.data<T>()),
dim_(tensor.ndimension()),
counter_(dim_, 0),
sizes_(tensor.sizes().vec()),
strides_(tensor.strides().vec()) {
dim_ = std::get<1>(collapse_dims(sizes_.data(), strides_.data(), dim_));
}
};
inline bool _all_equal_numel(at::ArrayRef<Tensor> tensors) {
if (tensors.size() == 0)
return true;
int64_t all_numel = tensors[0].numel();
for (size_t i = 1; i < tensors.size(); i++) {
if (tensors[i].numel() != all_numel)
return false;
}
return true;
}
inline std::string _all_equal_numel_error(at::ArrayRef<Tensor> tensors) {
std::ostringstream oss;
oss << "inconsistent tensor size, expected ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].sizes() << ", ";
}
oss << "and " << tensors[tensors.size() - 1].sizes()
<< " to have the same number of elements, but got ";
for (size_t i = 0; i < tensors.size() - 1; i++) {
oss << tensors[i].numel() << ", ";
}
oss << "and " << tensors[tensors.size() - 1].numel()
<< " elements respectively";
return oss.str();
}
inline bool _apply_preamble(ArrayRef<Tensor> tensors) {
checkBackend("CPU_tensor_apply", tensors, Backend::CPU);
if (!_all_equal_numel(tensors))
AT_ERROR(_all_equal_numel_error(tensors));
// An empty tensor has no elements
for (auto& t : tensors)
if (t.numel() == 0)
return false;
return true;
}
inline int64_t _max_dim_tensors(ArrayRef<Tensor> tensors) {
int64_t dim = 0;
for (auto& t : tensors)
dim = std::max(dim, t.ndimension());
return dim;
}
inline void iterate(int64_t size){};
template <typename Arg, typename... Args>
inline void iterate(int64_t size, Arg& iter, Args&... iter_tail) {
iter.counter_[iter.dim_ - 1] += size;
iter.data_ = iter.data_ + size * iter.strides_[iter.dim_ - 1];
iterate(size, iter_tail...);
}
inline bool iterate_continue() {
return true;
};
template <typename Arg, typename... Args>
inline bool iterate_continue(Arg& iter, Args&... iter_tail) {
return iter.counter_[iter.dim_ - 1] < iter.sizes_[iter.dim_ - 1] &&
iterate_continue(iter_tail...);
}
inline int64_t max_iterate_size() {
return std::numeric_limits<int64_t>::max();
};
template <typename Arg, typename... Args>
inline int64_t max_iterate_size(Arg& iter, Args&... iter_tail) {
return std::min(
(iter.sizes_[iter.dim_ - 1] - iter.counter_[iter.dim_ - 1]),
max_iterate_size(iter_tail...));
}
inline void iterate_overflow(){};
template <typename Arg, typename... Args>
inline void iterate_overflow(Arg& iter, Args&... iter_tail) {
if (iter.counter_[iter.dim_ - 1] == iter.sizes_[iter.dim_ - 1]) {
for (int64_t i = iter.dim_ - 1; i > 0; i--) {
if (iter.counter_[i] == iter.sizes_[i]) {
iter.counter_[i] = 0;
iter.counter_[i - 1]++;
iter.data_ = iter.data_ - (iter.sizes_[i] * iter.strides_[i]) +
iter.strides_[i - 1];
}
}
}
iterate_overflow(iter_tail...);
}
inline void forward(int64_t offset){};
template <typename Arg, typename... Args>
inline void forward(int64_t offset, Arg& iter, Args&... iter_tail) {
int64_t multi = offset;
for (int64_t i = iter.dim_ - 1; i >= 0; i--) {
int64_t inc = multi % iter.sizes_[i];
multi = multi / iter.sizes_[i];
iter.data_ = iter.data_ + inc * iter.strides_[i];
iter.counter_[i] += inc;
}
forward(offset, iter_tail...);
}
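// Worked example (illustrative): for an iterator over a 3x4 tensor, offset 5
// decomposes innermost-first:
//   i = 1: inc = 5 % 4 = 1 -> counter_[1] = 1, data_ += 1 * strides_[1]
//   i = 0: inc = 1 % 3 = 1 -> counter_[0] = 1, data_ += 1 * strides_[0]
// i.e. flat offset 5 lands at logical index (1, 1), which is what lets
// parallel_for hand each worker an arbitrary [begin, end) range.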
inline int64_t max_dim() {
return 0;
}
template <typename Arg, typename... Args>
inline int64_t max_dim(Arg& iter, Args&... iter_tail) {
return std::max(iter.dim_, max_dim(iter_tail...));
}
inline void apply_op(){};
template <typename Op, typename... Args>
inline void
apply_op(int64_t numel, int64_t offset, const Op& op, Args... iters) {
// For 0-dim tensors
if (numel == 1 && max_dim(iters...) == 0) {
op(*iters.data_...);
return;
}
if (offset > 0)
forward(offset, iters...);
// Splitting this into chunks helps the compiler create faster assembly
for (int64_t i = 0; i < numel;) {
for (; iterate_continue(iters...) && i < numel;) {
op(*iters.data_...);
iterate(1, iters...);
i++;
}
iterate_overflow(iters...);
}
}
inline void apply_kernel(){};
// TODO: Deal elegantly with 0-dim tensors. iters.strides_ of 0-dim
// strided_tensor_iter will be of size 0 for dim 0 and iters.strides_[iters.dim_
// - 1] will index at -1. C++14 integer_sequence could be of use here.
template <typename Op, typename... Args>
inline void
apply_kernel(int64_t numel, int64_t offset, const Op& op, Args... iters) {
if (offset > 0)
forward(offset, iters...);
int64_t size = std::min(numel, max_iterate_size(iters...));
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
iterate_overflow(iters...);
int64_t i = size;
size = std::min(numel, max_iterate_size(iters...));
for (; i < numel;) {
op(size, iters.data_..., iters.strides_[iters.dim_ - 1]...);
iterate(size, iters...);
i += size;
iterate_overflow(iters...);
}
}
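// The op passed to apply_kernel receives the chunk length, then one data
// pointer per tensor, then one innermost stride per tensor. A hypothetical
// chunk kernel matching that convention (illustrative):
//
//   auto axpy_chunk = [](int64_t n, float* y, const float* x,
//                        int64_t y_stride, int64_t x_stride) {
//     for (int64_t i = 0; i < n; ++i)
//       y[i * y_stride] += x[i * x_stride];
//   };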
template <typename scalar1, typename scalar2, typename Op>
void CPU_tensor_apply2_dim(Tensor& tensor1, Tensor& tensor2, int64_t dim, Op op) {
checkBackend("CPU_tensor_apply2", {tensor1, tensor2}, Backend::CPU);
bool TH_TENSOR_APPLY_hasFinished = false;
int64_t TH_TENSOR_dim_index = 0;
__ATH_TENSOR_APPLYX_PREAMBLE(scalar1, tensor1, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar2, tensor2, dim, 1)
auto t1_numel = tensor1.numel();
auto t2_numel = tensor2.numel();
if(t1_numel != t2_numel) {
std::ostringstream oss;
oss << "inconsistent tensor size, expected " << tensor1.sizes() << " and " << tensor2.sizes()
<< " to have the same number of elements, but got " << t1_numel << " and " << t2_numel << " elements respectively";
throw std::runtime_error(oss.str());
inline void
CPU_tensor_parallel_kernel_apply2(Tensor tensor1, Tensor tensor2, const Op op) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (tensor1.numel() == 1) {
op(1, tensor1.data<scalar1>(), tensor2.data<scalar2>(), 0, 0);
return;
}
while(!TH_TENSOR_APPLY_hasFinished)
{
/* Loop through the inner most region of the Tensor */
for(; tensor1_i < tensor1_size && tensor2_i < tensor2_size; tensor1_i++, tensor2_i++, tensor1_data += tensor1_stride, tensor2_data += tensor2_stride)
{
op(*tensor1_data, *tensor2_data);
}
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor1, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor2, 0)
if (tensor1.ndimension() < 8 && tensor2.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
});
} else {
parallel_for(
0,
tensor1.numel(),
1,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_kernel(
end - begin,
begin,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
});
}
if(tensor1_counter != NULL)
delete [] tensor1_counter;
if(tensor2_counter != NULL)
delete [] tensor2_counter;
}
/*
Apply a pointwise operator to two tensors.
Apply a pointwise operator to a sequence of tensors
The calling convention for op is a function/functor that takes two references to
type scalar; at least one of these references should be non-const in order to write the output.
For example, to compute a = b^2, op would be of the form:
[](scalar &a_val, const scalar &b_val) { a_val = b_val * b_val; };
The calling convention for op is a function/functor that takes the same
number of pointers of type scalar as the number of given tensors. For example,
to compute a = b * c, op would be of the form:
[](scalar* a_val, const scalar* b_val, const scalar* c_val) { a_val[0] =
b_val[0] * c_val[0]; };
*/
template<typename scalar1, typename scalar2, typename Op>
void CPU_tensor_apply2(Tensor tensor1, Tensor tensor2, Op op) {
CPU_tensor_apply2_dim<scalar1, scalar2, Op>(tensor1, tensor2, -1, op);
template <typename scalar1, typename Op>
inline void CPU_tensor_apply1(Tensor tensor1, const Op op) {
if (!_apply_preamble({tensor1}))
return;
if (tensor1.ndimension() < 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1, true));
} else {
apply_op(tensor1.numel(), 0, op, strided_tensor_iter<scalar1>(tensor1));
}
}
template<typename scalar1, typename scalar2, typename scalar3, typename Op>
void CPU_tensor_apply3_dim(Tensor &tensor1, Tensor& tensor2, Tensor& tensor3, int64_t dim, Op op) {
checkBackend("CPU_tensor_apply3", {tensor1, tensor2, tensor3}, Backend::CPU);
bool TH_TENSOR_APPLY_hasFinished = false;
int64_t TH_TENSOR_dim_index = 0;
__ATH_TENSOR_APPLYX_PREAMBLE(scalar1, tensor1, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar2, tensor2, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar3, tensor3, dim, 1)
int elements_equal = 1;
auto t1_numel = tensor1.numel();
auto t2_numel = tensor2.numel();
auto t3_numel = tensor3.numel();
if(t1_numel!= t2_numel) {
elements_equal = 0;
} else if(t1_numel != t3_numel) {
elements_equal = 0;
template <typename scalar1, typename scalar2, typename Op>
inline void CPU_tensor_apply2(Tensor tensor1, Tensor tensor2, const Op op) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (_max_dim_tensors({tensor1, tensor2}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
}
if (elements_equal == 0) {
std::ostringstream oss;
oss << "inconsistent tensor size, expected " << tensor1.sizes() << ", " << tensor2.sizes() << ", and " << tensor3.sizes()
<< " to have the same number of elements, but got " << t1_numel << ", " << t2_numel << ", and " << t3_numel << " elements respectively";
throw std::runtime_error(oss.str());
}
while(!TH_TENSOR_APPLY_hasFinished)
{
/* Loop through the inner most region of the Tensor */
for(; tensor1_i < tensor1_size && tensor2_i < tensor2_size && tensor3_i < tensor3_size; tensor1_i++, tensor2_i++, tensor3_i++, tensor1_data += tensor1_stride, tensor2_data += tensor2_stride, tensor3_data += tensor3_stride)
{
op(*tensor1_data, *tensor2_data, *tensor3_data);
}
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor1, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor2, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor3, 0)
}
if(tensor1_counter != NULL)
delete [] tensor1_counter;
if(tensor2_counter != NULL)
delete [] tensor2_counter;
if(tensor3_counter != NULL)
delete [] tensor3_counter;
}
/*
Apply a pointwise operator to three tensors.
The calling convention for op is a function/functor that takes three references to
type scalar; at least one of these references should be non-const in order to write the output.
For example, to compute a = b + c, op would be of the form:
[](scalar &a_val, const scalar &b_val, const scalar &c_val) { a_val = b_val + c_val; };
*/
template<typename scalar1, typename scalar2, typename scalar3, typename Op>
void CPU_tensor_apply3(Tensor tensor1, Tensor tensor2, Tensor tensor3, Op op) {
CPU_tensor_apply3_dim<scalar1, scalar2, scalar3, Op>(tensor1, tensor2, tensor3, -1, op);
}
template <typename scalar1, typename scalar2, typename scalar3, typename scalar4, typename Op>
void CPU_tensor_apply4_dim(Tensor &tensor1, Tensor& tensor2, Tensor& tensor3, Tensor& tensor4, int64_t dim, Op op) {
checkBackend("CPU_tensor_apply4", {tensor1, tensor2, tensor3, tensor4}, Backend::CPU);
bool TH_TENSOR_APPLY_hasFinished = false;
int64_t TH_TENSOR_dim_index = 0;
__ATH_TENSOR_APPLYX_PREAMBLE(scalar1, tensor1, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar2, tensor2, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar3, tensor3, dim, 1)
__ATH_TENSOR_APPLYX_PREAMBLE(scalar4, tensor4, dim, 1)
int elements_equal = 1;
auto t1_numel = tensor1.numel();
auto t2_numel = tensor2.numel();
auto t3_numel = tensor3.numel();
auto t4_numel = tensor4.numel();
if(t1_numel!= t2_numel) {
elements_equal = 0;
} else if(t1_numel != t3_numel) {
elements_equal = 0;
} else if(t1_numel != t4_numel) {
elements_equal = 0;
template <typename scalar1, typename scalar2, typename scalar3, typename Op>
inline void
CPU_tensor_apply3(Tensor tensor1, Tensor tensor2, Tensor tensor3, const Op op) {
if (!_apply_preamble({tensor1, tensor2, tensor3}))
return;
if (_max_dim_tensors({tensor1, tensor2, tensor3}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2),
strided_tensor_iter_fixed<scalar3, 8>(tensor3));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2),
strided_tensor_iter<scalar3>(tensor3));
}
if (elements_equal == 0) {
std::ostringstream oss;
oss << "inconsistent tensor size, expected " << tensor1.sizes() << ", " << tensor2.sizes() << ", "
<< tensor3.sizes() << ", and " << tensor4.sizes() << " to have the same number of elements, but got "
<< t1_numel << ", " << t2_numel << ", " << t3_numel << ", and " << t4_numel << " elements respectively";
throw std::runtime_error(oss.str());
}
template <
typename scalar1,
typename scalar2,
typename scalar3,
typename scalar4,
typename Op>
inline void CPU_tensor_apply4(
Tensor tensor1,
Tensor tensor2,
Tensor tensor3,
Tensor tensor4,
const Op op) {
if (!_apply_preamble({tensor1, tensor2, tensor3, tensor4}))
return;
if (_max_dim_tensors({tensor1, tensor2, tensor3, tensor4}) <= 8) {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2),
strided_tensor_iter_fixed<scalar3, 8>(tensor3),
strided_tensor_iter_fixed<scalar4, 8>(tensor4));
} else {
apply_op(
tensor1.numel(),
0,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2),
strided_tensor_iter<scalar3>(tensor3),
strided_tensor_iter<scalar4>(tensor4));
}
}
while(!TH_TENSOR_APPLY_hasFinished)
{
/* Loop through the inner most region of the Tensor */
for(; tensor1_i < tensor1_size && tensor2_i < tensor2_size && tensor3_i < tensor3_size && tensor4_i < tensor4_size
; tensor1_i++, tensor2_i++, tensor3_i++, tensor4_i++,
tensor1_data += tensor1_stride, tensor2_data += tensor2_stride, tensor3_data += tensor3_stride, tensor4_data += tensor4_stride)
{
op(*tensor1_data, *tensor2_data, *tensor3_data, *tensor4_data);
}
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor1, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor2, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor3, 0)
__ATH_TENSOR_APPLYX_UPDATE_COUNTERS(tensor4, 0)
template <typename scalar1, typename Op>
inline void CPU_tensor_parallel_apply1(
Tensor tensor1,
const Op op,
int64_t grain_size = internal::GRAIN_SIZE) {
if (!_apply_preamble({tensor1}))
return;
if (tensor1.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1, true));
});
} else {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &op](int64_t begin, int64_t end) {
apply_op(
end - begin, begin, op, strided_tensor_iter<scalar1>(tensor1));
});
}
if(tensor1_counter != NULL)
delete [] tensor1_counter;
if(tensor2_counter != NULL)
delete [] tensor2_counter;
if(tensor3_counter != NULL)
delete [] tensor3_counter;
if(tensor4_counter != NULL)
delete [] tensor4_counter;
}
/*
Apply a pointwise operator to four tensors.
The calling convention for op is a function/functor that takes four references to
type scalar; at least one of these references should be non-const in order to write the output.
For example, to compute a = b + c * d, op would be of the form:
[](scalar &a_val, const scalar &b_val, const scalar &c_val, const scalar &d_val) {
a_val = b_val + c_val * d_val;
};
*/
template<typename scalar1, typename scalar2, typename scalar3, typename scalar4, typename Op>
void CPU_tensor_apply4(Tensor tensor1, Tensor tensor2, Tensor tensor3, Tensor tensor4, Op op) {
CPU_tensor_apply4_dim<scalar1, scalar2, scalar3, scalar4, Op>(tensor1, tensor2, tensor3, tensor4, -1, op);
template <typename scalar1, typename scalar2, typename Op>
inline void CPU_tensor_parallel_apply2(
Tensor tensor1,
Tensor tensor2,
const Op op,
int64_t grain_size = internal::GRAIN_SIZE) {
if (!_apply_preamble({tensor1, tensor2}))
return;
if (tensor1.ndimension() < 8 && tensor2.ndimension() < 8) {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter_fixed<scalar1, 8>(tensor1),
strided_tensor_iter_fixed<scalar2, 8>(tensor2));
});
} else {
parallel_for(
0,
tensor1.numel(),
grain_size,
[&tensor1, &tensor2, &op](int64_t begin, int64_t end) {
apply_op(
end - begin,
begin,
op,
strided_tensor_iter<scalar1>(tensor1),
strided_tensor_iter<scalar2>(tensor2));
});
}
}
}
} // namespace at
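A minimal end-to-end sketch of the new apply API (function name illustrative, not from this diff): every element of a receives the square of the corresponding element of b.

#include <ATen/ATen.h>
#include <ATen/CPUApplyUtils.h>

void square_into(at::Tensor a, at::Tensor b) {
  // Both must be CPU tensors with matching element counts; the
  // _apply_preamble check above enforces this at runtime.
  at::CPU_tensor_apply2<float, float>(
      a, b, [](float& a_val, const float& b_val) { a_val = b_val * b_val; });
}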

@ -1,7 +1,7 @@
#pragma once
#include "ATen/core/Error.h"
#include "TH/TH.h"
#include "ATen/Error.h"
// This file creates a fake allocator that just throws exceptions if
// it is actually used.

@ -1,12 +1,12 @@
#pragma once
// Using AT_API is crucial as otherwise you'll see
// Using CAFFE2_API is crucial as otherwise you'll see
// linking errors using MSVC
// See https://msdn.microsoft.com/en-us/library/a90k134d.aspx
// This header adds this if using AT_API
#include "ATen/ATenGeneral.h"
// This header adds this if using CAFFE2_API
#include "ATen/core/ATenGeneral.h"
namespace at {
AT_API void set_num_threads(int);
AT_API int get_num_threads();
CAFFE2_API void set_num_threads(int);
CAFFE2_API int get_num_threads();
}

@ -37,6 +37,11 @@ CPUGenerator& CPUGenerator::manualSeed(uint64_t seed) {
return *this;
}
CPUGenerator& CPUGenerator::manualSeedAll(uint64_t seed) {
// There's only one CPU generator
return manualSeed(seed);
}
void * CPUGenerator::unsafeGetTH() {
return generator;
}

@ -0,0 +1,20 @@
#include <ATen/CPUTypeDefault.h>
#include <ATen/Context.h>
#include <ATen/CPUGenerator.h>
namespace at {
Allocator* CPUTypeDefault::allocator() const {
return getCPUAllocator();
}
Device CPUTypeDefault::getDeviceFromPtr(void * data) const {
return DeviceType::CPU;
}
std::unique_ptr<Generator> CPUTypeDefault::generator() const {
return std::unique_ptr<Generator>(new CPUGenerator(&at::globalContext()));
}
} // namespace at

@ -0,0 +1,14 @@
#pragma once
#include <ATen/TypeDefault.h>
namespace at {
struct CAFFE2_API CPUTypeDefault : public TypeDefault {
CPUTypeDefault(TensorTypeId type_id, bool is_variable, bool is_undefined)
: TypeDefault(type_id, is_variable, is_undefined) {}
Allocator* allocator() const override;
Device getDeviceFromPtr(void * data) const override;
std::unique_ptr<Generator> generator() const override;
};
} // namespace at

@ -1,43 +0,0 @@
#pragma once
#include "THC/THC.h"
#include "ATen/Error.h"
// This file creates a fake allocator that just throws exceptions if
// it is actually used.
// state passed to the allocator is the std::function<void(void*)> called
// when the blob is release by ATen
namespace at {
static cudaError_t cuda_fixed_malloc(void *, void**, size_t, cudaStream_t) {
AT_ERROR("attempting to resize a tensor view of an external blob");
}
static cudaError_t cuda_fixed_realloc(void*, void**, size_t, size_t, cudaStream_t) {
AT_ERROR("attempting to resize a tensor view of an external blob");
}
static cudaError_t cuda_fixed_free(void * state, void * allocation) {
auto on_release = static_cast<std::function<void(void*)>*>(state);
(*on_release)(allocation);
delete on_release;
return cudaSuccess;
}
static cudaError_t cuda_fixed_emptyCache(void*) {
AT_ERROR("?? attempting to resize a tensor view of an external blob");
}
static cudaError_t cuda_fixed_cacheInfo(void*, int, size_t*, size_t*) {
AT_ERROR("?? attempting to resize a tensor view of an external blob");
}
static THCDeviceAllocator CUDA_fixed_allocator = {
cuda_fixed_malloc, cuda_fixed_realloc, cuda_fixed_free, cuda_fixed_emptyCache,
cuda_fixed_cacheInfo
};
}
